Abstract: Rail surface defect detection is essential for the safe operation of trains. Deep learning-based approaches have emerged as effective solutions for real-time, end-to-end defect detection, but the large multi-scale variations among different defect types still make high-precision detection difficult. To address this challenge, this paper proposes a rail surface defect detection network, referred to as CenterNet-EGS. First, to suppress redundant background information and improve discrimination among defect types, an Efficient Channel Attention (ECA) module is embedded into the CenterNet backbone: the original Bottleneck blocks are replaced with ECABottleneck modules, which improve feature extraction under class-imbalanced conditions. Second, to achieve lightweight multi-scale feature fusion in the decoding stage, a Grouped Spatial Convolution (GSConv) module is incorporated. It enriches both semantic and spatial representations without significantly increasing computational cost, providing more refined features for the downstream localization and regression tasks. Third, a Scale-Invariant Intersection over Union (SIoU) loss function is introduced to improve robustness to low-quality samples and to accelerate convergence during training. CenterNet-EGS is evaluated on a dedicated rail surface defect dataset and compared with several mainstream detection models, including Faster R-CNN, YOLOv8_x, RetinaNet, SSD, and the original CenterNet. Experimental results show that CenterNet-EGS achieves the best overall performance, with a mean Average Precision (mAP) of 91.99%, a precision of 83.29%, a recall of 88.39%, an F1 score of 0.847, and an inference speed of 56.20 FPS, which meets the requirements of real-time detection in practical applications.
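The ECA step mentioned above can be illustrated with a minimal sketch, assuming the standard formulation from ECA-Net: global average pooling per channel, a bias-free 1D convolution across neighbouring channels whose kernel size k is chosen adaptively from the channel count C (with γ = 2 and b = 1), and a sigmoid gate that rescales each channel. To stay dependency-free it operates on a single (C, H, W) feature map stored as nested Python lists. The function names `eca_kernel_size` and `eca`, and the uniform convolution taps in the usage example, are hypothetical illustrations rather than the paper's code; the abstract also does not state where inside ECABottleneck the module is placed.

```python
import math
from typing import List

# Feature map indexed as [channel][row][col]; an illustrative stand-in for a tensor.
Tensor3D = List[List[List[float]]]


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive kernel size k = |(log2(C) + b) / gamma|, rounded to an odd number (ECA-Net rule)."""
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def eca(x: Tensor3D, conv_weights: List[float]) -> Tensor3D:
    """Efficient Channel Attention on one (C, H, W) feature map.

    conv_weights holds the k taps of the shared, bias-free 1D convolution
    that slides along the channel axis (these would be learned in practice).
    """
    channels = len(x)
    k = len(conv_weights)
    pad = (k - 1) // 2

    # 1) Global average pooling: one scalar descriptor per channel.
    pooled = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]

    # 2) 1D convolution over neighbouring channels with zero padding, then sigmoid.
    weights = []
    for c in range(channels):
        acc = 0.0
        for j in range(k):
            idx = c + j - pad
            if 0 <= idx < channels:
                acc += conv_weights[j] * pooled[idx]
        weights.append(1.0 / (1.0 + math.exp(-acc)))

    # 3) Rescale every element of each channel by that channel's attention weight.
    return [[[v * weights[c] for v in row] for row in x[c]] for c in range(channels)]


if __name__ == "__main__":
    C = 64
    k = eca_kernel_size(C)
    # Hypothetical 4x4 feature map whose channel c is filled with the value c.
    fmap = [[[float(c)] * 4 for _ in range(4)] for c in range(C)]
    out = eca(fmap, [1.0 / k] * k)  # uniform taps, for illustration only
    print(k, len(out))  # prints: 3 64
```

Because the 1D convolution has only k shared weights, each ECA module adds k parameters regardless of C, which is why it is usually described as lighter than fully connected squeeze-and-excitation gating.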