Abstract: Detecting power safety equipment is essential for ensuring worker safety, reducing the risk of accidents, and minimizing economic losses. To address the complex backgrounds and imbalanced sample distributions found in power operation scenarios, this paper proposes a dual-domain gate fusion detection method (Dual-Domain Gate Fusion DEtection TRansformer, D2GF-DETR) that builds on RT-DETR and integrates spatial and frequency domain features. First, a Dual-Domain Feature Enhancement module is designed to mitigate interference from complex backgrounds, which often causes conventional convolutional neural networks to lose fine details. This module integrates spatial and frequency domain features and uses the Fourier transform to suppress background noise, enhancing the model’s sensitivity to detail and edge information. Second, a Focused Fusion module combines depthwise separable convolutions with gated convolutions to concentrate on key regional features, reducing noise interference during multi-scale feature fusion. Third, a Temporally Smoothed Slide Loss function dynamically reweights samples, improving the learning of hard examples and enhancing detection stability under temporal and dynamic variations. Experimental results show that, compared with the baseline RT-DETR, the proposed method improves mAP50 by 3.1% and 2.4%, and mAP50-95 by 2.8% and 1.8%, on the insulated gloves and workwear datasets, respectively. D2GF-DETR outperforms existing mainstream methods in detection performance while maintaining low computational overhead.
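
The abstract does not give the form of the Temporally Smoothed Slide Loss, so the following is only a minimal sketch of the general idea of dynamic sample reweighting, not the authors' implementation. It assumes the piecewise weighting commonly associated with Slide Loss, in which samples whose IoU lies near a threshold `mu` receive larger weights, and it assumes that "temporally smoothed" means the threshold is tracked as an exponential moving average of the per-batch mean IoU. The names `slide_weight`, `SmoothedThreshold`, `reweighted_loss`, the 0.1 margin, and the momentum value are all hypothetical choices made for illustration.

```python
import math


def slide_weight(iou: float, mu: float) -> float:
    """Piecewise weight that emphasises samples with IoU near the threshold mu.

    Assumed form: easy negatives well below mu keep weight 1, samples just
    below mu get a constant boost, and samples at or above mu get a boost
    that decays as IoU grows.
    """
    if iou <= mu - 0.1:
        return 1.0
    if iou < mu:
        return math.exp(1.0 - mu)
    return math.exp(1.0 - iou)


class SmoothedThreshold:
    """Exponential moving average of the per-batch mean IoU.

    This smoothing scheme is an assumption; the abstract only states that
    the loss is temporally smoothed.
    """

    def __init__(self, momentum: float = 0.9, initial: float = 0.5):
        self.momentum = momentum
        self.value = initial

    def update(self, batch_ious):
        batch_mean = sum(batch_ious) / len(batch_ious)
        self.value = self.momentum * self.value + (1.0 - self.momentum) * batch_mean
        return self.value


def reweighted_loss(losses, ious, mu):
    """Mean of per-sample losses after applying the slide weights."""
    weights = [slide_weight(iou, mu) for iou in ious]
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)


if __name__ == "__main__":
    threshold = SmoothedThreshold()
    batch_ious = [0.2, 0.45, 0.8]
    batch_losses = [1.0, 1.0, 1.0]
    mu = threshold.update(batch_ious)
    print(reweighted_loss(batch_losses, batch_ious, mu))
```

Under these assumptions, smoothing the threshold keeps the boundary between "easy" and "hard" samples from jumping between batches, which is one plausible way to obtain the stability the abstract describes.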