Abstract: Automated tomato harvesting and sorting suffer from low detection accuracy, caused mainly by the color similarity between immature fruits and the background, the difficulty of recognizing small targets, and occlusion by branches and leaves. To address these problems, this paper proposes RRM-YOLO, an improved lightweight model based on YOLOv8n. First, to enhance the model's ability to discriminate low-contrast features and to reduce background interference, Receptive Field Attention Convolution replaces certain standard convolutions, enabling dynamic spatial weighting. Second, to strengthen small-target feature extraction and fusion, a Reparameterized Convolution Based on Channel Shuffle and One-Shot Aggregation module replaces the original C2f module. Finally, to reduce missed detections caused by occlusion, a Multi-Head Self-Attention mechanism is integrated into the deeper network layers to model the global contextual dependencies of occluded objects. Experimental results show that RRM-YOLO achieves a precision of 83.8% and an mAP@50:95 of 70.7% on the test set, improvements of 6.8% and 5.1%, respectively, over the baseline YOLOv8n, while maintaining an inference speed of 134.4 FPS. RRM-YOLO thus provides a high-precision, efficient, and easily deployable visual solution for tomato detection in complex agricultural scenarios.
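
For readers unfamiliar with the channel shuffle operation named in the second improvement, the following minimal sketch illustrates the idea. It assumes the standard ShuffleNet-style shuffle (view the channels as a groups-by-channels-per-group grid, transpose, and flatten); the abstract does not specify the exact variant used in RRM-YOLO, and the function name `channel_shuffle` and the use of a plain Python list in place of a feature tensor are illustrative assumptions only.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def channel_shuffle(channels: Sequence[T], groups: int) -> List[T]:
    """Interleave channels across groups (hypothetical helper, not the paper's code).

    Conceptually: reshape (C,) -> (groups, C // groups), transpose to
    (C // groups, groups), then flatten back to (C,).
    """
    num_channels = len(channels)
    if groups <= 0 or num_channels % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    per_group = num_channels // groups
    # Walk the transposed grid: for each position inside a group,
    # take the channel at that position from every group in turn.
    return [
        channels[g * per_group + i]
        for i in range(per_group)
        for g in range(groups)
    ]


if __name__ == "__main__":
    # Six channels split into two groups: [0, 1, 2] and [3, 4, 5].
    print(channel_shuffle(list(range(6)), groups=2))  # [0, 3, 1, 4, 2, 5]
```

After shuffling, each consecutive pair of channels contains one channel from each original group, so a subsequent grouped convolution can mix information that would otherwise stay within a single group.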