Posted on 2025-11-04, 14:27, authored by Jinxiang Ding
<p dir="ltr">Object detection in remote sensing images leveraging deep learning has become critically important for tasks including land-use analysis, military surveillance, agricultural management, and geoscientific research. Nevertheless, owing to the inherent properties of remote sensing imagery (complex scenes, cluttered backgrounds, multi-scale targets, and densely distributed small objects), accurate and efficient detection remains challenging. To address the semantic gap, information attenuation, and feature aliasing caused by downsampling and continuous dimensionality reduction in traditional feature pyramid structures, this paper proposes a Concurrent Two-branch Contextual Feature Fusion Network (CTNet) that significantly enhances multi-scale object detection through three key innovations. First, CTNet introduces a Concurrent Two-branch Contextual Attention mechanism (CTBC), which combines Multi-Scale Channel Attention (MSCA) with Spatial Squeeze Attention (SSS) to adaptively allocate fusion weights between low-level details and high-level semantics, achieving context-aware, precise feature aggregation. Second, a Dual-Core Feature Optimization Convolution (DCFOConv) is designed to adaptively aggregate contextual information from different receptive fields, effectively enhancing multi-scale feature representations and improving robustness to scale variations. Finally, a Feature Enhancement and Selective Fusion (FESF) module is introduced in the high-level fusion stage to emphasize key positional information, compensate for detail loss during downsampling, and reinforce semantic retention and localization accuracy in complex backgrounds. Extensive experiments demonstrate that CTNet achieves outstanding detection accuracy at very low computational cost. On the RSOD dataset, for example, CTNet attains 67.4% mAP@0.5:0.95 and 93.5% mAP@0.5 with only 2.34M parameters and 5.8 GFLOPs, outperforming all comparison models.
Furthermore, CTNet delivers strong performance on the DIOR and LEVIR-CD datasets, attaining mAP@0.5 of 79.8% and 94.6%, respectively, reflecting a highly favorable trade-off between detection precision and computational efficiency. These results confirm that CTNet maintains exceptional lightweight efficiency while achieving state-of-the-art accuracy across diverse remote sensing datasets, demonstrating strong generalization and broad deployment potential for real-world applications.</p>
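As a rough illustration of the two-branch weighted fusion the abstract describes, the minimal sketch below blends a low-level (detail) feature map with a high-level (semantic) one using weights produced jointly by a channel branch (global average pooling plus a linear map, in the spirit of MSCA) and a spatial branch (a 1x1 channel squeeze, in the spirit of SSS). This is a hypothetical sketch, not the paper's actual CTBC implementation; all function names, weight shapes, and the exact fusion rule are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def two_branch_fusion(low, high, w_channel, w_spatial):
    """Blend a low-level and a high-level feature map (each of shape
    (C, H, W)) with adaptive weights from two concurrent branches.
    w_channel: (C, C) linear map for the channel branch;
    w_spatial: (C,) 1x1-conv-style channel squeeze for the spatial branch."""
    x = low + high                                            # coarse pre-fusion
    # Channel branch: global average pool -> linear map -> (C, 1, 1) logits
    ch = (w_channel @ x.mean(axis=(1, 2)))[:, None, None]
    # Spatial branch: squeeze channels to a single (1, H, W) logit map
    sp = np.tensordot(w_spatial, x, axes=(0, 0))[None, :, :]
    w = sigmoid(ch + sp)                                      # broadcast to (C, H, W)
    # Convex, element-wise blend of detail and semantic features
    return w * low + (1.0 - w) * high

rng = np.random.default_rng(0)
C, H, W = 8, 4, 4
low, high = rng.standard_normal((2, C, H, W))
fused = two_branch_fusion(low, high,
                          0.1 * rng.standard_normal((C, C)),
                          0.1 * rng.standard_normal(C))
print(fused.shape)  # (8, 4, 4)
```

Because the weight map lies in (0, 1), each output element is a convex combination of the corresponding low- and high-level values, so detail and semantics are traded off per position rather than simply summed.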