LW-IRSTNet: Lightweight Infrared Small Target Segmentation Network and Application Deployment

Efficiently and accurately separating infrared (IR) small targets from complex backgrounds presents a significant challenge. Numerous studies in the literature have proposed various feature fusion modules designed specifically to enhance the extraction of IR small target features. While these designs offer some incremental improvement to the accuracy of IR small target detection, they come at the steep cost of significantly increased network parameters and floating-point operations (FLOPs). Striving for a balance between computational efficiency and model accuracy, we forgo these complex feature fusion modules. Instead, we develop a new lightweight encoding and decoding structure, the lightweight IR small target segmentation network (LW-IRSTNet). This structure integrates regular, depthwise separable, atrous, and asymmetric convolution modules. In addition, we devise postprocessing modules, including an eight-neighborhood clustering algorithm and an online target feature adjustment strategy. Experimental results indicate that: 1) the segmentation accuracy metrics of LW-IRSTNet match the best results of 14 state-of-the-art comparative baselines; 2) the parameters and FLOPs of LW-IRSTNet, at only 0.16M and 303M, respectively, are significantly smaller than those of these baselines; and 3) the postprocessing modules enhance both the user-friendliness and the robustness of algorithm deployment. Moreover, LW-IRSTNet has been successfully implemented on both embedded platforms and websites, expanding its range of applications. Utilizing the open neural network exchange (ONNX) framework, neural network processing unit (NPU) acceleration, and CPU multithreaded resource allocation, we achieve high-performance inference as well as online dynamic threshold adjustment with LW-IRSTNet.
The source codes for this project can be accessed at https://github.com/kourenke/LW-IRSTNet.


I. INTRODUCTION
INFRARED (IR) small target detection technology plays a vital role in early warning and reconnaissance tasks [1]. Given the rigorous demands for precision and speed in remote target detection within these tasks, many researchers have invested notable effort into refining IR small target detection algorithms over the years.
IR small target detection algorithms primarily fall into two categories: track-before-detect (TBD) and detect-before-track (DBT). Since TBD algorithms require the association of multiframe images and do not perform optimally in real-time scenarios, a significant portion of research has focused on the development of DBT algorithms, which rely on single-frame detection [2]. Over recent decades, single-frame IR small target detection has primarily utilized model-driven algorithms. These algorithms leverage prior knowledge (specifically, the physical and imaging characteristics of the target) to form reasonable assumptions. They typically encompass background estimation methods [3], [4], morphological methods [5], [6], local contrast methods [7], [8], [9], [10], directional derivative/gradient methods [11], [12], frequency-domain methods [13], [14], and low-rank sparse methods [15], [16], [17], among others. However, these model-driven algorithms require extensive parameter setting. Consequently, they can underperform when there are significant changes to the target's size, shape, signal-to-noise ratio, and background clutter, leading to lower detection performance in practical implementation.
With the development of deep learning, particularly since the release of the IR small target dataset by Wang et al. [18] in 2019, a plethora of algorithms for IR small target segmentation tasks have emerged [19]. Dai et al. [20] proposed an asymmetric context modulation module (ACM) with either a feature pyramid network (FPN) [21] or UNet [22] as the backbone and introduced the high-quality SIRST dataset. Zhang et al. [23] designed an attention-guided pyramid context network (AGPCNet) with a residual network (ResNet) [24] as the backbone and expanded the SIRST dataset. Li et al. [25] devised a dense nested attention network (DNANet) with UNet as the backbone, which facilitated progressive interaction between high- and low-level features, and published the NUDT-SIRST dataset. Huang et al. [26] utilized the Visual Geometry Group (VGG) network [27] as the backbone and designed a local similarity pyramid module (LSPM) to effectively capture multiscale features of IR small targets. Zuo et al. [28] developed a multiscale feature fusion pyramid module (AFFPN) with ResNet as the backbone, aiming to address target loss in deep networks. Wang et al. [29] utilized ResNet as the backbone and introduced a method that uses a region proposal network (RPN) to extract candidate targets, a fully convolutional network (FCN) [30] to generate feature maps, and a transformer [31] to determine candidate targets. In summary, the algorithms discussed above follow a general pattern: 1) they employ classical networks such as VGG, UNet, and ResNet for encoding; 2) multiscale feature fusion modules (MFFM) are designed on the high-level feature map; and 3) channel and spatial attention mechanism fusion modules are designed on the same-scale feature maps corresponding to encoding and decoding. Our reproduction of the codes from the aforementioned literature indicates that adding MFFM and channel-spatial attention fusion modules can slightly enhance detection accuracy. However, it also leads to a significant increase in network parameters and floating-point operations (FLOPs). In light of this, we propose a lightweight IR small target segmentation network (LW-IRSTNet) that can be deployed on embedded platforms. The contributions of our study are given as follows.
1558-0644 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
1) LW-IRSTNet achieves excellent segmentation results on multiple IR small target datasets without relying on feature fusion and context modules. It has only 0.16 M parameters and 303 M FLOPs, striking a balance between segmentation accuracy and speed.
2) In LW-IRSTNet, we propose a bottleneck structure known as the depthwise separable-atrous-asymmetric-atrous (DAAA) module. This structure not only minimizes computational complexity but also effectively learns multiscale features of IR small targets.
3) A postprocessing module is introduced. This not only enhances the algorithm's user-friendliness and robustness in application deployment but also broadens its range of applications, meeting the requirements of real-time, high-precision, and online dynamic target feature adjustment. We also deployed LW-IRSTNet on an embedded platform and a website to realize engineering applications.

A. Design Strategy for IRSTNet
To effectively extract IR small target features, existing IRSTNets employ various design strategies, which can be summarized as follows.
1) Asymmetric Context Feature Fusion Strategy: Both ACM [20] and AGPCNet [23] utilize a top-down channel attention mechanism and a bottom-up spatial attention mechanism to extract high-level semantic information and low-level feature information, respectively.
2) Densely Nested Interactive Feature Fusion Strategy: DNANet [25] achieves progressive interaction between high- and low-level features through dense nested interaction networks. By repeatedly fusing and enhancing contextual information, DNANet effectively combines and utilizes the information of small targets.
3) Multiscale Feature Fusion Strategy: Since IR small targets exhibit changes across different scales, LSPM [26] and AFFPN [28] construct multiscale feature maps on the high-level feature maps using atrous convolution and adaptive global average pooling. This enables the network to learn the characteristics of IR small targets at different scales.
4) Generative Adversarial Strategy: Zhao et al. [32] proposed a generative adversarial network that can predict IR small targets as a special type of noise based on the learned data distribution and hierarchical characteristics. This strategy leverages the generative power of the adversarial network to enhance target segmentation.
5) Transformer + CNN Structure: Liu et al. [33] were the first to use the transformer architecture for IR small target segmentation to capture long-range dependencies. Wu et al. [34] introduced the multilevel TransUNet (MTUNet), which combines a hybrid vision transformer (ViT) encoder and convolutional neural networks (CNNs) to extract multilevel features.
While the aforementioned designs aim to extract low-level feature information and high-level semantic information of IR small targets, they often come at the cost of increased computational complexity. Given that the features of IR small targets are not very distinctive, this study focuses on minimizing network computational complexity while ensuring accurate segmentation of IR small targets.

B. Lightweight Network Design Strategy
A complex network model typically achieves higher accuracy compared to a simpler one. However, due to its large storage requirements and consumption of computing resources, effectively applying such a network to embedded platforms is challenging. As a result, there has been significant research focused on developing lightweight networks. Let us review the key design strategies for lightweight networks, which can be categorized into manually designed lightweight network structures, model compression, and automated neural network architectures.
2) Model Compression: Model compression techniques can be categorized into knowledge distillation, pruning, quantization, and low-rank decomposition.
1) Knowledge distillation [45], [46], [47], [48] involves creating a network of "student" models from larger "teacher" models, enabling the small model to benefit from the feature extraction capabilities of the larger model.
4) Low-rank decomposition [56], [57] focuses on decomposing weight matrices into low-rank matrices, reducing the computational complexity of the network.

3) Automated Neural Network Architectures:
Neural architecture search (NAS) has automated the network design process to some extent. NAS methods [58], [59], [60], [61] have achieved remarkable results in designing lightweight networks. These techniques search for the optimal network architecture within a predefined search space under performance constraints.

III. NETWORK ARCHITECTURE OF LW-IRSTNET
In this section, we present the design of LW-IRSTNet and its postprocessing module, based on the design principles of lightweight networks in previous research and on practical engineering considerations. We begin by describing the overall structure of LW-IRSTNet and then provide the design details and motivations for each module within the network. Finally, we analyze the postprocessing module from an engineering application perspective.

A. Network Architecture
The architecture of LW-IRSTNet is shown in Fig. 1(a) and consists of six stages. To further reduce the computational complexity, we have omitted the MFFM [Fig. 1(b1)-(b3)] and context feature fusion modules [CFFM, Fig. 1(c2)-(c7)], focusing instead on carefully designing the encoding and decoding structure. Only the sum connection [Fig. 1(c1)] is used for feature fusion between encoding and decoding feature maps at the same scale. We are pleased to report that the simplified network structure still achieves excellent segmentation results on IR small target datasets. The six stages of LW-IRSTNet comprise an initial downsampling module, downsampling modules, regular convolution modules, depthwise separable convolution (Dw-Conv.) modules, atrous convolution (At-Conv.) modules, asymmetric convolution (As-Conv.) modules, and upsampling modules, as shown in Fig. 1(d). For detailed network specifications, please refer to Section III-B.

B. Network Details
1) Initial Downsampling Module: Considering the high computational resource requirements of processing high-resolution input images, we adopt the design concept of Inception V2 [62] in the initial downsampling module to achieve parameter compression and accelerate inference. Thus, the feature map of Stage 1 (F_map1) can be represented as

F_map1 = P(B(Concat(Conv2d(x), MaxPooling(x))))

where the kernel size, stride, and channel number of the convolution branch (Conv2d) are 3, 2, and 5, respectively; the kernel size, stride, and channel number of the MaxPooling branch are 3, 2, and 3, respectively; x is the input image (3 × 256 × 256); B is batch normalization; and P is the PReLU activation function.
2) Regular, Depthwise Separable, Atrous, and Asymmetric Convolution Modules: In Fig. 1(d), the structure of these four modules is similar, comprising a convolutional branch and a shortcut connection branch. The convolutional branch consists of three convolutional layers: the first 1 × 1 convolution is employed to reduce or expand channel dimensions, the main convolution (regular/depthwise separable/atrous/asymmetric) is used to extract target features, and the second 1 × 1 convolution is utilized to restore the original channel dimension.
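As a concrete illustration, the Stage-1 initial block described in Section III-B1 can be sketched in PyTorch. The branch channel counts (5 for the convolution, 3 for the pooling) follow the text; the exact layer ordering (concatenation, then batch normalization and PReLU) is our assumption based on the Inception V2/ENet design the paper cites.

```python
import torch
import torch.nn as nn

class InitialDownsample(nn.Module):
    """Initial downsampling block (a sketch): a strided 3x3 convolution branch
    (5 channels) and a 3x3 max-pooling branch (3 channels) are concatenated
    to 8 channels at half resolution, then batch-normalized and activated."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 5, kernel_size=3, stride=2, padding=1, bias=False)
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.bn = nn.BatchNorm2d(8)
        self.act = nn.PReLU(8)

    def forward(self, x):
        # Concatenating the two branches yields 5 + 3 = 8 channels.
        y = torch.cat([self.conv(x), self.pool(x)], dim=1)
        return self.act(self.bn(y))
```

For a 3 × 256 × 256 input, this block produces an 8 × 128 × 128 feature map while touching each input pixel only once per branch, which is what makes it cheap.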
To fully extract the shallow feature information from the target, we employ four regular convolution modules in Stage 2. To capture deep semantic information of multiscale IR small targets without increasing the parameter count, we incorporate four consecutive DAAA modules in Stage 3. The specific parameters are provided in Table I, representing the optimal hyperparameters determined through extensive ablation experiments.
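The shared bottleneck layout of these modules, and one atrous member of a DAAA block, can be sketched as follows. The channel compression ratio of 4 and the atrous rates follow the (2, 4) scheme and the (2, 4, 8, 16) setting reported in the ablation tables; the precise composition is our reading of the paper, not a verified copy of the released code.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Shared bottleneck layout (a sketch): a 1x1 convolution changes the
    channel count, a main convolution extracts features, a 1x1 convolution
    restores the channel count, and an identity shortcut is added."""
    def __init__(self, channels, mid, main):
        super().__init__()

        def cba(conv, ch):
            # convolution + batch norm + PReLU; bias-free, as in the paper
            return nn.Sequential(conv, nn.BatchNorm2d(ch), nn.PReLU(ch))

        self.branch = nn.Sequential(
            cba(nn.Conv2d(channels, mid, 1, bias=False), mid),
            cba(main, mid),
            cba(nn.Conv2d(mid, channels, 1, bias=False), channels),
        )
        self.act = nn.PReLU(channels)

    def forward(self, x):
        return self.act(x + self.branch(x))


def atrous_member(channels=32, rate=2):
    """One atrous member of a DAAA block: a 3x3 dilated convolution with a
    channel compression ratio of 4 (our reading of Tables IV and V)."""
    mid = channels // 4
    main = nn.Conv2d(mid, mid, 3, padding=rate, dilation=rate, bias=False)
    return Bottleneck(channels, mid, main)
```

Because the dilation enlarges the receptive field without adding parameters, stacking members with rates 2, 4, 8, and 16 lets Stage 3 cover multiple target scales at constant cost, which matches the ablation observation that changing the atrous rate leaves FLOPs and Params unchanged.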
3) Downsampling Module: The structure of the downsampling module is similar to that of a regular convolution module but with two key differences. First, in the convolutional branch, the first 1 × 1 convolution is replaced with a 2 × 2 convolution with a stride of 2. Second, in the shortcut connection branch, we initially reduce the feature map resolution by applying a 2 × 2 MaxPooling operation with a stride of 2; subsequently, we zero-pad the activations to match the number of feature maps. Finally, the feature maps from the two branches are added and activated.
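A minimal PyTorch sketch of this downsampling module follows. The 2 × 2 stride-2 convolution and the max-pool-plus-zero-pad shortcut come from the text; the number and kind of convolutions after the 2 × 2 layer are our simplifying assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Downsample(nn.Module):
    """Downsampling module (a sketch): the convolutional branch opens with a
    2x2 stride-2 convolution; the shortcut applies 2x2 stride-2 max-pooling
    and zero-pads the channel dimension so the two branches can be summed."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 2, stride=2, bias=False),
            nn.BatchNorm2d(out_ch), nn.PReLU(out_ch),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch), nn.PReLU(out_ch),
            nn.Conv2d(out_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.pool = nn.MaxPool2d(2, stride=2)
        self.pad = out_ch - in_ch  # channels to zero-pad on the shortcut
        self.act = nn.PReLU(out_ch)

    def forward(self, x):
        shortcut = self.pool(x)
        # F.pad's last pair pads the channel axis: (W, W, H, H, C_front, C_back)
        shortcut = F.pad(shortcut, (0, 0, 0, 0, 0, self.pad))
        return self.act(self.branch(x) + shortcut)
```

The parameter-free shortcut is the point of the design: halving the resolution and widening the channels costs nothing on the residual path.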
4) Upsampling Module: The ablation experiment shows that deconvolution is more helpful in improving segmentation accuracy compared to other upsampling methods (e.g., bilinear, bilinear + deconvolution).
5) Other Network Details: 1) Inspired by the design of the efficient neural network (ENet) [66], we integrate a dropout strategy into the downsampling, regular, depthwise separable, atrous, and asymmetric convolution modules to regularize the network and mitigate overfitting. Specifically, we set the dropout rate to 0.01 in Stage 2 and 0.1 in Stages 3, 5, and 6.
2) Considering the shallow network depth, the PReLU activation function is adopted in the encoding stage (Stages 2 and 3), and ReLU is adopted in the decoding stage (Stages 5 and 6).
3) After each convolutional layer, we perform batch normalization and activation.
4) To reduce the parameters and overall memory operations, we do not use bias terms anywhere in the network.

C. Postprocessing
1) Eight-Neighborhood Clustering Algorithm: The IR small target segmentation task is a binary classification task with extremely imbalanced positive and negative samples. After inference and threshold segmentation by LW-IRSTNet, the segmentation result is obtained as

Output_mask = 255 if Infer_saliencymap > Th, and 0 otherwise

where Output_mask is the segmentation result (0 is the background and 255 is the target), Infer_saliencymap is the saliency map, and Th is the threshold (in the training stage, the network learns its weight parameters with 0 as the threshold; the threshold is therefore also set to 0 in the inference stage).
Next, we perform eight-connected clustering [25] on the segmentation result to obtain the number of targets and their centroids, aspect ratios, pixel sizes, and coordinates in the image field of view, providing data support for subsequent tracking tasks.
The eight-connected clustering criterion can be represented as

N8(x_i, y_i) ∩ N8(x_j, y_j) ≠ ∅ and p(x_i, y_i) = p(x_j, y_j), ∀ p(x_i, y_i), p(x_j, y_j) ∈ Output_mask

where p(x_i, y_i) and p(x_j, y_j) are any two pixels in the segmentation result (Output_mask). If their eight-neighborhoods intersect and they have the same value (0 or 255), then they belong to the same category (target or background).
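The clustering step can be implemented directly as a breadth-first search over the eight neighbors of each target pixel. The helper below is our illustration, not the released code; it also reports the per-target statistics listed above (pixel size, centroid, and bounding-box aspect ratio).

```python
import numpy as np
from collections import deque

def eight_connected_clusters(mask):
    """Label 8-connected target components in a binary mask (0 = background,
    255 = target) and return per-target size, centroid, and aspect ratio."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=np.int32)
    targets, current = [], 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] != 255 or labels[sy, sx]:
                continue
            current += 1
            pixels, queue = [], deque([(sy, sx)])
            labels[sy, sx] = current
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):          # visit the eight neighbors
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w \
                                and mask[ny, nx] == 255 and not labels[ny, nx]:
                            labels[ny, nx] = current
                            queue.append((ny, nx))
            ys, xs = zip(*pixels)
            height = max(ys) - min(ys) + 1
            width = max(xs) - min(xs) + 1
            targets.append({
                "size": len(pixels),
                "centroid": (sum(ys) / len(ys), sum(xs) / len(xs)),
                "aspect_ratio": width / height,
            })
    return targets
```

Note that diagonally touching pixels belong to the same target under 8-connectivity, which matters for point targets only a few pixels wide.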
2) Online Target Feature Adjustment Strategy: Under complex backgrounds and similar noise interference, any algorithm has the possibility of false alarms. Since optimizing the algorithm itself is very difficult, why not change our perspective and improve the robustness of the algorithm from an engineering standpoint?
Following this viewpoint, we propose an online target feature threshold adjustment strategy, defined as

m ≤ T_size ≤ k and l ≤ T_ratio ≤ n

where T_size is the pixel size of the target, m and k are the lower and upper thresholds of the target pixel size, respectively, T_ratio is the aspect ratio of the target, and l and n are the lower and upper thresholds of the target aspect ratio, respectively. A candidate that violates either condition is rejected as a false alarm.
This strategy is particularly practical in the engineering application of LW-IRSTNet. Note that the threshold parameters need to be dynamically adjusted based on prior knowledge or scene changes, which can further improve the segmentation accuracy of IR small targets, but this is not a necessary operation in a clean background.
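In code, the adjustment strategy reduces to a simple filter over candidate targets described by their pixel size and aspect ratio (a sketch; the dictionary field names are our choice).

```python
def filter_targets(targets, m, k, l, n):
    """Online target feature adjustment (a sketch): keep only candidates whose
    pixel size T_size lies in [m, k] and whose aspect ratio T_ratio lies in
    [l, n]; everything else is rejected as a probable false alarm."""
    return [t for t in targets
            if m <= t["size"] <= k and l <= t["aspect_ratio"] <= n]
```

Because the thresholds are plain scalars, they can be exposed as sliders in the HCI system and retuned online without touching the network weights.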

IV. APPLICATION DEPLOYMENT OF LW-IRSTNET
From an academic standpoint, there has been a predominant focus on improving the accuracy of detection algorithms, often overlooking the importance of lightweight design and practical deployment. However, in military applications, deployment on mobile devices is crucial for IR small target detection tasks. Therefore, Section III concentrated on designing LW-IRSTNet and proposing a postprocessing module suitable for engineering applications. This section shifts the focus to the application deployment of LW-IRSTNet, with the overall research roadmap shown in Fig. 2.

A. Embedded Deployment
Within the academic community, the PyTorch framework is widely utilized for designing network models and conducting training and testing. To further optimize the inference speed of LW-IRSTNet and deploy it on embedded platforms, we convert the model (.pkl format) trained under PyTorch into the open neural network exchange (ONNX) format. It is worth noting that major companies have also optimized their own inference frameworks for embedded devices, such as Tencent's NCNN and neural network (TNN) frameworks, Alibaba's mobile neural network (MNN), and NVIDIA's TensorRT. For the trained LW-IRSTNet model, we have published four inference frameworks on GitHub: ONNX, NCNN, TNN, and MNN.

B. Design of the HCI System and Website Deployment
To display real-time information, including the number of targets, centroid coordinates, aspect ratio, and pixel size, as well as to dynamically adjust target feature thresholds (the postprocessing module in Section III-C), we designed a human-computer interaction (HCI) system. This system can perform four tasks: offline single-frame segmentation, video segmentation, online real-time segmentation, and algorithmic performance evaluation. Considering the varying compatibility of algorithms with different embedded platforms, we have deployed this HCI system on the website, effectively resolving the compatibility problem, as shown in Fig. 3. Fig. 3(a) shows the algorithm evaluation system, which assesses various segmentation algorithms on different datasets based on metrics such as precision, recall, mean intersection over union (mIoU), F1, area under the curve (AUC), FLOPs, Params, and frames per second (FPS). Fig. 3(b) shows the offline single-frame segmentation system, capable of loading different single-frame IR images and segmenting small targets within them. Fig. 3(c) shows the offline video segmentation system, which can load different videos and perform small target segmentation within them. Fig. 3(d) shows the online IR small target real-time segmentation system, featuring the ability to dynamically adjust the inference threshold, target aspect ratio, and pixel size. It should be noted that this system also allows for the further deployment of the IR small target tracking algorithm [67].

A. Basic Parameters
1) Training Details:
a) Software and hardware configuration: The system employs Ubuntu 18.04 as the operating system, with 32 GB of memory. The CPU is an Intel Xeon E5-2630 v3 at 2.40 GHz × 32, and the GPU is an NVIDIA GeForce RTX 2080 Ti.
b) Training hyperparameters: The batch size is set to 64, and the number of epochs is 100. The optimizer is stochastic gradient descent (SGD), with a momentum of 0.9 and a weight decay of 1e-4. The initial learning rate is 0.05, and the learning rate strategy is poly. The loss function is SoftIoULoss [23].
c) Datasets: Considering the lack of clear definitions for the shape and size of IR small targets within public datasets, and the limited size of the available data samples, we combined four existing public datasets (MDFA [18], SIRST [20], SIRST Aug [23], and NUDT-SIRST [25]) for both training and testing. This fusion allowed us to evaluate the robustness and multiscale detection capabilities of various algorithms. Also, to account for the black heat mode in IR imaging, the background and target grayscales within the MDFA dataset are inverted.
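The "poly" learning-rate strategy named in the training hyperparameters above is commonly implemented as a polynomial decay; the power of 0.9 is a conventional default and our assumption, as the paper states only the strategy name.

```python
def poly_lr(base_lr, epoch, max_epoch, power=0.9):
    """'Poly' learning-rate decay (a sketch): the rate falls from base_lr to 0
    over max_epoch epochs along the curve (1 - epoch/max_epoch) ** power."""
    return base_lr * (1 - epoch / max_epoch) ** power
```

With the paper's settings (initial rate 0.05, 100 epochs), the schedule starts at 0.05 and decays smoothly to 0 at the final epoch.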

C. Evaluation Metrics
To compare the computational complexity and accuracy of 15 different algorithms, we use FLOPs, Params, FPS, mIoU, and F1 for analysis, as shown in Table II.
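For reference, the pixel-level IoU and F1 metrics can be computed as below. This is the standard formulation; the exact accumulation over the test set used for the figures in Table II may differ.

```python
import numpy as np

def iou_f1(pred, gt, eps=1e-10):
    """Pixel-level IoU and F1 between a binary prediction and ground truth."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()    # true positive pixels
    fp = np.logical_and(pred, ~gt).sum()   # false positive pixels
    fn = np.logical_and(~pred, gt).sum()   # false negative pixels
    iou = tp / (tp + fp + fn + eps)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    return iou, f1
```

Averaging the IoU over all test images gives the mIoU reported in the comparisons.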

D. Comparative Experiment
1) Quantitative Comparative Analysis:
a) Computational complexity analysis: In Table II, LW-IRSTNet stands out with the lowest FLOPs and Params compared to the other 14 algorithms, making it suitable for deployment on embedded platforms. Furthermore, the inference speed of LW-IRSTNet ranks second among the algorithms, at approximately 30 FPS.
To further enhance its speed, we adopted a two-step approach. First, we trained the model using the PyTorch framework and converted it to the ONNX format. Subsequently, we loaded the ONNX model into OpenCV for inference. This optimization substantially improved the inference speed of LW-IRSTNet, from 30 to 79 FPS (a 2.6× speedup).
b) Accuracy and robustness analysis: It is important to note that the primary aim of this study was to enable embedded deployment by minimizing computational complexity while ensuring segmentation accuracy. In Table II, although the LW-IRSTNet algorithm ranks second in mIoU and third in F1, the reduction in mIoU is only 1.32% and the reduction in F1 is merely 0.81% compared to the most accurate algorithm, MTUNet. Meanwhile, the FLOPs and Params of LW-IRSTNet are significantly reduced, by 95.12% and 98.01%, respectively, compared to MTUNet.
Furthermore, MTUNet, which achieves the highest accuracy, is based on the transformer framework, which is not particularly suitable for embedded deployments at this stage.
Similarly, although AGPCNet, which also ranks among the top three algorithms in accuracy, utilizes the CNN framework, its computational complexity is much higher than that of LW-IRSTNet, which is not favorable for embedded deployment.
c) Comprehensive analysis: To provide a visualization of the relationship between accuracy and computational complexity among the different algorithms, we have plotted a scatter diagram in Fig. 4. It is evident that our proposed LW-IRSTNet algorithm effectively balances the segmentation accuracy and computational complexity, fulfilling the initial objective of the study.
2) Qualitative Comparative Analysis: To visually compare the segmentation effects of different algorithms, we carefully selected seven representative single-frame IR small target images with diverse scenes, target scales, and target types for testing, as shown in Fig. 5. Note that in Fig. 5, the red, green, and blue boxes represent correct detections, missed detections, and false alarms, respectively.
In Fig. 5(a), a long-range point drone is almost submerged in the jungle background. Only the LW-IRSTNet and MTUNet algorithms successfully detect the target without any false alarms or missed detections. In Fig. 5(b), an aircraft is barely visible due to its weak local contrast and long distance. Only the LW-IRSTNet, MTUNet, and DNANet algorithms manage to detect it amidst the dense cumulus background; in contrast, the other 12 algorithms exhibit missed detections. Fig. 5(c) shows the detection of a return capsule about to land, extracted from a public video. While all 15 algorithms successfully detect the return capsule, six of them mistakenly identify the parachute as a target. The LW-IRSTNet algorithm, however, adeptly detects the return capsule without any false alarms or missed detections. In Fig. 5(d) and (e), there are two small birds and two drones flying in the sky, respectively, with weak local contrast. Among the 15 algorithms, only LW-IRSTNet, MTUNet, DNANet, ENet, and Fusionnet accurately detect the multiple targets without false alarms or missed detections. In Fig. 5(f), although the drone has a clearer edge contour and stronger local contrast, it is close to the highlighted clouds, causing nine algorithms to produce false alarms. Conversely, the LW-IRSTNet algorithm not only captures the drone's shape accurately but also avoids any false alarms or missed detections. Finally, in Fig. 5(g), we observe a large and well-defined drone against a relatively clean background. Although all 15 algorithms correctly segment the drone, the LW-IRSTNet algorithm exhibits superior accuracy in terms of pixel-level segmentation.
In summary, LW-IRSTNet achieves good segmentation results for IR small targets at different scales against different complex backgrounds, while the other algorithms all exhibit varying degrees of false alarms or missed detections.

E. Ablation Experiment
1) DAAA Modules: To demonstrate the efficacy of the DAAA module, all convolution modes in the DAAA module are successively changed to regular, atrous, or depthwise separable convolution, or DAAA modules 3 and 4 are removed. The results of these experiments are summarized in Table III. They show that any change to the DAAA module reduces the mIoU and F1.
2) Channel Expansion or Compression Ratio in the DAAA Modules: In this experiment, we systematically altered the expansion ratio of the depthwise separable convolution channels and the compression ratio of the atrous/asymmetric convolution channels within the DAAA module, as shown in Table IV. The experimental results show that the expansion or compression ratio has a small impact on mIoU and F1 but a large impact on FLOPs and Params. To strike a balance between performance and computational complexity, we ultimately selected the (2, 4) design scheme.
3) Atrous Ratio Setting in the DAAA Modules: The atrous ratio is changed successively, as shown in Table V. We found that changing the atrous rate did not affect the FLOPs and Params of LW-IRSTNet. However, the segmentation accuracy, as measured by the mIoU and F1 scores, was consistently highest when the atrous ratio was set to (2, 4, 8, 16).
4) Kernel Sizes of Asymmetric Convolution in the DAAA Modules: The kernel size of the asymmetric convolution in the DAAA module is changed sequentially, as shown in Table VI. Notably, we found that setting the kernel size to 7 resulted in relatively higher mIoU and F1 scores than other kernel sizes.

5) Number of Channels in Different Stages:
The number of channels in different stages is changed in turn, as shown in Table VII. While the combination of (8, 16, 32) significantly reduced FLOPs and Params, we also observed a decline of 2-3 percentage points in mIoU and F1. Conversely, the combination of (16, 64, 128) increased FLOPs and Params by approximately 3-4 times. Ultimately, we determined that the combination of (8, 32, 64) struck a balance between performance and computational complexity.
6) Upsampling Methods: In Table VIII, we discovered that using bilinear interpolation in the upsampling stage reduced the model parameters and FLOPs. However, this came at a cost of 3%-4% in the mIoU and F1 scores. Considering this balance, we opted for deconvolution as the upsampling method for LW-IRSTNet.
7) Feature Fusion Methods: Given that existing IRSTNets typically employ complex feature fusion methods [20], [23], [26], [28], [63], [64], [65], we conducted ablation experiments to ascertain whether incorporating these methods into LW-IRSTNet would improve segmentation accuracy. Table IX presents the results of these experiments, revealing that our designed LW-IRSTNet algorithm achieves the highest segmentation accuracy while maintaining a smaller parameter size and fewer FLOPs, even without relying on complex feature fusion methods.
In summary, the comprehensive comparative experiments and ablation analysis provide substantial evidence that the LW-IRSTNet algorithm successfully balances lightweight design considerations with high segmentation accuracy.

VI. DISCUSSION
Although the LW-IRSTNet algorithm demonstrates robustness in segmenting multiscale IR small targets in complex backgrounds, it may still encounter challenges in certain scenarios. For instance, if a drone is completely submerged in a wooded background, it becomes difficult to identify, as shown in Fig. 6. However, the corresponding heat map in Fig. 6(b) reveals that the drone's thermal value is significantly higher than the background, even though it does not exceed the inference threshold. Therefore, future research will focus on addressing the segmentation of targets in cases involving background occlusion.

VII. CONCLUSION
In conclusion, in line with the initial research objective, we aimed to design an IRSTNet that effectively reduces computational complexity while maintaining high segmentation accuracy. The experimental results confirm that LW-IRSTNet successfully meets these design requirements. Moving forward, our research will focus on two key areas based on LW-IRSTNet: 1) exploring methods to enhance the detection capability of IR small targets under complex ground jungle background occlusion and 2) investigating real-time IR small target tracking algorithms.

Fig. 3. Web-based deployment of HCI systems. (a) Offline single-frame segmentation system. (b) Offline video segmentation system. (c) Online real-time segmentation system, which can dynamically adjust target feature thresholds. (d) Algorithm performance evaluation system.

Fig. 5. Segmentation results of 15 algorithms for IR small targets of different scales and types in complex backgrounds. (a) Point target (drone) in the complex ground background. (b) Point target (aircraft) in the complex dense cumulus background. (c) Point target (reentry capsule) in the background of the sky-ground line. (d) Two point targets (birds) in the sky background. (e) Two point targets (drones) in the sky, ground, and architectural background. (f) Regional target (drone) in the sky, clouds, ground, and jungle background. (g) Regional target (drone) in the clean sky background.


Fig. 6. (a) Original image containing a small drone, which is submerged in the woods. (b) Heat map after inference by the LW-IRSTNet algorithm. (c) Real-time segmentation results.

TABLE II
COMPARATIVE ANALYSIS OF SEGMENTATION ACCURACY OF DIFFERENT ALGORITHMS

TABLE III
CONVOLUTION METHODS OF DAAA MODULES

TABLE VI
KERNEL SIZES OF ASYMMETRIC CONVOLUTION

TABLE VII
NUMBER OF CHANNELS IN DIFFERENT STAGES