figshare

Exploring diverse strategies for image semantic segmentation in complex scenarios

thesis
posted on 2024-01-25, 13:00 authored by Kunhao Yuan

Semantic segmentation is a well-established problem in computer vision that involves automatically assigning a class label to every pixel in an image. The success of convolutional neural networks in image recognition has brought deep learning-based methods into the field of semantic segmentation. Since the emergence of the fully convolutional network (FCN), progress has come from designing more sophisticated network architectures with more trainable parameters and from better optimisation of deep neural networks.
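To make the definition concrete, the sketch below assumes a network has already produced one score per class for every pixel. The label map is then the per-pixel argmax. The nested-list layout and the name `assign_labels` are illustrative assumptions, not taken from the thesis.

```python
from typing import List

# scores[i][j][c] is the (assumed) score of class c at pixel (i, j).
Scores = List[List[List[float]]]


def assign_labels(scores: Scores) -> List[List[int]]:
    """Return, for every pixel, the index of its highest-scoring class."""
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


# A 2x2 image with 3 classes.
scores = [
    [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1]],
    [[0.3, 0.3, 0.4], [0.05, 0.9, 0.05]],
]
labels = assign_labels(scores)
print(labels)  # → [[1, 0], [2, 1]]
```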

These advancements have led to the creation of further large-scale datasets for analysis. In return, the availability of large-scale datasets has facilitated the development of more sophisticated network architectures. Progress on both fronts has enabled a wide range of semantic segmentation applications, such as autonomous driving, medical image analysis, and image editing. However, the effectiveness of semantic segmentation remains limited in more challenging scenarios.

In semantic segmentation of remotely sensed images, the challenges arise from the limited availability of high-quality annotated data, inconsistencies between the RGB and multi-spectral modalities, and sensitivity to lighting conditions. To overcome these weaknesses, prevalent methods incorporate multi-spectral information into the segmentation process. Nevertheless, the relative importance of multi-spectral and RGB information is either assumed to be equal or balanced manually, and therefore remains under-explored.
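To illustrate the difference between fixed and learned weighting, the sketch below mixes an RGB feature vector and a multi-spectral feature vector with a single weight alpha = sigmoid(theta) and fits theta by gradient descent. It is a hypothetical toy and not the MC-WBDN architecture. The scalar weight, the squared-error objective, and the names `fuse` and `learn_fusion_weight` are assumptions for illustration. Starting from theta = 0 reproduces the equal-weight assumption described above.

```python
import math
from typing import List, Sequence


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def fuse(rgb: Sequence[float], ms: Sequence[float], theta: float) -> List[float]:
    """Convex combination of two modality features; alpha = sigmoid(theta) weights ms."""
    alpha = sigmoid(theta)
    return [alpha * m + (1.0 - alpha) * r for r, m in zip(rgb, ms)]


def learn_fusion_weight(rgb: Sequence[float], ms: Sequence[float],
                        target: Sequence[float], lr: float = 5.0,
                        steps: int = 500) -> float:
    """Fit theta by gradient descent on mean squared error (hypothetical objective)."""
    theta = 0.0  # sigmoid(0) = 0.5, i.e. start from equal weighting
    n = len(rgb)
    for _ in range(steps):
        alpha = sigmoid(theta)
        fused = fuse(rgb, ms, theta)
        # d(loss)/d(alpha) = mean(2 * (fused - target) * (ms - rgb))
        grad_alpha = sum(2.0 * (f - t) * (m - r)
                         for f, t, m, r in zip(fused, target, ms, rgb)) / n
        # Chain rule through the sigmoid: d(alpha)/d(theta) = alpha * (1 - alpha)
        theta -= lr * grad_alpha * alpha * (1.0 - alpha)
    return sigmoid(theta)


# Synthetic features where the target leans towards the multi-spectral input.
rgb = [0.1, 0.4, 0.8, 0.2]
ms = [0.9, 0.5, 0.1, 0.6]
target = [0.3 * r + 0.7 * m for r, m in zip(rgb, ms)]
alpha = learn_fusion_weight(rgb, ms, target)
print(f"learned multi-spectral weight: {alpha:.3f}")
```

Because the synthetic target is built as 0.3 × RGB + 0.7 × multi-spectral, the learned weight should move from 0.5 towards 0.7. A real segmentation network would more plausibly learn per-channel or per-pixel weights jointly with the segmentation loss.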

In weakly supervised semantic segmentation, partial or incomplete annotations are used to train the segmentation model, which makes it difficult to learn discriminative and comprehensive feature representations. Existing methods incorporate additional supervision from saliency maps, sub-class information, or boundary information to enhance the learned feature representations. Although the features are enriched to some extent, most previous methods focus on a single feature extraction stage, and none of them pays holistic attention to features from coarse to fine levels.
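One way to supervise features from coarse to fine levels at once is to apply a contrastive objective at each level and sum the results. The sketch below uses the standard InfoNCE loss as a stand-in, since the exact MuSCLe objectives may differ. The level names, the temperature `tau`, the per-level weights, and the toy embeddings are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vec = Sequence[float]


def cosine(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(anchor: Vec, positive: Vec, negatives: List[Vec], tau: float = 0.1) -> float:
    """InfoNCE: negative log softmax probability assigned to the positive pair."""
    logits = [cosine(anchor, positive) / tau] + [cosine(anchor, n) / tau for n in negatives]
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


def multi_level_loss(levels: Dict[str, Tuple[Vec, Vec, List[Vec]]],
                     weights: Dict[str, float]) -> float:
    """Weighted sum of one InfoNCE term per feature level."""
    return sum(weights[name] * info_nce(*triple) for name, triple in levels.items())


# Hypothetical (anchor, positive, negatives) embeddings for three feature levels.
levels = {
    "low": ([1.0, 0.0, 0.2], [0.9, 0.1, 0.3], [[0.0, 1.0, 0.0], [-1.0, 0.0, 0.1]]),
    "mid": ([0.5, 0.5, 0.0], [0.6, 0.4, 0.1], [[0.0, -1.0, 1.0]]),
    "high": ([0.0, 1.0, 1.0], [0.1, 0.9, 1.1], [[1.0, 0.0, -1.0], [0.0, -1.0, 0.0]]),
}
weights = {"low": 0.5, "mid": 0.5, "high": 1.0}
print(multi_level_loss(levels, weights))
```

Summing per-level terms means gradients reach shallow as well as deep layers directly, rather than only through the final feature stage.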

Video instance segmentation is more challenging than still-image segmentation because predictions become unreliable under motion blur, occlusions, and rapid appearance changes. Overlooking the uncertainty of model predictions, established methods tend to reuse the training strategy of image segmentation, which leads to inefficient use of temporal information, unreliable learned features, and inferior segmentation results.

This thesis tackles these challenges through multi-channel feature fusion, comprehensive contrastive learning, and probabilistic modelling, respectively, aiming to improve the effectiveness of feature extraction and reconstruction for image semantic segmentation. For remote sensing water body segmentation, the proposed MC-WBDN model learns the fusion of multi-spectral features in an end-to-end fashion, demonstrating higher robustness against variations in weather conditions and superior performance in distinguishing small-scale water bodies. For weakly supervised semantic segmentation, the proposed MuSCLe framework significantly improves the generation of pseudo labels and the final segmentation performance by enriching feature representations at low, medium, and high semantic levels with contrastive learning objectives. In addition, a probabilistic model comprising a memory network and conditional neural processes is established for video instance segmentation. This model facilitates reliable sample selection for training; consequently, it not only achieves superior segmentation performance but also provides insight into data reliability in multi-task video instance segmentation.
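A conditional neural process (CNP) encodes observed (input, output) pairs, averages the encodings into one order-invariant representation, and decodes a Gaussian mean and standard deviation for each new input. The sketch below is an untrained, pure-Python toy of that structure with random weights, followed by a hypothetical selection rule that keeps only samples whose predicted standard deviation falls below a threshold. The layer sizes, the 0.1 standard-deviation floor, the threshold, the names `TinyCNP` and `select_reliable`, and the selection rule itself are assumptions. The memory network mentioned above is not modelled.

```python
import math
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]
Pair = Tuple[Sequence[float], Sequence[float]]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Random weights for one fully connected layer (untrained, for illustration)."""
    w = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def linear(x: Sequence[float], layer: Layer, relu: bool) -> List[float]:
    w, b = layer
    out = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
    return [max(0.0, v) for v in out] if relu else out


def softplus(v: float) -> float:
    return max(v, 0.0) + math.log1p(math.exp(-abs(v)))  # numerically stable form


class TinyCNP:
    """Encode context pairs, average them, decode a Gaussian per target input."""

    def __init__(self, x_dim: int, y_dim: int, r_dim: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.y_dim = y_dim
        self.enc = [init_layer(x_dim + y_dim, r_dim, rng), init_layer(r_dim, r_dim, rng)]
        self.dec = [init_layer(x_dim + r_dim, r_dim, rng), init_layer(r_dim, 2 * y_dim, rng)]

    def encode(self, context: List[Pair]) -> List[float]:
        reps = [linear(linear(list(x) + list(y), self.enc[0], True), self.enc[1], False)
                for x, y in context]
        # Mean aggregation makes the representation invariant to context order.
        return [sum(col) / len(reps) for col in zip(*reps)]

    def predict(self, context: List[Pair], x: Sequence[float]) -> Tuple[List[float], List[float]]:
        r = self.encode(context)
        out = linear(linear(list(x) + r, self.dec[0], True), self.dec[1], False)
        mean = out[: self.y_dim]
        std = [0.1 + 0.9 * softplus(v) for v in out[self.y_dim:]]  # floor keeps std >= 0.1
        return mean, std


def select_reliable(model: TinyCNP, context: List[Pair],
                    targets: List[Sequence[float]], max_std: float) -> List[int]:
    """Hypothetical rule: keep indices whose largest predicted std is below max_std."""
    return [i for i, x in enumerate(targets) if max(model.predict(context, x)[1]) < max_std]


model = TinyCNP(x_dim=1, y_dim=1)
context = [((0.0,), (0.0,)), ((1.0,), (1.0,)), ((2.0,), (0.5,))]
targets = [(0.5,), (1.5,), (3.0,)]
print(model.predict(context, targets[0]))
print(select_reliable(model, context, targets, max_std=1.0))
```

In practice the encoder and decoder would be trained by maximising the Gaussian log-likelihood of held-out targets, so that the predicted standard deviation becomes a meaningful reliability score.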

Through these three pieces of work, this thesis demonstrates the significance of encoder feature extraction, decoder feature restoration, and the assessment of intermediate features in achieving effective semantic segmentation. The proposed methods are evaluated both qualitatively and quantitatively in several challenging scenarios, demonstrating their efficacy.

School

  • Science

Department

  • Computer Science

Publisher

Loughborough University

Rights holder

© Kunhao Yuan

Publication date

2023

Notes

A Doctoral Thesis. Submitted in partial fulfilment of the requirements for the award of the degree of Doctor of Philosophy of Loughborough University.

Language

  • en

Supervisor(s)

Hui Fang ; Gerald Schaefer ; Lin Guan

Qualification name

  • PhD

Qualification level

  • Doctoral

This submission includes a signed certificate in addition to the thesis file(s)

  • I have submitted a signed certificate