<p dir="ltr">Structural health monitoring and infrastructure condition assessment play an important role in improving the sustainability and resilience of civil infrastructure. Traditional manual inspection is labor-intensive and time-consuming; autonomous robotic inspection driven by computer vision offers a faster, more efficient, and non-destructive alternative. This thesis focuses on autonomous condition assessment of civil infrastructure using computer vision, deep learning, and robotic control. The first part presents a cascaded deep convolutional neural network (CNN) that uses Bayesian data fusion to detect cracks more robustly in high-resolution images. The model is evaluated on high-resolution images taken during bridge girder inspections. The proposed approach segments thin cracks accurately while the Bayesian data fusion reduces false positive predictions. The second part of the thesis proposes a channel-wise attention mechanism and incorporates it into DeepLabV3+ and U-Net++, which improves performance on structural component segmentation, damage state segmentation, and detailed damage segmentation. In addition, a semi-supervised learning (SSL) framework is proposed to address the shortage of labeled data. The third part of the thesis focuses on autonomous robotic inspection based on active perception and deep reinforcement learning (DRL). Inspired by the behavior of human inspectors, the proposed robotic system resolves ambiguous damage detections by actively selecting the next best viewpoint to visit. The results show that the proposed robotic system effectively reduces false positive predictions and significantly shortens inspection time compared to conventional raster scanning methods.
The fourth part of the thesis improves the active perception agent by introducing uncertainty estimation in the perception module, a Transformer policy network with an episodic memory structure, and a hierarchical action head. The experiments are carried out in a more complex simulation environment that, compared with the previous study, adds 3D camera movement, multi-class damage, and more challenging lighting conditions. The results demonstrate that the proposed modules substantially enhance damage detection performance, achieving higher accuracy and shorter inspection time than prior work.</p>