Computer Vision Techniques in Manufacturing

Computer vision (CV) techniques have played an important role in promoting the informatization, digitization, and intelligence of industrial manufacturing systems. Considering the rapid development of CV techniques, we present a comprehensive review of the state of the art of these techniques and their applications in manufacturing industries. We survey the most common methods, including feature detection, recognition, segmentation, and three-dimensional modeling. A system framework of CV in the manufacturing environment is proposed, consisting of a lighting module, a manufacturing system, a sensing module, CV algorithms, a decision-making module, and an actuator. Applications of CV to different stages of the entire product life cycle are then explored, including product design, modeling and simulation, planning and scheduling, the production process, inspection and quality control, assembly, transportation, and disassembly. Challenges include algorithm implementation, data preprocessing, data labeling, and benchmarks. Future directions include building benchmarks, developing methods for nonannotated data processing, developing effective data preprocessing mechanisms, customizing CV models, and opportunities arising from 5G.

computing capabilities, until the 1990s [4]. Opportunities for CV in manufacturing can be grouped into three broad categories: 1) image-based metrology; 2) manufacturing process interpretability; and 3) material structure analysis [5]. From the perspective of application scenarios, industrial applications of CV technologies can be classified into: visual inspection, part identification, process control, and robotic guidance mechanisms [6]. In this survey, our classification framework is based on different stages of the product life cycle in the entire manufacturing process.
Existing review papers on this topic usually specialize in just one of the manufacturing industries, such as the steel industry. For example, Aldrich et al. [7] reviewed some available CV-based froth imaging systems. In [8], vision-based defect detection and classification of steel surfaces were surveyed for steel mill systems. Wang [9] reviewed the literature of weld pool state sensing, including conventional sensing, vision sensing, and multisensor information fusion technologies, emphasizing three-dimensional (3-D) vision sensing approaches. These reviews have since become outdated because CV techniques have continued to develop at an increasingly rapid pace in recent years, and newly proposed technologies regularly encourage the updating of manufacturing systems. A new survey of the state of the art of CV techniques and their applications in different manufacturing tasks is therefore needed. The reason why we conduct a literature review rather than a systematic review can be found in Appendix A.
It is difficult to include all literature related to this topic because of the sheer size of the research community. As such, here we make an effort to cover the majority of important methods proposed in the past few decades and discuss the latest results. Our literature search was conducted mainly in three publication libraries: 1) Scopus; 2) Google Scholar; and 3) IEEE Xplore Digital Library, supplemented by additional important journals within this scope. The search covers publication years from 1970 to 2020. The keywords and inclusion/exclusion criteria of the search query we applied can be found in Appendix B. In this article, the methodology of the literature review is as follows.
1) Reviewing the important CV techniques, including feature detection, recognition, segmentation, and 3-D modeling.
2) Discussing the system framework of CV in the manufacturing environment, including a lighting module, a manufacturing system, a sensing module, CV algorithms, a decision-making module, and an actuator.
3) Surveying the newest and most widely implemented CV applications for different stages in the entire product life cycle, including product design, modeling and simulation, planning and scheduling, production process, inspection and quality control, assembly, transportation, and disassembly.
In addition, we present a critical analysis of challenges and an outlook on future directions.
The remainder of this article is organized as follows. Section II reviews the recent results of important CV techniques, including feature detection, recognition, segmentation, and 3-D modeling. In Section III, we propose a manufacturing-oriented CV system framework to present how CV functions can be embedded into complex manufacturing systems. In Section IV, CV applications for different stages in the entire product life cycle are surveyed. Section V presents a critical analysis of challenges and an outlook on future directions. Finally, conclusions are given in Section VI.

II. COMPUTER VISION TECHNIQUES
The purpose of a CV system is to generate a symbolic description of what is being imaged in a scene [10]. This description captures an understanding of the scene and can then be applied to guide the next operations of a robot system. There are multiple kinds of tasks and algorithms in the CV field, such as detection, recognition, segmentation, and 3-D reconstruction. In this section, we review the state of the art of several important CV techniques.

A. Feature Detection
Visual feature detection (e.g., point, edge, and line detection) is the basis of many other CV algorithms. In some cases, we are more interested in specific regions of the image, such as human eyes, license plates, and corner shapes, which are called keypoint features. In other cases, we care more about the edge features of objects in an image. A few CV algorithms can be applied to identify and match both keypoint features and edge features among different images. Commonly used evaluation metrics for feature detection include mean absolute error (MAE), mean-squared error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity index measure (SSIM).
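As an illustration of the pixel-wise metrics listed above, the following is a minimal sketch of MAE, MSE, and PSNR for two grayscale images of equal size. It assumes images are stored as plain lists of rows with an 8-bit intensity range (the `max_value=255.0` default is an assumption), and the function names are hypothetical. SSIM is omitted because it requires windowed local statistics.

```python
import math


def _flatten(image):
    """Return all pixel values of a list-of-rows image as one flat list."""
    return [float(p) for row in image for p in row]


def mae(reference, test):
    """Mean absolute error between two equally sized images."""
    r, t = _flatten(reference), _flatten(test)
    return sum(abs(a - b) for a, b in zip(r, t)) / len(r)


def mse(reference, test):
    """Mean-squared error between two equally sized images."""
    r, t = _flatten(reference), _flatten(test)
    return sum((a - b) ** 2 for a, b in zip(r, t)) / len(r)


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(reference, test)
    if err == 0.0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)


reference_image = [[10, 20], [30, 40]]
test_image = [[12, 20], [30, 36]]
print(mae(reference_image, test_image))   # absolute differences 2, 0, 0, 4
print(mse(reference_image, test_image))
print(psnr(reference_image, test_image))
```

Lower MAE and MSE and higher PSNR indicate closer agreement with the reference image.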
1) Keypoint Detection: There are mainly two types of keypoint detection methods: 1) local search detection and 2) global search detection. Common local search detection methods include correlation methods, least squares methods, and learning-based methods [11]. Local search detection methods are more suitable for scenes where images are taken continuously at a high frequency, such as video sequences. In contrast, global search detection methods search the entire image and then match features based on feature appearance. Therefore, global search methods are more suitable for scenes with large motion or appearance changes [12].
2) Edge Detection: Edges in images often appear at boundaries between different objects, resulting in sudden changes in color and intensity. One method of edge detection is based on the image gradient [13]. Because the gradient is easily affected by noise, a low-pass filter (e.g., Gaussian filter) is needed to smooth the image before gradient calculation. Horizontal and vertical convolution operations can then be used for edge recognition. The zero-crossing method can also be used for edge detection, in which zero points of a second-order derivative expression are searched to find edges [14]. Recently, deep-learning methods, especially convolutional neural networks (CNNs), have been widely applied to edge detection [15], [16].
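The gradient-based pipeline described above (low-pass filtering followed by horizontal and vertical filtering) can be sketched as follows. This is an illustrative sketch, not the method of any cited work: it assumes a 3×3 Gaussian approximation, Sobel kernels for the two gradient directions, replicated borders, and a hypothetical fixed threshold. The kernels are applied as correlations (no kernel flip), which changes only the sign of the gradient and not its magnitude.

```python
import math

# 3x3 Gaussian approximation and Sobel kernels (assumed choices for this sketch).
GAUSS_3X3 = [[1 / 16, 2 / 16, 1 / 16],
             [2 / 16, 4 / 16, 2 / 16],
             [1 / 16, 2 / 16, 1 / 16]]
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def filter3x3(image, kernel):
    """Correlate an image with a 3x3 kernel, replicating border pixels."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + 1][dx + 1] * image[yy][xx]
            out[y][x] = acc
    return out


def edge_map(image, threshold):
    """Smooth, compute the gradient magnitude, and threshold it."""
    smoothed = filter3x3(image, GAUSS_3X3)
    gx = filter3x3(smoothed, SOBEL_X)
    gy = filter3x3(smoothed, SOBEL_Y)
    h, w = len(image), len(image[0])
    return [[math.hypot(gx[y][x], gy[y][x]) > threshold for x in range(w)]
            for y in range(h)]


# Synthetic 6x6 image with a vertical step edge between columns 2 and 3.
step_image = [[0, 0, 0, 100, 100, 100] for _ in range(6)]
edges = edge_map(step_image, threshold=200.0)
for row in edges:
    print("".join("#" if e else "." for e in row))
```

On the synthetic step image, only the two columns adjacent to the intensity jump exceed the threshold.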

B. Recognition
Recognition is another important task for CV techniques. From the perspective of target objects, recognition problems can be grouped into three categories: 1) instance recognition; 2) class recognition; and 3) general category recognition. We also discuss action recognition in videos, which is a current challenge and a future trend for recognition tasks. Commonly used evaluation metrics for recognition are accuracy, recall, precision, F1 score, and the ROC curve with its AUC.
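The threshold-based metrics named above can be computed directly from the counts of true and false positives and negatives. The sketch below assumes binary labels (1 for the positive class, 0 otherwise); the function name and the example labels are hypothetical. ROC/AUC is not covered because it requires continuous scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Example: 1 = defective part, 0 = good part (illustrative labels only).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In inspection settings, recall on the defective class is often watched more closely than accuracy, because missed defects are usually costlier than false alarms.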
1) Instance Recognition: In instance recognition, the goal is to identify a specific known object. Feature matching strategies can be used for this recognition problem [17]. Other instance recognition methods include viewpoint-invariant feature-based strategies [18], and sparse feature matching [19]. A popular application of instance recognition is face recognition [20]. Some learning-based approaches have also been widely applied to this problem, such as support vector machines (SVMs) [21], boosting [22], and neural networks [23].
2) Class Recognition: Different from instance recognition, class recognition does not have a specific object as the target. In class recognition problems, the goal is usually to recognize the presence of an instance of a specific category of objects, such as cars or pedestrians. Class recognition problems can be considered as a specific classification problem in which the input is an image and the output is the classification of that image.
Throughout the development history of CV, the rise of CNNs has been one of the most important breakthroughs behind the recent rapid development of CV technologies. In recent years, numerous CNNs have been proposed and applied to classification problems, and their models have become progressively deeper. Sorted from earlier to more recent works, popular CNNs include LeNet-5 [24], AlexNet [25], VGG-16 [26], R-CNN [27], Fast R-CNN [28], inception networks [29], ResNet-50 [30], Xception [31], and ResNeXt-50 [32]. Before ResNet-50, most of the proposed CNNs innovated mainly by adding more layers to the network, alongside any other engineering changes needed for good performance. The "bottleneck" of this trend was that the model's accuracy saturates and then rapidly degrades as the network becomes deeper. The seminal residual network first inserted shortcut connections in its deep model to deal with this problem [30]. The object detection network Fast R-CNN trained the deep VGG-16 network 9× faster than R-CNN and was 213× faster at test time. Ren et al. proposed Faster R-CNN [33], in which a region proposal network shares full-image convolutional features with the detection network. For the very deep VGG-16 model, Faster R-CNN had a frame rate of 5 fps with 0.732 mAP on PASCAL VOC 2007 and 0.704 mAP on PASCAL VOC 2012.
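The shortcut connection mentioned above adds a block's input to the output of its stacked layers, so the layers only need to learn a residual. The following is a toy sketch with fully connected layers on plain lists, assuming the input and output have the same size; real residual networks use convolutional layers and normalization, and all names here are hypothetical.

```python
def relu(vector):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in vector]


def linear(weights, vector):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x) with F(x) = W2 * relu(W1 * x)."""
    fx = linear(w2, relu(linear(w1, x)))
    # Shortcut connection: add the unchanged input to the learned residual.
    return relu([f + xi for f, xi in zip(fx, x)])


x = [1.0, 2.0]
zero_weights = [[0.0, 0.0], [0.0, 0.0]]
# With zero weights the residual F(x) vanishes and the block passes x through.
print(residual_block(x, zero_weights, zero_weights))
```

Because a block with a zero residual reduces to the identity for non-negative inputs, stacking additional blocks does not have to make the mapping harder to represent.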
3) General Category Recognition: General category recognition is the most challenging recognition problem because we need to identify not only the locations of different objects in the image but also what category each object belongs to. In this task, all different kinds of objects in the image need to be recognized. Common approaches of general category recognition include bag-of-words models and part-based models. In bag-of-words models, the distribution of visual words in the target image is compared to the training data [34]. In the part-based models, different parts of an image are considered separately so as to determine whether and where an object of interest exists [35]. Recent advances in this task have taken advantage of deep residual learning [30] and deep neural networks [36].

4) Action Recognition: The current challenging problem for the recognition task is action recognition. It is not difficult for humans to recognize actions in videos, but it is challenging for machines. Accurately recognizing the actions and behaviors in videos is of great significance for different scenarios, such as pedestrian motion monitoring in autonomous driving [37], and elderly fall detection [38]. In the manufacturing industry, identifying the behaviors of workers in the workshop has also been shown to be a very valuable approach to ensure production safety [39].

C. Segmentation
As one of the most classic tasks in the CV field, image segmentation aims to label pixels into different groups according to the objects that each pixel belongs to. Image segmentation can essentially be considered as a clustering problem. Early segmentation techniques usually used region division and merging methods [40]. Later segmentation algorithms applied indicators of consistency, such as intraregional consistency and interregional dissimilarity [41]. Other segmentation approaches include mean shift [42], graph-based merging [43], graph cut-based Markov random field methods [44], and level sets [45]. In learning-based segmentation algorithms, one commonly used loss function is the Dice loss, which is based on the Dice coefficient. The Dice coefficient is defined as twice the number of true positives divided by the sum of twice the true positives, the false positives, and the false negatives. Another commonly used loss function for deep-learning segmentation algorithms is based on the intersection over union (IoU).
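The Dice coefficient defined above and the IoU can both be computed from pixel counts on binary masks. The sketch below assumes hard 0/1 masks stored as lists of rows and treats two empty masks as a perfect match (an assumed convention); training losses in practice usually use soft, differentiable versions on predicted probabilities. The function names are hypothetical.

```python
def _counts(pred, truth):
    """Count true positives, false positives, and false negatives."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def dice_coefficient(pred, truth):
    """Dice = 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = _counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


def iou(pred, truth):
    """IoU = TP / (TP + FP + FN)."""
    tp, fp, fn = _counts(pred, truth)
    denom = tp + fp + fn
    return 1.0 if denom == 0 else tp / denom


pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]
print(dice_coefficient(pred_mask, true_mask))  # TP=2, FP=1, FN=1
print(iou(pred_mask, true_mask))
# The corresponding losses are 1 - Dice and 1 - IoU.
```

For the same prediction, the Dice coefficient is never smaller than the IoU, so the two scores should not be compared against each other directly.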
Recent learning-based segmentation algorithms include Mask R-CNN [46] and the dual attention network [47]. U-Net has shown good performance in the segmentation of medical images [48], and several variants have followed, such as Attention U-Net [49], U-Net++ [50], ResUNet++ [51], and TransUNet [52]. The performance of these algorithms depends not only on the algorithm design but also on the datasets. Current challenges and trends in segmentation include 3-D and 4-D segmentation problems. The goal of 3-D segmentation is to segment 3-D images in three spatial directions, while 4-D segmentation handles 4-D data, which includes the time dimension in addition to the spatial dimensions. The 3-D U-Net has been applied to 3-D segmentation problems in medical imaging [53] and to additive manufacturing defects in X-ray computed tomography (CT) images [54].

D. 3-D Modeling
3-D modeling in CV can be categorized into two problems: 1) stereo correspondence and 2) 3-D reconstruction. Stereo correspondence is the process of generating a 3-D model of an object from two or more images of the same object or scene, while 3-D reconstruction generates a 3-D model of an object from only one image [55]. It is a challenge to design a good loss function for comparing the predicted 3-D point cloud against the ground truth. One option is to evaluate how well the projections of predicted 3-D point clouds cover the ground-truth object's silhouette [56].
1) Stereo Correspondence: A common stereo correspondence method is to find matching pixels in multiple images and map their position in the 2-D images to 3-D positions in the 3-D model. Popular methods for stereo correspondence include epipolar geometry [57], sparse correspondence [58], and dense correspondence [59].
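As a worked example of mapping matched 2-D pixels to a 3-D position, the sketch below triangulates one correspondence under strong simplifying assumptions that are not stated in the text: a rectified stereo pair, identical pinhole cameras with focal length in pixels, a known baseline, and a match on the same image row. All parameter names and values are hypothetical.

```python
def triangulate_rectified(u_left, u_right, v, focal_px, baseline_m, cx, cy):
    """Recover (X, Y, Z) in the left camera frame from one pixel match.

    For a rectified pair, depth follows from disparity d = u_left - u_right:
        Z = focal_px * baseline_m / d
    and the lateral coordinates follow from the pinhole model.
    """
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = focal_px * baseline_m / disparity
    x = (u_left - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    return x, y, z


# Illustrative numbers only: 800 px focal length, 10 cm baseline.
point = triangulate_rectified(u_left=420.0, u_right=400.0, v=260.0,
                              focal_px=800.0, baseline_m=0.1,
                              cx=400.0, cy=240.0)
print(point)
```

Because depth is inversely proportional to disparity, a one-pixel matching error has a much larger effect on distant points than on nearby ones.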
2) 3-D Reconstruction: The earliest approach for 3-D reconstruction is to predict object shape from visual shading, which was first proposed by Horn in 1970 [60]. Later, other "shape from X" methods were proposed, such as shape from texture [61] and shape from focus [62]. Other 3-D reconstruction methods include active rangefinding [63], and model-based reconstruction which has been widely applied to architectural 3-D modeling [64]. Recently, deep-learning-based algorithms promoted significant improvement in system performance of 3-D reconstruction [65], [66].

III. MANUFACTURING-ORIENTED COMPUTER VISION SYSTEM
In this section, we aim to discuss the role of CV in manufacturing systems and the closed loop between manufacturing environments and CV. The system framework of CV in manufacturing environments is proposed as shown in Fig. 1. There are optical devices (i.e., the lighting module and the sensing module), hardware (i.e., the manufacturing system and actuators), software (i.e., the CV system and the decision-making module), and data (e.g., optical images, feature descriptions, and decision signals) in this framework.
The lighting module illuminates the manufacturing system so that the sensing module is able to capture images of it. One problem is how to design and set up a suitable lighting module to improve the performance of the image sensing process. An important goal of the lighting module is to provide uniform illumination for the scene. Using multiple light sources showed better performance than a single light source for obtaining uniform illumination on the target scene [67]. To obtain uniform near-field irradiance, an angled LED ring array was applied by simplifying the nonrotational symmetric irradiance distribution [68].
In most cases, the sensing module of a CV system consists of one or more cameras, which can be categorized into fixed cameras and mobile cameras. Fixed cameras are usually placed on production lines, while mobile cameras are usually placed on robots, such as assembly robots and automated guided vehicles (AGVs). In order to acquire high-quality images even when there is an obstacle along the camera line of sight on production lines, an automatic camera placement strategy was presented [69]. Handheld CV devices can remove the need to maintain a consistent camera distance and light source [70].
The CV system takes digital images captured by the sensing module as inputs and outputs detected features and descriptions of the images. The CV algorithms discussed in Section II, such as feature detection, recognition, segmentation, and 3-D modeling algorithms, can be applied here.
Detection results of these CV algorithms are then applied to support the decision-making process. The decision-making action is the execution of different rules and strategies in the decision-making module. Typical decision algorithms include priority-based rules [71], heuristic algorithms [72], and intelligent optimization algorithms [73]. Decision signals generated by the decision-making module then control the following actions of the actuator. The action sequence of the actuator reflects interactions between the actuator and the manufacturing system.
CV techniques have also given significant support to the development of digital twin technologies in manufacturing systems [74]. Taking advantage of modeling and simulation technology, the digital twin is playing an increasingly important role in making manufacturing systems more information-driven and intelligent. CV techniques are used to support the operation and maintenance of digital twins in manufacturing environments. Using image processing and CV technologies, we are able to capture the imaging information of manufacturing systems in real time (such as the locations, movements, and defects of production parts) and extract abstract information of system states based on these images. The system state information helps to keep the digital twin consistent with the state of the real system. In addition to CV, there are other kinds of sensors in manufacturing systems, such as distance sensors, thermal sensors, and motion sensors. Feedback from all these sensors and actuators is returned to the manufacturing system and used on demand.

IV. COMPUTER VISION APPLICATIONS IN MANUFACTURING
From the perspective of the entire product life cycle, the manufacturing process can be briefly divided into multiple stages, including product design, modeling and simulation (M&S), planning and scheduling, production process, inspection and quality control, assembly, transportation, and disassembly. These stages naturally overlap; for example, inspection operations occur in production and assembly, and assembly operations occur in production processes. The statistics of keyword occurrence of different manufacturing stages in CV research papers in recent decades are shown in Fig. 2. It can be seen that CV techniques have been widely applied to inspection, production, and assembly in manufacturing industries. In this section, we not only survey the state of the art of CV techniques in all these different applications but also discuss the challenges in each direction.

A. Product Design, Modeling, and Simulation
As the first stage of the product life cycle, product design aims to create a new product or a new version of an existing product. The computer-aided design (CAD) and computer-aided manufacturing (CAM) techniques have been widely used in the manufacturing industry for decades [75]. In CAD, computers are applied to support the creation, modification, analysis, and optimization of the product design. Product design results by CAD are usually in the form of 3-D models. These 3-D models then can be used for product prototyping, downstream analysis, and other manufacturing processes.
One of the common applications of CV in the product design is reconstructing 3-D models from 2-D images of existing products, such as geometric surface generation using range data [76], solid model generation from scanned data [77], and 3-D pose estimation using 2-D images and spatial information in CAD models [78].
Another important application of CV in product design is the simulation and validation of product designs. M&S is a broad topic that spans different stages, especially when digital models are considered as "digital twins." In this section, we group product design together with M&S because M&S techniques have so far been used mostly in product design activities. M&S have been widely used in vehicle instrument testing, such as testing and validation of vehicle instrument cluster design by combining hardware-in-the-loop simulation and CV [79], [80].
With the rapid development of CAD software, multidisciplinary modeling-based product design has become the trend. Therefore, more and more products have their own 3-D digital models at earlier stages, which leads to a reduction in the demand for 3-D model reconstruction of products. How to find new applications of CV technology in product design and modeling is an interesting question.

B. Planning and Scheduling
After product design is validated, a production plan is to be made to identify what to do next, followed by schedules of when and how to execute the plan [81]. An important application of CV in lumber production is aiding the generation of lumber sawing plans because the internal textures of lumber affect sawing quality. Internal defects of lumber can be localized by analyzing lumber CT images [82], and these defect detections can then be used to formulate better sawing strategies [83]. Fig. 3 shows some localization results of knots and holes through a sequence of lumber CT slices.
Additionally, CV has also been used for production planning in additive manufacturing and computer numerical control (CNC), such as nesting irregular 3-D printing parts in the printing space [84], and generating path plans for golf-club head welding [85].
However, there are usually various uncertainties in real manufacturing workshops, such as machine breakdowns. A big challenge is how to update plans and schedules dynamically according to real-time vision information.
Fig. 4. Scenario of human-robot collaborative assembly where the robot follows the operator's hand to offer assistance during a collaborative assembly operation [94].

C. Production Process
The production process is an important stage in the entire product life cycle. Although different industries have different forms of production, the basic process is similar: raw materials are transformed into products through a series of operations. Applications of CV in production mainly include part recognition and classification, production process monitoring, robot guidance, 3-D position measurement, and production safety monitoring.
1) Production Process Control: CV has been applied to control different production processes, such as trajectory control of molten rock for mineral wool production [86], and height and density measurement of the fibers [87]. In the iron industry, different features of bubbles, such as size, number, velocity, and stability, were detected and analyzed based on froth images in flotation cells [88]. Vision-based tracking techniques can also help overcome the difficulty of labeling and tracking steel materials at high temperatures [89]. In solar wafer production, a wavelet-based histogram matching approach in the spatial domain can be applied to extract pattern features of a multicrystalline solar wafer [90], and the features can then be used to control the flotation process [91].
2) Robot Guidance: Robot guidance and control is another important application of CV in production processes. Popular approaches include stereo vision and photogrammetry, projected texture stereo vision, time of flight, structured white light, structured blue LED light, light coding, and laser triangulation. These approaches can be compared in terms of accuracy, range, safety, and processing time [92]. Additionally, fuzzy logic-based control is widely used in the vision modules of robots [93]. Building a safe and stable collaboration environment between humans and robots is a critical task. Virtual 3-D models of robots and real images of human operators can be combined to avoid collisions between humans and robots [94]. Fig. 4 shows a scenario of human-robot collaborative assembly where the robot follows the operator's hand to offer assistance.
3) Part Classification: Classification is a basic but important application of CV in production, generally based on industrial robotics [95]. Traditional techniques include the Mahalanobis-Taguchi system and the principal component feature overlap measure [96]. Basic features (e.g., color and texture) can be used to classify products such as wood, roofing shingles [97], olives [98], and sheet-like products [99]. Recent learning-based methods such as CNNs are also widely applied in product classification.

4) 3-D Position Measurement: CV also shows its capability in 3-D position measurements of parts, products, and tools. Early techniques mainly achieved the recognition of specific parts or features, such as locations of screw holes. Later methods include some contour-based approaches and expectation-maximization algorithms [100]. More recently, laser dynamic triangulation was applied for manufacturing robots to determine 3-D coordinates of objects [101].

5) Production Safety Control: In workshops, SVM-based helmet identification can be applied to guarantee workers' safety [102]. In machining, collisions between tools and components can be avoided by checking whether the actual machining set-up is in conformity with the desired CAD model [103]. One of the challenges in production is how to quickly and accurately generate control instructions for the next action of actuators in an uncertain production environment. A set of possible situations and corresponding actions can be built, and then the current situation needs to be recognized using vision approaches. Possible applications include strain control instructions and safety alarms.
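The idea of a predefined set of situations with corresponding actions can be sketched as a lookup table driven by the output of a vision classifier. Everything in this sketch is hypothetical: the situation names, the actions, the confidence threshold, and the fallback are assumptions for illustration and are not taken from the cited works.

```python
# Hypothetical situation-to-action table; names are illustrative only.
SITUATION_ACTIONS = {
    "normal_operation": "continue",
    "worker_without_helmet": "raise_safety_alarm",
    "setup_mismatch_with_cad": "halt_machining",
}


def choose_action(situation_scores, min_confidence=0.6,
                  fallback="request_human_check"):
    """Pick the action for the most likely recognized situation.

    situation_scores maps each situation name to a classifier confidence
    in [0, 1]. If the best score is below min_confidence, or the situation
    is not in the table, the fallback action is returned instead.
    """
    situation, score = max(situation_scores.items(), key=lambda kv: kv[1])
    if score < min_confidence or situation not in SITUATION_ACTIONS:
        return fallback
    return SITUATION_ACTIONS[situation]


# Example scores as they might come from a vision-based classifier.
print(choose_action({"normal_operation": 0.10,
                     "worker_without_helmet": 0.85,
                     "setup_mismatch_with_cad": 0.05}))
print(choose_action({"normal_operation": 0.40,
                     "worker_without_helmet": 0.35,
                     "setup_mismatch_with_cad": 0.25}))
```

Routing low-confidence cases to a human check is one way to handle the uncertainty of a real production environment without issuing an unsafe instruction.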

D. Inspection
CV has been widely applied to quality control in different manufacturing industries. Inspection is essential for quality control, including measurement, examination, and testing. The inspection process is to determine whether a part or product meets the requirements of quality by assessing some features of the object [104]. Some CV-based inspection results in different manufacturing industries are shown in Fig. 5, including mechanics, automotive, textile, and 3-D printing.
1) Mechanics: For mechanical machining, CV-based approaches have been applied to turned surface inspection [109], defect detection of spring clamps [110], damaged part detection [111], and remote quality inspection of the production process [112]. Common inspection methods for automated assembly machines include blob analysis, optical flow, and running average [113]. Anomalies in powder spreading can also be localized by segmenting powder bed images with a CNN [105].
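Of the inspection methods listed above, the running average is the simplest to illustrate: a background model is updated as an exponentially weighted average of past frames, and pixels that deviate strongly from it are flagged. The sketch below is a generic version of this idea, not the implementation of the cited work; the update rate `alpha`, the threshold, and the function names are assumptions.

```python
def update_background(background, frame, alpha):
    """Blend the new frame into the background: B <- (1 - alpha) * B + alpha * F."""
    return [[(1.0 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(background, frame)]


def changed_pixels(background, frame, threshold):
    """Return (row, col) positions whose intensity deviates from the background."""
    return [(y, x)
            for y, (b_row, f_row) in enumerate(zip(background, frame))
            for x, (b, f) in enumerate(zip(b_row, f_row))
            if abs(f - b) > threshold]


# Static scene of intensity 10, then a bright object appears at pixel (1, 1).
background = [[10.0] * 3 for _ in range(3)]
frame = [[10.0, 10.0, 10.0],
         [10.0, 200.0, 10.0],
         [10.0, 10.0, 10.0]]

flagged = changed_pixels(background, frame, threshold=50.0)
print(flagged)
background = update_background(background, frame, alpha=0.5)
print(background[1][1])
```

A small `alpha` makes the background adapt slowly, so short-lived anomalies stay visible for longer; a large `alpha` absorbs them quickly.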
2) Automotive: In the automotive industry, CV techniques have been widely used for surface quality detection [114] and wheel alignment [115]. Image fusion algorithms (e.g., local directional blurring) can be applied to specular surface defect detection at smooth areas, edges, corners, and deep concavities [106]. Multiscale matrix fusion methods can be applied to detect potential defects from automobile images [116].
3) 3-D Printing: CV has also been applied to printing quality inspection in 3-D printing processes, such as vision-based self-calibration of printheads [117] and printing error detection by superimposing virtual 3-D models onto real objects [118]. 3-D printing defects can also be detected with multiview methods, which change the field of view during the printing process [108]. Benchmark databases need to be developed in the future for CV-based 3-D printing applications [119].

4) Other Industries: Other CV-based inspections include fiber defect detection (e.g., breaks, knots, thickness variations, and orientation) [107], electronics defect detection [120], and display defect detection [121]. CV has also been used for alignment inspection, such as optical component alignment [122], tile alignment [123], and raw part alignment in machining [124]. Besides, the anti-counterfeiting identity of products can be detected using vision-based matching algorithms to prevent mistakes [125].
Vision-based inspection systems usually capture large amounts of data. A critical challenge is how to effectively manage and utilize the data. Developing novel imaging devices and algorithms is essential for improving the performance of industrial inspection systems.

E. Assembly
Assembly, also referred to as progressive assembly, is an important process in product manufacturing, especially for discrete manufacturing (e.g., automotive and aviation). Usually, parts are assembled into semifinished products, and semifinished products are then assembled into final products.
1) Automatic Assembly: The goal of assembly in manufacturing is to achieve fully automatic assembly which is generally supported by vision systems. Main CV applications in the automatic assembly include motor stator assembly [126], cabin product assembly [127], assembly robotics for packaging [128], and automatic part picking and placing [129]. In printed circuit board assembly, shape-based recognition can be used to guide assembling flexible printed circuit cables onto hard disk drives [130].
2) Assembly Quality Control: Assembly error detection is also an important application of CV, such as assembly error prediction through statistical pattern recognition of geometric positions between mated parts and base parts [131], and fastener feature recognition [132]. In automotive assembly lines, human errors involved in the bolt securing process can be identified [133].
3) Other Assembly Applications: Augmented reality (AR) and CV have been combined for years to build interactive tools to guide assembly operations, such as human motion recognition in mechanical assembly operations [134] and the single-image-based 3-D part assembly which involves challenges of ambiguity among parts and 3-D pose prediction [135]. Some results of single-image-based 3-D part assembly through different methods are shown in Fig. 6. There are some open imaging datasets of assembly processes, such as the dataset of pixel-level labeled images of hands performing different assembly tasks [136].
One of the challenges in automatic assembly is how to achieve safe, flexible, and intelligent human-robot cooperation. The latest deep reinforcement learning approaches are expected to be applied to machine training in various human-robot cooperation situations.

F. Transportation
Transportation activities in manufacturing can be either material transportation within a workshop or logistics between factories.
1) AGV: AGVs, each consisting of a navigation system, a power system, and a controlling system, have been widely applied in delivering materials, parts, and products. Earlier AGVs usually followed marked lines on floors, while recent AGVs use radio, images, and lasers as their navigation signals. CV techniques are mainly applied in the navigation module of AGVs [137], such as path planning [138] and obstacle detection [139]. Traditional AGV navigation systems can be classified into two categories: 1) locally guided navigation (e.g., model-based approaches, downward vision systems, laser ranging systems, stacking vision, and deep learning [140]) and 2) remotely guided navigation (e.g., Web-controlled vehicle systems [141]). SVM-based segmentation methods have also been used to distinguish the original color features of path images from their illumination artifacts [142].
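To make the link between obstacle detection and path planning concrete, the sketch below plans a route on an occupancy grid, where cells marked 1 stand for obstacles that a vision module is assumed to have detected. Breadth-first search is used here only as a simple stand-in; it is not the planner of the cited works, and the grid layout and function name are hypothetical.

```python
from collections import deque


def shortest_path(grid, start, goal):
    """Breadth-first search on a 4-connected occupancy grid.

    grid[r][c] == 0 means free floor, 1 means obstacle. Returns the list of
    (row, col) cells from start to goal, or None if the goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    parents = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk back through the parent links
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in parents):
                parents[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


# A wall blocks the direct route, so the AGV must drive around it.
floor = [[0, 0, 0],
         [1, 1, 0],
         [0, 0, 0]]
route = shortest_path(floor, start=(0, 0), goal=(2, 0))
print(route)
```

When the vision module reports a new obstacle, the grid can be updated and the search rerun to obtain a revised route.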
2) Logistics: CV has been applied not only to AGVs but also to larger industrial autonomous transportation systems, such as forklift trucks [143] and logistics trucks [144]. Covariance matrix algorithms can be applied to detect and track moving objects in logistics systems [145]. Autonomous driving vehicles have also been designed for industrial logistics [146].
A major challenge of CV for transportation systems in manufacturing is the variation between images, such as shadows and highlights caused by complex illumination conditions. More effective denoising and normalization techniques need to be combined with the CV algorithms in navigation modules to develop more robust transportation systems.
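As an illustration of the kind of normalization referred to above, the following minimal sketch maps RGB pixels to intensity-normalized chromaticity, which cancels a uniform change in brightness such as a soft shadow. This is one simple, well-known technique chosen for illustration and is not taken from the surveyed works; the function names and the sample pixel values are hypothetical, and the method does not handle colored illumination or specular highlights.

```python
def normalize_chromaticity(pixel, eps=1e-9):
    """Map an (R, G, B) pixel to chromaticity (r, g, b) with r + g + b = 1.

    Dividing by the channel sum removes a uniform brightness factor,
    so a surface keeps the same value in light and in soft shadow.
    """
    r, g, b = pixel
    total = r + g + b
    if total < eps:
        # A black pixel carries no colour information; return a neutral value.
        return (1 / 3, 1 / 3, 1 / 3)
    return (r / total, g / total, b / total)


def normalize_image(image):
    """Apply chromaticity normalization to an image given as rows of RGB tuples."""
    return [[normalize_chromaticity(p) for p in row] for row in image]


# Hypothetical example: the same floor marking in full light and in shadow
# (identical colour, half the brightness).
lit = (200, 100, 50)
shadowed = (100, 50, 25)
same_after_normalization = all(
    abs(a - b) < 1e-12
    for a, b in zip(normalize_chromaticity(lit), normalize_chromaticity(shadowed))
)
```

In this example the lit and shadowed pixels map to the same chromaticity, so a color-based path segmenter operating on the normalized image would be less sensitive to that shadow.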

G. Disassembly
As a key step in the effective disposal of end-of-life products, disassembly should be included in the product life cycle. However, because of high costs and technical limitations, automatic product disassembly was rarely applied in actual manufacturing systems until the recent rapid development of robotics and CAM techniques [147]. Traditional manual disassembly is labor-intensive, whereas robot-based automated disassembly can greatly reduce costs [148]. According to the degree of automation, disassembly systems can be classified into semiautomatic and fully automatic systems. Vision systems are important for both disassembly sequence generation and disassembly robot control. Generating an effective disassembly sequence plan is challenging because of the complicated constraints between components, especially for complex products. CV techniques can be applied to infer possible product assembly structures from product images, supporting disassembly sequence generation [149]. CV can also serve as the visual recognition and navigation system of disassembly robots, such as those for automotive products, electronics [150], and display products [151]. Features of screws, such as gray scale, color scale, and depth, can be used in screw detection to support automatic disassembly tasks [152]. Most current automatic disassembly systems are not fully automated, which means that human participation is still needed in the disassembly process. Therefore, achieving effective and safe human-robot interaction is an important goal for current automatic disassembly systems [153].

A. Critical Analysis
Although CV techniques have been widely applied to almost every stage of the manufacturing process in different industries, some long-standing issues and new challenges still need to be addressed. Based on the survey, we give a critical analysis of the state of the art and the challenges in CV applications for manufacturing industries in terms of implementation, data collection, data preprocessing, data labeling, and benchmarks.
1) Challenge of Implementation: The rapid development of CV technologies (e.g., CNNs and other deep-learning models) has produced many exciting results for the community. However, the CV algorithms currently used in actual manufacturing systems are still relatively classic ones, such as SVM and k-nearest neighbors (KNN). The latest achievements of the CV community are not being quickly applied to the thorny problems facing real manufacturing environments. One possible reason is that, although large-scale companies usually have sufficient resources to implement the latest research results, their manufacturing systems are also more complicated, involving different software and hardware systems in different departments (e.g., research and development, design, production, testing, and sales). This complexity makes it harder to apply the latest CV research results in actual manufacturing environments. Another possible reason is that research in the CV community is usually based on idealized problem models with a large amount of labeled data, which differ from the problems in actual manufacturing systems. This gap also makes the latest research results difficult to apply.
2) Challenge of Data Collection: Data are central to machine learning and CV tasks. Even though more data can be obtained with the rapid development of the Internet of Things (IoT) and sensor technologies, collecting high-quality data remains a challenge, especially for 3-D surfaces and reflective surfaces in complex manufacturing environments. One reason is poor lighting in some complex manufacturing scenarios. Another possible reason is that a reflective object surface introduces discrepancies between the captured images and the real object.
3) Challenge of Data Preprocessing: As more and more data sensing devices are deployed in manufacturing systems, a greater volume of structured and unstructured data (e.g., high-resolution images and videos) is collected from manufacturing production sites. However, not all of the collected images and videos are worth sending to CV systems for further processing. The challenge is the lack of effective preprocessing mechanisms for the large amount of original manufacturing data. Some large-scale companies temporarily store all collected data in databases and delete older data (e.g., data more than three months old) at regular intervals. This practice leads to much more expensive data storage and lower data processing efficiency.
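One simple form such a preprocessing mechanism could take is a sharpness filter that discards blurred or featureless frames before they are stored or passed to a CV system. The sketch below uses the variance of the Laplacian as a focus measure; this choice, the function names, the threshold value, and the toy images are assumptions made for illustration and are not prescribed by the surveyed works.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels.

    `gray` is a grayscale image given as a list of rows (at least 3 x 3).
    Sharp images have strong edges and therefore a high variance;
    blurred or featureless images have a low variance.
    """
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def keep_frame(gray, sharpness_threshold=100.0):
    """Keep a frame only if it is sharp enough (threshold is an assumed value)."""
    return laplacian_variance(gray) >= sharpness_threshold


# Hypothetical frames: a high-contrast checkerboard and a featureless gray image.
sharp = [[255 if (x + y) % 2 == 0 else 0 for x in range(5)] for y in range(5)]
flat = [[128] * 5 for _ in range(5)]
frames_to_store = [f for f in (sharp, flat) if keep_frame(f)]
```

In practice the threshold would have to be tuned per camera and scene, and a filter of this kind addresses only image quality, not whether a frame is relevant to the task.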

4) Challenge of Data Labeling: Although more and more high-quality visual data can be collected from manufacturing sites, the raw data usually lack the labels that supervised learning algorithms require, and manually labeling large amounts of raw data is expensive. The current challenge is the lack of effective algorithms for handling unlabeled data, as well as the lack of methods for automatically labeling original visual data. More effort is needed in this regard before deep-learning-based vision technologies can be applied to the manufacturing scenarios where a large amount of image data can be obtained. One possibility is recent self-supervised deep-learning methods, such as SimCLR [154], which learn visual representations from unlabeled data and can thereby reduce the amount of manual labeling required.
5) Challenge of Benchmarks: There are already some task-oriented benchmarks for comparing and evaluating newly proposed algorithms, such as COCO, the Multiple Object Tracking Benchmark, and the UA-DETRAC Benchmark. However, it is difficult to apply these benchmarks to specific manufacturing cases because they are mostly designed for particular tasks, such as the detection of vehicles and pedestrians. Hence, more benchmarks for manufacturing applications need to be built so that CV techniques can continue to be applied in manufacturing industries. MVTec is a recent, commonly used dataset for benchmarking anomaly detection methods with a focus on industrial inspection [155], [156].
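To make the idea of benchmarking anomaly detection methods concrete, the sketch below computes the area under the ROC curve (AUROC) from per-image anomaly scores. AUROC is a metric commonly reported for anomaly detection, but the text above does not state which metric a given benchmark uses, so its use here is an assumption; the function name and the example scores are hypothetical.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    scores: anomaly score per image (higher means more anomalous).
    labels: 1 for a defective image, 0 for a normal image.
    Returns the probability that a random defective image receives a
    higher score than a random normal image (ties count as one half).
    """
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one defective and one normal sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores from an anomaly detector on four inspection images.
example_scores = [0.10, 0.40, 0.35, 0.80]
example_labels = [0, 0, 1, 1]
example_auroc = auroc(example_scores, example_labels)  # 3 of 4 pairs ranked correctly
```

The pairwise formulation is quadratic in the number of images and is used here only for clarity; a rank-based implementation would be preferable for large test sets.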

B. Future Directions
Based on the above analysis of challenges, we propose the following future directions to accelerate the application of CV techniques in manufacturing industries.

1) Benchmarking: One direction is to build standard benchmark datasets for specific tasks in the manufacturing environment to evaluate the performance of different CV algorithms. Building such benchmarks involves several steps, such as deciding which CV tasks and which companies to include, collecting and analyzing data, and defining the performance metrics for CV algorithms and methods.
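As an example of what defining a performance metric can involve, the following sketch scores a detection task by matching predicted bounding boxes to annotated ones using intersection over union (IoU) and then reporting precision and recall. The matching rule, the 0.5 threshold, the box format, and all names are assumptions chosen for illustration rather than the protocol of any particular benchmark.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """Greedy one-to-one matching of predictions to ground-truth boxes.

    `predictions` is assumed to be sorted by decreasing confidence.
    A prediction is a true positive if its best unmatched ground-truth
    box overlaps it with IoU >= iou_threshold.
    """
    unmatched = list(ground_truth)
    true_positives = 0
    for pred in predictions:
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= iou_threshold:
            true_positives += 1
            unmatched.remove(best)
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical annotation and detector output for one image.
annotated = [(0, 0, 10, 10), (20, 20, 30, 30)]
detected = [(1, 1, 10, 10), (50, 50, 60, 60)]
example_precision, example_recall = precision_recall(detected, annotated)
```

In this example one of the two detections matches an annotated defect and one annotated defect is missed, so both precision and recall are 0.5.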
2) Big Data Preprocessing: Another direction is to develop effective preprocessing mechanisms for specific image data, including data cleaning and automatic or semiautomatic annotation methods for original image data. It is difficult to establish fully automated equipment that achieves automated data cleaning and labeling across different fields and departments. One strategy is to use the domain knowledge of experts in different departments to help process data of different structures and types. Another very important strategy is to establish the connection between the upstream production stages and the downstream data processing stage.
3) Nonsupervised Learning: Another direction is to develop effective methods that can process nonannotated manufacturing image data, so that deep-learning-based CV algorithms can then be applied to those manufacturing scenarios where a large amount of image data can be obtained. Possible ways to deal with this issue include recently developed machine learning approaches, such as one-shot learning, transfer learning, and semisupervised learning.
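As a minimal sketch of the semisupervised idea, the code below performs pseudo-labeling (self-training) with a nearest-centroid classifier: a few labeled feature vectors seed the class centroids, and unlabeled vectors are added only when they are clearly closer to one centroid than to any other. The classifier, the margin rule, the names, and the toy features are all assumptions for illustration; in practice the feature vectors would come from a CV model, and at least two classes must be present.

```python
import math


def centroids(samples):
    """Mean feature vector per class; `samples` is a list of (vector, label)."""
    sums, counts = {}, {}
    for vector, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vector))
        for i, value in enumerate(vector):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def pseudo_label(labeled, unlabeled, margin=2.0, rounds=3):
    """Self-training: adopt a predicted label only when it is confident.

    A sample is pseudo-labeled if its second-nearest centroid is at least
    `margin` times farther away than its nearest centroid.
    Returns (extended labeled set, samples that stayed unlabeled).
    """
    labeled = list(labeled)
    remaining = list(unlabeled)
    for _ in range(rounds):
        cents = centroids(labeled)
        still_unlabeled = []
        for vector in remaining:
            ranked = sorted((math.dist(vector, c), label) for label, c in cents.items())
            (d_best, best_label), (d_second, _) = ranked[0], ranked[1]
            if d_second >= margin * d_best:
                labeled.append((vector, best_label))
            else:
                still_unlabeled.append(vector)
        if len(still_unlabeled) == len(remaining):
            break  # no new confident samples; stop early
        remaining = still_unlabeled
    return labeled, remaining


# Hypothetical 2-D features: one labeled example per class, three unlabeled.
seed = [((0.0, 0.0), "ok"), ((10.0, 10.0), "defect")]
pool = [(1.0, 1.0), (9.0, 9.0), (5.0, 5.0)]
extended, undecided = pseudo_label(seed, pool)
```

Here the two samples near a class seed receive pseudo-labels, while the ambiguous sample midway between the classes is left for manual annotation, which is the intended behavior of the margin rule.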

4) Task-Oriented Models: It is also important to develop deep-learning-based CV models (e.g., CNNs) for specific image processing tasks in manufacturing to improve the usefulness of manufacturing data and algorithm efficiency. This work generally requires substantial computation to train and compare different network structures and parameters before satisfactory results are obtained.
5) 5G-Involved CV: The 5G communication technology, with its rapid development [157], will serve as one of the catalysts that provide new opportunities for the application of CV in manufacturing and bring new development directions for solving long-standing bottlenecks. Interesting directions include how to optimize existing manufacturing CV systems for 5G and how to design new CV applications based on the 5G architecture to improve system performance.

VI. CONCLUSION
Considering the rapid development of CV techniques, we presented a comprehensive review of several important CV techniques relevant to manufacturing, as well as their latest applications in different stages of the product life cycle within the entire manufacturing process. The surveyed CV techniques include feature detection, recognition, segmentation, and 3-D modeling. A system framework of CV in the manufacturing environment was proposed, consisting of a lighting module, a manufacturing system, a sensing module, CV algorithms, a decision-making module, and an actuator. Applications of CV in different stages of the product life cycle were then surveyed, including product design, modeling and simulation, planning and scheduling, the production process, inspection and quality control, assembly, transportation, and disassembly. Although CV techniques have been widely applied to almost every stage of the manufacturing process in different industries, there are still some long-standing issues and new challenges, which are discussed in the critical analysis. Future development directions include building benchmarks for specific manufacturing image processing tasks; developing effective methods of processing nonannotated data; developing effective data preprocessing mechanisms (e.g., data cleaning and automatic or semiautomatic data annotation methods); developing CV models (e.g., CNNs) for specific manufacturing tasks to improve the usefulness of manufacturing data and the efficiency of CV algorithms; and exploring the new opportunities kindled by 5G.

APPENDIX A REASON WHY WE CONDUCT LITERATURE REVIEW RATHER THAN SYSTEMATIC REVIEW
We conduct a literature review because we aim to present a critical analysis of existing research on the particular topic of CV techniques for manufacturing. Hence, we include only existing study results, and not answers to a specific question or any new data, experiments, or unpublished material in any form. A systematic review would indeed focus on providing an in-depth and detailed review of existing literature on a specific topic and would address a specific, clearly formulated question with suitable responses. However, a systematic review may also include some unpublished studies and reports. Therefore, a literature review is more suitable for our goal, which is to help researchers stay updated about the latest research in this field and to identify gaps in the existing literature.

APPENDIX B KEYWORDS AND INCLUSION/EXCLUSION CRITERIA OF THE PUBLICATION SEARCH QUERY APPLIED IN THIS ARTICLE
Title-Abstract-Keywords (manufacturing OR industr* OR production OR machining OR inspection OR 'quality control' OR design* OR test* OR 'modeling and simulation' OR planning OR scheduling OR assembl* OR alignment OR disassembl* OR AGV OR transportation) AND Title-Abstract-Keywords ('CV' OR 'machine vision' OR 'robot vision' OR 'CNN').
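For readers who wish to apply the same inclusion criteria to a local list of records, the query above can be approximated as follows. This is a sketch only: it assumes that each record's title, abstract, and keywords have been concatenated into one string, that a trailing * matches any word ending, and that matching is case-insensitive on whole words; the actual matching semantics of the bibliographic database used may differ, and the function names are hypothetical.

```python
import re

# Terms copied from the search query; the two groups are combined with AND.
STAGE_TERMS = [
    "manufacturing", "industr*", "production", "machining", "inspection",
    "quality control", "design*", "test*", "modeling and simulation",
    "planning", "scheduling", "assembl*", "alignment", "disassembl*",
    "AGV", "transportation",
]
VISION_TERMS = ["CV", "machine vision", "robot vision", "CNN"]


def term_to_regex(term):
    """Whole-word, case-insensitive pattern; a trailing * matches any word ending."""
    if term.endswith("*"):
        body = re.escape(term[:-1]) + r"\w*"
    else:
        body = re.escape(term)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def matches_any(text, terms):
    """True if at least one term of the OR-group occurs in the text."""
    return any(term_to_regex(term).search(text) for term in terms)


def matches_query(title_abstract_keywords):
    """True if the record satisfies (stage terms) AND (vision terms)."""
    return (matches_any(title_abstract_keywords, STAGE_TERMS)
            and matches_any(title_abstract_keywords, VISION_TERMS))


# Hypothetical record titles used to exercise the filter.
included = matches_query("Machine vision for weld inspection")
excluded = matches_query("Industrial robots and assembly planning")
```

The first example record satisfies both term groups and is included, whereas the second contains manufacturing terms but no vision term and is excluded.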