3D pose estimation for bin-picking: A data-driven approach using multi-light images

2018-10-22T20:18:01Z (GMT) by José Jerónimo Moreira Rodrigues
We study the problem of 3D pose estimation of textureless shiny objects from monocular 2D images, for a bin-picking task. The main challenge of dealing with a shiny object is that its appearance changes greatly with pose and illumination. Conventional 3D-2D correspondence search therefore usually fails because feature descriptors are inconsistent across views. For a textureless object such as a mechanical part, visual feature matching becomes even harder because there are no stable texture features. Hierarchical template matching approaches need a larger number of templates when dealing with shiny objects, because of the drastic appearance changes with pose. In the challenging scenario of a bin-picking task, we must also cope with partial occlusions, shadows and inter-reflections, so matching each template reliably takes considerably more effort, which makes such approaches, otherwise popular for textureless objects, less attractive. In this thesis, we develop a purely data-driven method to tackle the pose estimation problem. Motivated by photometric stereo, we develop an imaging system with multiple lights to acquire a multi-light image whose channels are obtained by varying the illumination direction. In an offline stage, we capture multi-light images of a given object in several poses. Then, we use random ferns to cluster the appearance of small patches of the multi-light images, and we store in each cluster the information about possible object poses. At run-time, the patches of the input multi-light image use the cluster information to vote probabilistically on several pose hypotheses.
Since our pose hypotheses form a discrete set, we refine the discretized pose in the continuous space in order to obtain accurate object poses for robotic manipulation. Experiments show that the proposed method can detect and estimate poses of textureless and shiny objects accurately and robustly within half a second. We further compare our approach with the HALCON commercial software, a highly optimized hierarchical template matching approach developed by MVTec, and show some of the drawbacks of this type of approach. Finally, we run detection on a different object by simply changing the image database.
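
The patch clustering and voting steps described in the abstract can be illustrated with a minimal sketch. It is not the implementation from the thesis. The class name `RandomFerns`, its parameters, the pairwise intensity comparisons used as binary tests, the Laplace smoothing, and the summation of log-probabilities are all assumptions made for illustration. The sketch also ignores patch location and treats each pose hypothesis as an integer label.

```python
import numpy as np


class RandomFerns:
    """Toy random-fern ensemble mapping patch appearance to pose votes (illustrative only)."""

    def __init__(self, patch_shape, n_poses, n_ferns=10, n_tests=6, seed=0):
        rng = np.random.default_rng(seed)
        n_values = int(np.prod(patch_shape))  # pixels x light channels
        # Assumed binary test: compare two randomly chosen entries of the flattened patch.
        self.idx_a = rng.integers(0, n_values, size=(n_ferns, n_tests))
        self.idx_b = rng.integers(0, n_values, size=(n_ferns, n_tests))
        self.bit_weights = 1 << np.arange(n_tests)
        self.fern_ids = np.arange(n_ferns)
        self.n_poses = n_poses
        # counts[fern, cluster code, pose]: how often each pose fell into each cluster.
        self.counts = np.zeros((n_ferns, 2 ** n_tests, n_poses))

    def codes(self, patch):
        """Return one cluster index per fern for a multi-light patch of shape (h, w, channels)."""
        flat = np.asarray(patch, dtype=float).ravel()
        bits = flat[self.idx_a] > flat[self.idx_b]      # shape (n_ferns, n_tests)
        return bits.astype(np.int64) @ self.bit_weights  # shape (n_ferns,)

    def train(self, patch, pose_id):
        """Offline stage: record the pose label in the cluster that the patch falls into."""
        self.counts[self.fern_ids, self.codes(patch), pose_id] += 1

    def vote(self, patches):
        """Run-time stage: accumulate log-probability votes over the discrete poses."""
        log_votes = np.zeros(self.n_poses)
        for patch in patches:
            c = self.counts[self.fern_ids, self.codes(patch)]  # shape (n_ferns, n_poses)
            # Laplace-smoothed estimate of P(pose | cluster), an assumed choice.
            p = (c + 1.0) / (c.sum(axis=1, keepdims=True) + self.n_poses)
            log_votes += np.log(p).sum(axis=0)
        return log_votes
```

In this sketch, `train` would be called once per training patch with the index of the pose it was captured in, and `np.argmax(ferns.vote(patches))` would give the best-scoring discrete pose hypothesis, which a separate refinement step would then move into the continuous pose space.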