Algorithmic strategies for FPGA-based vision

2016-11-29T04:26:52Z (GMT) by Lim, Yoong Kang
As demand for real-time computer vision applications increases, implementations on alternative architectures have been explored. These architectures include Field-Programmable Gate Arrays (FPGAs), which offer a high degree of flexibility and parallelism. However, many computer vision algorithms have been optimized for serial processing, and such algorithms often do not map well to FPGA implementation. This thesis introduces the concept of FPGA-tailored computer vision algorithms, particularly in a stream processing mode. Case studies on FPGA implementations of standard corner detectors (Harris, FAST and SUSAN) were carried out and analyzed to highlight the differences between hardware and software. This analysis showed that an efficient software algorithm may not retain its speed advantage in the hardware domain. In fact, with an appropriate implementation, algorithms that are slower in software can achieve comparable or faster performance in hardware than algorithms optimized for serial processing. Other observations cover the optimization goals for FPGA implementation, the opportunities in FPGAs that can be exploited, and the properties that make an algorithm suitable or unsuitable for FPGA implementation. The outcome is a set of guidelines and principles for FPGA-tailored algorithms. These guidelines are then used to design a face detection algorithm optimized for FPGA implementation. The algorithm was deliberately designed to use operations suitable for FPGAs, based on the insights gained from the corner detection case studies. The result is a face detection algorithm that is unattractive as a software implementation but a reasonable choice as an FPGA implementation. The FPGA implementation of this algorithm achieves high theoretical frame rates and can be implemented on a low-cost, low-end FPGA development board.
This implementation is also competitive with FPGA implementations of the software-optimized Viola-Jones algorithm, especially on lower-end devices.