Dense Face Detection via High-level Context Mining
Appearance degradation caused by low resolution is the core problem in small face detection, so a natural approach is to gather information from the context. This paper focuses on how to use high-level contextual information to improve the ability of anchor-based detectors to detect dense and degenerate faces. We mine global spatial context from a density map and propose a face co-occurrence prior to coordinate the inferred bounding boxes. We also propose score-size-specific non-maximum suppression to replace the traditional non-maximum suppression at the end of anchor-based detectors. Using the quantity, score, and size of the inferred face boxes, the combined solution reduces false positives and increases true positives. Our method requires no additional training, is model-independent, and can be embedded into existing face detectors. We also introduce Crowd Face, a challenging dataset for face detection, which aims to supply enough samples to highlight the difficulties of detecting dense and degenerate faces. We embed the proposed methods into state-of-the-art face detectors and evaluate them on widely used face detection benchmarks. Compared with the prior art on the WIDER FACE hard set, our method increases Average Precision by 0.1% to 1.3%. On Crowd Face, it increases Average Precision by 1% to 6%. The dataset is available at: https://github.com/QxGeng/Crowd-Face.
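For reference, the traditional non-maximum suppression step that the proposed score-size-specific variant replaces can be sketched as below. This is only a minimal sketch of standard greedy NMS, not the method proposed here; the function names `iou` and `nms`, the `(x1, y1, x2, y2)` box format, and the 0.5 overlap threshold are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Standard greedy NMS: returns indices of kept boxes, highest score first.

    A single fixed threshold is applied to every box regardless of its
    score or size (the threshold value here is an illustrative assumption).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

Because one threshold is shared by all boxes, heavily overlapping small faces in a crowd can be suppressed together with true duplicates, which is the kind of behavior a score- and size-aware suppression rule is meant to address.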