HuMaIN: Human- and Machine-Intelligent Network of Software Elements
Biodiversity information extraction (IE) from imaged text in digitized museum specimen records is a challenging task due to both the large number of labels and the complexity of the characters and information to be extracted.
The HuMaIN project investigates software-enabled solutions that support the combination of machine and human intelligence to accelerate IE from specimen labels.
Among other contributions, the project proposed the use of self-aware workflows to orchestrate machines and human tasks (the SELFIE model), Optical Character Recognition (OCR) ensembles and Natural Language Processing (NLP) methods to increase confidence in extracted text, named-entity recognition (NER) techniques for Darwin Core (DC) terms extraction, and a simulator for the study of these workflows with real-world data. The software has been tested and applied on large datasets from museums in the USA and Australia.
Funding
SI2-SSE: Human- and Machine-Intelligent Software Elements for Cost-Effective Scientific Data Digitization
Directorate for Computer & Information Science & Engineering
Find out more...