Poster_HuMaIN_SI2PI2020_final_Fortes.pdf (824.33 kB)

HuMaIN: Human- and Machine-Intelligent Network of Software Elements

poster
modified on 2020-01-31, 15:56

Biodiversity information extraction (IE) from imaged text in digitized museum specimen records is challenging because of both the large number of labels and the complexity of the characters and information to be extracted.

The HuMaIN project investigates software-enabled solutions that support the combination of machine and human intelligence to accelerate IE from specimen labels.

Among other contributions, the project proposed the use of self-aware workflows to orchestrate machine and human tasks (the SELFIE model), Optical Character Recognition (OCR) ensembles and Natural Language Processing (NLP) methods to increase confidence in extracted text, named-entity recognition (NER) techniques for the extraction of Darwin Core (DC) terms, and a simulator for the study of these workflows with real-world data. The software has been tested on and applied to large datasets from museums in the USA and Australia.
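
As a rough illustration of how an OCR ensemble can raise confidence in extracted text, the sketch below takes transcriptions of one label from several OCR engines, picks the most common one, and flags the label for human review when agreement is low. This is not the HuMaIN implementation: the function name `ensemble_vote`, the whole-string majority vote, and the 0.6 threshold are all assumptions chosen for the example.

```python
from collections import Counter


def ensemble_vote(transcriptions, threshold=0.6):
    """Majority vote over OCR outputs for a single label (illustrative only).

    Returns (winning_text, agreement, needs_human), where agreement is the
    fraction of engines that produced the winning text.
    """
    if not transcriptions:
        raise ValueError("need at least one transcription")

    # Collapse whitespace and case so trivial differences do not split the vote.
    normalized = [" ".join(t.split()).lower() for t in transcriptions]

    # Most common normalized transcription and the number of engines behind it.
    winner, votes = Counter(normalized).most_common(1)[0]
    agreement = votes / len(normalized)

    # Hypothetical routing rule: low agreement sends the label to a human task.
    needs_human = agreement < threshold
    return winner, agreement, needs_human


if __name__ == "__main__":
    # Made-up outputs from three hypothetical OCR engines for one label.
    outputs = ["Quercus alba", "quercus  alba", "Quercus alha"]
    print(ensemble_vote(outputs))
```

In this sketch, two of the three made-up outputs agree after normalization, so the label would be accepted automatically; three mutually different outputs would fall below the threshold and be routed to a person, which is the kind of machine and human handoff the abstract describes.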

Funding

SI2-SSE: Human- and Machine-Intelligent Software Elements for Cost-Effective Scientific Data Digitization

Directorate for Computer & Information Science & Engineering
