Span-based information extraction (SIE) is a family of natural language processing tasks that extract spans of interest from digital text and assign each span a class describing its nature. SIE is essential yet challenging. On the one hand, progress in SIE directly reflects progress in natural language processing, especially in text understanding. On the other hand, SIE can link digital text to knowledge base and knowledge graph entries, which enriches the highlighted text with background information.
In this thesis, I study SIE in four parts. (1) Foundations of Span-based Information Extraction. This part outlines the concepts and history of the task. (2) Models of Span-based Information Extraction. This part introduces the three SIE models presented in this thesis: Ask-and-Verify, EntGPT, and G3. (3) Applications of Span-based Information Extraction. This part introduces two applications of SIE: multi-choice question answering and enhancing trust in plain text. (4) Limitations and Future Work Beyond Span-based Information Extraction. This part covers the limitations of SIE and some directions for future work.