Span-Based Information Extraction and Beyond

dataset

posted on 2025-05-07, 17:51 authored by Yifan Ding

Span-based information extraction (SIE) is a set of natural language processing and information extraction tasks that aim to extract spans of interest from digital text and assign each span a class that describes the nature of that text. SIE is essential yet challenging. On one hand, progress in SIE directly reflects progress in natural language processing, especially in text understanding. On the other hand, SIE can link digital text to knowledge base and knowledge graph entries, which enriches the highlighted text with background information. In this thesis, I study SIE tasks in four parts. (1) Foundations of Span-based Information Extraction. This section outlines the concepts and history of the task. (2) Models of Span-based Information Extraction. This section introduces three SIE models that we present: Ask-and-Verify, EntGPT, and G3. (3) Applications of Span-based Information Extraction. This section introduces two applications of SIE: SIE for multi-choice question answering and SIE to enhance trust in plain text. (4) Limitations and Future Work Beyond Span-based Information Extraction. This section covers the limitations of SIE and some directions for future work.

History

Date Created

2025-04-08

Date Modified

2025-05-07

Defense Date

2025-01-20

CIP Code

14.0901

Research Director(s)

Tim Weninger

Committee Members

Meng Jiang, Xiangliang Zhang, Luna Dong

Degree

Doctor of Philosophy

Degree Level

Doctoral Dissertation

Language

English

Library Record

006700758

OCLC Number

1518701250

Publisher

University of Notre Dame

Additional Groups

Computer Science and Engineering

Program Name

Computer Science and Engineering

Keywords

Information Extraction, Entity Disambiguation, Entity Linking, Attribute Value Extraction, Natural Language Processing, Pretrained Language Model, Large Language Model, Conceptualization, English

Licence

CC BY 4.0