
Span-Based Information Extraction and Beyond

dataset
posted on 2025-05-07, 17:51 authored by Yifan Ding
Span-based information extraction (SIE) is a family of natural language processing and information extraction tasks that aim to extract spans of interest from digital text and assign each span a class describing the nature of that text. SIE is essential yet challenging. On one hand, progress in SIE directly reflects progress in natural language processing, especially in text understanding. On the other hand, SIE can link digital text to knowledge base and knowledge graph entries, which enriches the highlighted text with background information. In this thesis, I address SIE in four parts. (1) Foundations of Span-based Information Extraction. This section outlines the concepts and history of the task. (2) Models of Span-based Information Extraction. This section introduces three SIE models that we propose: Ask-and-Verify, EntGPT, and G3. (3) Applications of Span-based Information Extraction. This section introduces two applications of SIE: SIE for multiple-choice question answering and SIE to enhance trust in plain text. (4) Limitations and Future Work Beyond Span-based Information Extraction. This section covers the limitations of SIE and some directions for future work.

History

Date Created

2025-04-08

Date Modified

2025-05-07

Defense Date

2025-01-20

CIP Code

  • 14.0901

Research Director(s)

Tim Weninger

Committee Members

  • Meng Jiang
  • Xiangliang Zhang
  • Luna Dong

Degree

  • Doctor of Philosophy

Degree Level

  • Doctoral Dissertation

Language

  • English

Library Record

006700758

OCLC Number

1518701250

Publisher

University of Notre Dame

Additional Groups

  • Computer Science and Engineering

Program Name

  • Computer Science and Engineering