reproduction_package.zip (34.66 MB)

Reproduction Package: Assessing the Utility of Text-to-SQL Approaches for Satisfying Software Developer Information Needs

This item is shared privately
software
modified on 2023-07-31, 12:39

Software analytics integrated with complex databases can deliver project intelligence into the hands of software engineering (SE) experts to satisfy their information needs. A new and promising machine learning technique known as text-to-SQL automatically extracts information for users of complex databases without requiring them to fully understand the database structure or the accompanying query language. Users pose their request as a so-called natural language utterance, i.e., a question. However, it has not yet been studied how well these text-to-SQL approaches satisfy SE experts' information needs. Thus, our goal was to evaluate the performance and applicability of text-to-SQL approaches on data derived from tools typically used in the workflow of software engineers for satisfying their information needs. We carefully selected and discussed five seminal as well as state-of-the-art text-to-SQL approaches and conducted a comparative assessment using the large-scale, cross-domain Spider dataset and the SE domain-specific SEOSS-Queries dataset. Furthermore, we studied via a survey how SE professionals perform in satisfying their information needs and how they perceive text-to-SQL approaches. For the best performing approach, we observed a high accuracy of 96% in query prediction when training specifically on SE data. This accuracy is almost independent of the query's complexity. At the same time, we observed that SE professionals have substantial deficits in satisfying their information needs directly via SQL queries. Furthermore, SE professionals are open to utilizing text-to-SQL approaches in their daily work, considering them helpful and less time-consuming. We conclude that state-of-the-art text-to-SQL approaches are applicable in SE practice for day-to-day information needs.


If you use the SEOSS-Queries dataset, please cite as:

@article{TOMOVA2022108211,
title = {SEOSS-Queries - a software engineering dataset for text-to-SQL and question answering tasks},
journal = {Data in Brief},
volume = {42},
pages = {108211},
year = {2022},
issn = {2352-3409},
doi = {https://doi.org/10.1016/j.dib.2022.108211},
url = {https://www.sciencedirect.com/science/article/pii/S2352340922004152},
author = {Mihaela Todorova Tomova and Martin Hofmann and Patrick Mäder}
}