figshare

Simple Italian and AI: strengths and weaknesses

Version 2 2025-03-26, 15:03
Version 1 2025-03-25, 21:32
software
posted on 2025-03-26, 15:03 authored by Giuliana Fiorentino, Vittorio Ganfi, Marco Russodivito, Alessandro Cioffi, Maria Ausilia Simonelli
The simplification of language, and of administrative language in particular, has been studied in Italian linguistics for several decades and has produced some important results: consolidated and shared lists of the linguistic factors (morphosyntactic and lexical) that affect the simplicity and accessibility of a text (for a summary see Fiorentino/Ganfi 2024). These results made it possible to define a readability index, Gulpease, as early as the 1980s (Lucisano/Piemontese 1988).
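The Gulpease index mentioned above is commonly stated as 89 + (300 × sentences − 10 × letters) / words, where higher values indicate easier text. A minimal Python sketch of that formula follows. The function name `gulpease` and the counting rules (runs of letters as words, runs of `.`, `!` or `?` as sentence ends) are assumptions made for illustration, not the procedure used by the authors.

```python
import re


def gulpease(text: str) -> float:
    """Gulpease index: 89 + (300 * sentences - 10 * letters) / words."""
    # Assumption: a word is any run of Unicode letters, so "l'ente"
    # counts as two words. Other tokenizers will give other scores.
    words = re.findall(r"[^\W\d_]+", text)
    if not words:
        raise ValueError("text contains no words")
    letters = sum(len(w) for w in words)
    # Assumption: each run of terminal punctuation closes one sentence;
    # a text without any such punctuation counts as one sentence.
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    return 89 + (300 * sentences - 10 * letters) / len(words)
```

Scores are usually read on a 0 to 100 scale, but very short inputs can fall outside that range, so the sketch is best applied to whole documents rather than to single sentences.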

The authors (a research group) are currently developing, with the support of a large language model (LLM), an application for the automatic simplification of administrative texts called SEMPL-IT (Fiorentino/Russodivito, in press; Ganfi/Russodivito, in press). To this end, the group built ItaIst, a corpus of 208 administrative texts from 8 Italian regions (Basilicata, Calabria, Campania, Latium, Lombardy, Molise, Tuscany, Veneto) covering 3 thematic areas: waste, health, and public services. For each thematic area, 2 types of text were considered: service charters and calls for tenders for waste; general planning acts and accreditations for health; service charters and rationalization of public participations for public services.

The corpus was then automatically simplified to create a parallel simplified corpus, which was compared with the source corpus. To validate the automatic simplification, the simplified corpus was evaluated for its gain in readability and for its semantic similarity to the source texts.
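The abstract does not say which semantic similarity measure was used. As a rough illustration of comparing a source text with its simplification, the Python sketch below computes cosine similarity over bag-of-words counts. This is a purely lexical stand-in chosen for simplicity: it is an assumption, not the authors' method, and the function name `bow_cosine` is hypothetical. Because it penalizes legitimate word substitutions, an embedding-based measure would normally be a better fit for simplified text.

```python
import math
import re
from collections import Counter


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts."""
    # Assumption: lowercase runs of Unicode letters serve as tokens.
    va = Counter(re.findall(r"[^\W\d_]+", a.lower()))
    vb = Counter(re.findall(r"[^\W\d_]+", b.lower()))
    # Dot product over the tokens the two texts share.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    # An empty text has a zero vector; report no similarity.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A value near 1 means the two texts use largely the same words in similar proportions; a value near 0 means they share almost no vocabulary.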

In this contribution, we intend to apply the same automatic simplification model to another corpus, called ItaRegol, whose texts differ from those used in the previous studies, in order to compare the simplification results with those already obtained. ItaRegol is smaller than ItaIst and consists of rules and regulations, that is, legally relevant acts with legal effects, which create, modify or extinguish subjective legal situations. These texts are particularly complex, and their simplification must ensure that the linguistic manipulation does not alter their legal effect.

In sum, in this contribution we will discuss the simplification parameters used and the quality of the simplified text, and we will draw conclusions on how much each parameter contributes to increasing the readability of administrative and/or regulatory texts.

Funding

VerbACxSS

Ministry of Education, Universities and Research

