figshare

Borrowed Voices, Shared Debt: Plagiarism, Idea Recombination, and the Knowledge Commons in Large Language Models

dataset
posted on 2025-09-16, 12:20 authored by Agustin V. Startari
<p dir="ltr">Large language models generate fluent text by recombining the language and ideas of prior authors at scale. This process produces plagiarism-like harms along three dimensions: direct leakage of wording, imitation of distinctive styles, and appropriation of argument structures or conceptual syntheses without provenance. At the same time, the models' capacity to offer insight or novel-seeming combinations depends entirely on the accumulated labor of millions of human writers, editors, teachers, and curators who built the knowledge commons. This paper argues that denunciation and recognition must proceed together: the harms of extraction must be exposed, and the debt to the commons must also be acknowledged. The paper proposes a framework that defines the scope of plagiarism in this context, diagnoses the mechanisms of recombination, and sets out operational remedies, including dataset governance, attribution layers, compensation pools, and measurable audit thresholds. The goal is a system that restricts illegitimate appropriation while reinvesting in the shared-knowledge infrastructures that make such synthesis possible.</p><p dir="ltr"><b>DOI</b></p><ul><li>Primary archive: <a href="https://doi.org/10.5281/zenodo.17132004" target="_blank">https://doi.org/<b>10.5281/zenodo.17132004</b></a></li><li>Secondary archive: <a href="https://doi.org/10.6084/m9.figshare.30137422" target="_blank">https://doi.org/10.6084/m9.figshare.30137422</a></li><li>SSRN: Pending assignment (ETA: Q3 2025)</li></ul>
