PRIPA: A Tool for Privacy-Preserving Analytics of Linguistic Data

conference contribution
posted on 2025-02-27, 08:34 authored by Jeremie Clos, Emma Mcclaughlin, Pepita Barnard, Elena Nichele, Dawn Knight, Derek McAuley, Svenja Adolphs

The days of large amorphous corpora collected with armies of Web crawlers and stored indefinitely are, or should be, coming to an end. A wealth of linguistic information is increasingly difficult to access because it is hidden in personal data that would be unethical and technically challenging to collect using traditional methods such as Web crawling and mass surveillance of online discussion spaces. Advances in privacy regulations such as GDPR and changes in the public perception of privacy call into question the ethics of extracting information from unaware, if not unwilling, participants. Modern corpora need to adapt, focus on testing specific hypotheses, and respect the privacy of the people who generated their data. Our work uses a distributed participatory approach and continuous informed consent to address these issues, allowing participants to voluntarily contribute their own censored personal data at a granular level. We evaluate our approach in three ways: testing the accuracy of its statistical measures of language against standard corpus linguistics tools, evaluating the usability of our application with a participant involvement panel, and using the tool in a case study on health communication.


Funding

This research is funded by the Arts and Humanities Research Council (AHRC), grant reference AH/V015125/1, supported by the Horizon Digital Economy Research Institute.

History

School affiliated with

  • Lincoln Business School (Research Outputs)

Publication Title

LREC 2022, Thirteenth International Conference on Language Resources and Evaluation, LREC 2022 Conference Proceedings

Pages/Article Number

73‑78

Publisher

European Language Resources Association (ELRA)

ISBN

979-10-95546-96-2

Date Accepted

2022-06-01

Date of Final Publication

2022-06-25

Event Name

Language Resources and Evaluation Conference (LREC 2022)

Event Dates

20-25 June 2022

Open Access Status

  • Open Access

Will your conference paper be published in proceedings?

  • Yes