Bias and beyond in digital trace data

Malik, Momin

doi:10.1184/R1/7188947.v1

thesis

posted on 2018-08-01, 00:00, authored by Momin Malik

Large-scale digital trace data from sources such as social media platforms, emails, purchase records, browsing
behavior, and sensors in mobile phones are increasingly used for business decision-making, scientific
research, and even public policy. However, these data do not give an unbiased picture of underlying phenomena.
In this thesis, I demonstrate some of the ways in which large-scale digital trace data, despite their
richness, have biases in who is represented, what sorts of actions are represented, and what sorts of behaviors
are captured. I present three critiques, demonstrating respectively that geotagged tweets exhibit heavy geographic
and demographic biases, that social media platforms’ attempts to guide user behavior are successful
and have implications for the behavior we think we observe, and that sensors built into mobile phones, such as
Bluetooth and WiFi, measure proximity and co-location but not necessarily interaction as has been claimed.
In response to these biases, I suggest shifting the scope of research done with digital trace data away from attempts
at large-sample statistical generalizability and towards studies that situate knowledge in the contexts
in which the data were collected. Specifically, I present two studies demonstrating alternatives to complement
each of the critiques. In the first, I work with public health researchers to use Twitter as a means of
public outreach and intervention. In the second, I design a study using mobile phone sensors in which I
use sensor data and survey data to measure proximity and sociometric choice, respectively, and model the
relationship between the two.

Date

2018-08-01

Degree Type

Dissertation

Department

Institute for Software Research

Degree Name

Doctor of Philosophy (PhD)

Advisor(s)

Jürgen Pfeffer, Anind K. Dey

Keywords

Computational social science, bias, generalizability, validity, digital trace, measurement

Licence

In Copyright
