figshare
- contradict_features.csv (607.47 kB)
- disputed_features.csv (1.81 MB)
- third-party_features.csv (1.29 MB)
- pov_features.csv (1.37 MB)
- hoax_features.csv (362.95 kB)
- unreliable-sources_features.csv (1.86 MB)
- more-citations-needed_features.csv (3.6 MB)
- original-research_features.csv (5.11 MB)
- one-source_features.csv (6.51 MB)
- unreferenced_features.csv (101.45 MB)
- contradict_difftxt.csv.gz (3.17 MB)
- disputed_difftxt.csv.gz (12.75 MB)
- hoax_difftxt.csv.gz (817.96 kB)
- more-citations-needed_difftxt.csv.gz (27.69 MB)
- one-source_difftxt.csv.gz (21.98 MB)
- original-research_difftxt.csv.gz (52.79 MB)
- pov_difftxt.csv.gz (9.22 MB)
- third-party_difftxt.csv.gz (8.12 MB)
- unreferenced_difftxt.csv.gz (265.78 MB)
- unreliable-sources_difftxt.csv.gz (12.31 MB)
30 files

Wiki-Reliability: A Large Scale Dataset for Content Reliability on Wikipedia

Version 4 2021-03-02, 15:21
Version 3 2021-02-28, 16:03
Version 2 2021-02-26, 03:15
Version 1 2021-02-26, 03:10
dataset
posted on 2021-03-02, 15:21, authored by KayYen Wong, Diego Saez-Trumper, Miriam Redi
Wiki-Reliability: Machine Learning datasets for measuring content reliability on Wikipedia

The collection consists of metadata feature datasets and content text datasets, with files named as follows:
- {template_name}_features.csv
- {template_name}_difftxt.csv.gz
- {template_name}_fulltxt.csv.gz
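
The naming pattern above can be turned into a small helper for locating and previewing the files. The following is a minimal sketch, not an official loader: the function names (`dataset_filenames`, `open_text`, `peek`) are hypothetical, and it assumes the files are comma-delimited, UTF-8 encoded, and start with a header row. The actual schema is documented on the project page linked below, so no column names are assumed here.

```python
import csv
import gzip
from pathlib import Path

# Suffixes taken from the three file formats listed above.
SUFFIXES = ("features.csv", "difftxt.csv.gz", "fulltxt.csv.gz")

# Text columns can be long; raise the csv module's per-field limit
# above its default. 2**31 - 1 is accepted on all common platforms.
csv.field_size_limit(2**31 - 1)


def dataset_filenames(template_name):
    """Return the three expected file names for one template."""
    return [f"{template_name}_{suffix}" for suffix in SUFFIXES]


def open_text(path):
    """Open a plain or gzip-compressed CSV file in text mode."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8", newline="")
    return open(path, "r", encoding="utf-8", newline="")


def peek(path, n_rows=3):
    """Return the header (assumed to be the first row) and up to n_rows data rows."""
    with open_text(path) as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows
```

Once a file has been downloaded, `peek("hoax_features.csv")` would return its column names and first few rows without loading the whole file, which matters for the larger archives such as `unreferenced_difftxt.csv.gz`.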

For more details on the project, dataset schema, and links to data usage and benchmarking:
https://meta.wikimedia.org/wiki/Research:Wiki-Reliability:_A_Large_Scale_Dataset_for_Content_Reliability_on_Wikipedia
