10 files

Wikidata Constraints Violations - July 2018

posted on 2019-02-14, 10:58 authored by Thomas Pellissier Tanon, Camille Bourgaux
This dataset contains corrections for Wikidata constraint violations extracted from the July 1st 2018 Wikidata full history dump.

The following constraints are considered:
* conflicts with
* distinct values
* inverse and symmetric
* item requires statement
* one of
* single value
* type
* value requires statement
* value type

The constraints.tsv file contains the list of most of the Wikidata constraints considered in this dataset (beware, there could be some discrepancies for type, valueType, itemRequiresClaim and valueRequiresClaim constraints).
It is a tab-separated file with the following columns:
* constraint id: the URI of the Wikidata statement describing the constraint
* property id: the URI of the property that is constrained
* type id: the URI of the constraint type (type, value type...). It is a Wikidata item.
* 15 columns for the possible attributes of the constraint. If an attribute has multiple values, they are in the same cell but separated by a space. The columns are:
** regex
** exceptions
** group by
** items
** property
** namespace
** class
** relation
** minimal date
** maximum date
** maximum value
** minimal value
** status
** separator
** scope
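
The column layout described above can be read with a few lines of Python. This is only a minimal sketch under stated assumptions: the function name `parse_constraint_row` and the dictionary keys are hypothetical, the columns are assumed to appear in exactly the order listed, and the description does not say whether the file starts with a header row, so the caller has to handle that case.

```python
# Attribute columns, assumed to appear in the order given in the description.
ATTRIBUTE_COLUMNS = [
    "regex", "exceptions", "group by", "items", "property", "namespace",
    "class", "relation", "minimal date", "maximum date", "maximum value",
    "minimal value", "status", "separator", "scope",
]


def parse_constraint_row(line):
    """Split one line of constraints.tsv into a dict (hypothetical helper)."""
    cells = line.rstrip("\r\n").split("\t")
    # Pad short rows so that missing trailing cells read as empty.
    cells += [""] * (3 + len(ATTRIBUTE_COLUMNS) - len(cells))
    row = {
        "constraint id": cells[0],
        "property id": cells[1],
        "type id": cells[2],
    }
    for name, cell in zip(ATTRIBUTE_COLUMNS, cells[3:]):
        # Multiple values share one cell, separated by a space.
        # Caveat: a value that itself contains a space (for instance a
        # regex) would be split too; this sketch does not handle that.
        row[name] = cell.split(" ") if cell else []
    return row
```

Every attribute is returned as a list, so single-valued and multi-valued cells can be handled the same way downstream.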

The other files provide, for each constraint type, the list of all corrections extracted from the edit history. Each file has one line per correction, with the following tab-separated values:
* URI for the statement describing the constraint in Wikidata
* URI of the revision that has solved the constraint violation
* subject, predicate and object of the triple that was violating the constraint (separated by a tab)
* the string "->"
* subject, predicate and object of the triple(s) of the correction, each followed by "" if the triple has been removed or "" if the triple has been added. Each component of these values is separated by a tab.
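
The line format above could be parsed roughly as follows. This is a sketch under assumptions, not the authors' tooling: the function name `parse_correction_line` is hypothetical, values are assumed never to contain a tab, and each correction triple is assumed to be followed by exactly one marker field. The marker is returned unchanged rather than interpreted, since its exact text is not given here.

```python
def parse_correction_line(line):
    """Parse one correction line into its parts (hypothetical helper)."""
    fields = line.rstrip("\r\n").split("\t")
    # Expected layout: constraint, revision, s, p, o, "->", then
    # repeated groups of (s, p, o, marker).
    if len(fields) < 6 or fields[5] != "->":
        raise ValueError("expected '->' as the sixth tab-separated field")
    rest = fields[6:]
    if len(rest) % 4 != 0:
        raise ValueError("correction triples must come in groups of four fields")
    changes = [
        (tuple(rest[i:i + 3]), rest[i + 3])  # ((s, p, o), raw marker)
        for i in range(0, len(rest), 4)
    ]
    return {
        "constraint": fields[0],
        "revision": fields[1],
        "violation": tuple(fields[2:5]),
        "changes": changes,
    }
```

For example, with the placeholder markers `DEL` and `ADD` (invented for illustration, not the strings used in the dataset), a line holding one removed and one added triple produces two entries in `changes`, each pairing a triple with its raw marker.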

More detailed explanations are provided in a soon-to-be-published paper.