15 files


Posted on 2024-06-13, 14:49, authored by Ignacio Echegoyen Blanco, Belén Charro, María Angustias Roldán, María Pilar Martínez Díaz, Elena Gismero

Raw data.xlsx (as first obtained online)

PSD_DB.xlsx (variables relabeled, unused columns filtered out)

secs_time_diffs.xlsx (questionnaire completion time differences)

Database curation steps:

1) NaN removal

2) IRV = 0 removal: only subjects whose intra-individual response variability (IRV) is exactly 0 are removed.

3) Predictive mean matching (PMM) to impute NaNs.

db_pmm is the database after these three steps.
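A minimal Python sketch of steps 2 and 3, for illustration only and not the code in the code folder. It assumes each subject's answers to one test are a list of numbers, with None marking a missing answer. IRV is taken as the standard deviation of a subject's observed responses (population vs sample SD makes no difference to the IRV = 0 check). The PMM part is a simplified single-predictor version without parameter draws, so it is not the exact algorithm of dedicated imputation software; the predictor x, the donor count k and the seed are hypothetical choices.

```python
import random
import statistics


def irv(responses):
    """Intra-individual response variability: SD of one subject's observed answers."""
    observed = [v for v in responses if v is not None]
    return statistics.pstdev(observed)


def drop_zero_irv(db):
    """Step 2 sketch: keep only subjects whose IRV is not exactly 0."""
    return {sid: resp for sid, resp in db.items() if irv(resp) != 0}


def pmm_impute(y, x, k=3, seed=0):
    """Step 3 sketch: simplified predictive mean matching for one variable.

    y: values with None for missing; x: a complete predictor (hypothetical choice).
    Each missing y is replaced by the observed y of a donor whose predicted value
    is among the k closest to the missing case's predicted value.
    """
    rng = random.Random(seed)
    obs = [i for i, v in enumerate(y) if v is not None]
    xo = [x[i] for i in obs]
    yo = [y[i] for i in obs]
    mx, my = statistics.fmean(xo), statistics.fmean(yo)
    sxx = sum((a - mx) ** 2 for a in xo)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xo, yo))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    pred = [intercept + slope * xi for xi in x]  # OLS fitted on observed cases only
    imputed = list(y)
    for i, v in enumerate(y):
        if v is None:
            donors = sorted(obs, key=lambda j: abs(pred[j] - pred[i]))[:k]
            imputed[i] = y[rng.choice(donors)]  # copy a real observed value
    return imputed


db = {"s1": [3, 3, 3, 3], "s2": [1, 4, 2, 5]}
print(sorted(drop_zero_irv(db)))                             # ['s2']
print(pmm_impute([1, 2, 3, None, 5], [1, 2, 3, 4, 5], k=1))  # [1, 2, 3, 3, 5]
```

A property of PMM worth noting: imputed values are always copied from observed answers, so they stay on the questionnaire's response scale.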

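4) Long strings: subjects are removed if their longest string of identical consecutive responses (ls) is at least half the number of items in the test (ls >= Nitems / 2).

db_ls is the database after these four steps.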
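The long-string rule can be sketched as follows (illustrative Python, assuming each subject's responses to one test are a list in item order; the function names are hypothetical):

```python
def longstring(responses):
    """Length of the longest run of identical consecutive responses."""
    if not responses:
        return 0
    best = run = 1
    for prev, cur in zip(responses, responses[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def flag_longstring(db, n_items):
    """Subjects whose longest string is at least half the number of items (ls >= Nitems / 2)."""
    return {sid for sid, resp in db.items() if longstring(resp) >= n_items / 2}


test_db = {"a": [2, 2, 2, 1, 4, 5], "b": [1, 2, 1, 2, 1, 2]}
print(flag_longstring(test_db, n_items=6))  # {'a'}
```

With an odd number of items the threshold is fractional, so for Nitems = 7 a subject needs a run of at least 4 identical answers to be flagged.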

5) Psychometric synonyms: item pairs correlating above a cutoff (.6 for CPIC, .5 for PDS m/f) are considered synonyms; then, for each subject, the correlation between their responses is computed across all such item pairs. This step is not applied to yrs, which has too few synonym pairs and would lead to too many subjects being removed.

db_psyn is the database after these five steps (the most stringent criteria).
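A sketch of the psychometric synonym index, assuming complete responses stored per subject as lists in item order. It only computes the per-subject index; the value below which a subject is removed is not stated in this description, so no removal rule is applied. Helper names and the toy data are hypothetical, and real tests yield many more synonym pairs than this example.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation; NaN when undefined (fewer than 2 points or zero variance)."""
    n = len(x)
    if n < 2:
        return math.nan
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def synonym_pairs(db, cutoff):
    """Item pairs whose correlation across subjects exceeds the cutoff."""
    rows = list(db.values())
    n_items = len(rows[0])
    cols = [[row[j] for row in rows] for j in range(n_items)]
    return [(a, b) for a, b in itertools.combinations(range(n_items), 2)
            if pearson(cols[a], cols[b]) > cutoff]  # NaN > cutoff is False


def psychsyn(responses, pairs):
    """Within-subject correlation across all synonym pairs (first vs second item)."""
    return pearson([responses[a] for a, _ in pairs],
                   [responses[b] for _, b in pairs])


cpic = {"s1": [1, 1, 5, 5], "s2": [2, 2, 3, 3], "s3": [3, 3, 1, 1], "s4": [4, 4, 4, 4]}
pairs = synonym_pairs(cpic, cutoff=0.6)  # .6 is the cutoff stated for CPIC
print(pairs)                             # [(0, 1), (2, 3)]
print(psychsyn(cpic["s1"], pairs))       # 1.0
```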

* Each correction is computed at the test level (that is, separately for each test), but flagged subjects are removed from the whole database (from all tests). This ensures that all subjects have valid measurements in every variable.

* We also analysed questionnaire completion times and found no evidence of participants answering too quickly or too slowly.
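The first note above (flag within each test, remove from all tests) can be illustrated like this. The names are hypothetical, and all_identical merely stands in for any of the per-test checks, such as IRV = 0 or long strings:

```python
def remove_flagged_everywhere(tests, flag_fn):
    """Apply flag_fn to each test separately, then drop the union of flagged
    subjects from every test so all remaining subjects have data in all tests."""
    flagged = set()
    for name, test_db in tests.items():
        flagged |= flag_fn(name, test_db)  # check computed within one test only
    cleaned = {
        name: {sid: resp for sid, resp in test_db.items() if sid not in flagged}
        for name, test_db in tests.items()
    }
    return cleaned, flagged


def all_identical(name, test_db):
    """Hypothetical per-test check: every answer identical (IRV = 0)."""
    return {sid for sid, resp in test_db.items() if len(set(resp)) == 1}


tests = {
    "CPIC": {"a": [1, 1, 1], "b": [1, 2, 3], "c": [2, 3, 1]},
    "PDS": {"a": [1, 2, 3], "b": [2, 2, 2], "c": [3, 1, 2]},
}
cleaned, flagged = remove_flagged_everywhere(tests, all_identical)
print(sorted(flagged))                                   # ['a', 'b']
print({name: list(db) for name, db in cleaned.items()})  # {'CPIC': ['c'], 'PDS': ['c']}
```

Subject a is flagged only in CPIC and subject b only in PDS, yet both disappear from both tests, which is what keeps every remaining subject complete across variables.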

An inverted version of each database is also provided (db_inverted), in which the corresponding items are inverted so that sum scores can be computed.
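Inverting an item on a bounded response scale maps a value v to (min + max) - v. The sketch below assumes a 1-5 scale and an arbitrary set of reverse-keyed item positions purely for illustration; the actual scales and inverted items depend on each questionnaire.

```python
SCALE_MIN, SCALE_MAX = 1, 5  # hypothetical 1-5 Likert scale
REVERSED = {1, 3}            # hypothetical reverse-keyed item positions


def invert_items(responses, reversed_items=REVERSED, lo=SCALE_MIN, hi=SCALE_MAX):
    """Reverse-score the given item positions: v -> lo + hi - v."""
    return [lo + hi - v if j in reversed_items else v
            for j, v in enumerate(responses)]


def sum_score(responses):
    """Sum score computed after inverting the reverse-keyed items."""
    return sum(invert_items(responses))


print(invert_items([5, 1, 4, 2]))  # [5, 5, 4, 4]
print(sum_score([5, 1, 4, 2]))     # 18
```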

For the factor and invariance analyses, the data were split into two halves. The "split" folder contains the split of each curated version.
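One way to obtain two halves is a seeded random split of subject IDs, sketched below. Whether the original split was random, and with which seed, is an assumption here and is not stated in this description.

```python
import random


def split_half(subject_ids, seed=2024):
    """Randomly split subject IDs into two disjoint halves (seed is hypothetical)."""
    ids = sorted(subject_ids)  # fixed starting order so the seed fully determines the split
    random.Random(seed).shuffle(ids)
    mid = len(ids) // 2
    return ids[:mid], ids[mid:]


half_a, half_b = split_half(range(1, 12))
print(len(half_a), len(half_b))  # 5 6
```

Splitting on subject IDs rather than on rows means the same two halves could be reused across the curated versions, although whether the provided splits share subjects this way is not stated.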

The final version used in the paper, db_ls_inverted_clean.csv, is also provided.

The code folder contains the code to obtain every version of the database from the raw data, as well as the code for all analyses.

