This document details the analyses reported in The effects of bidialectalism on executive function (Poarch, Vanhove & Berthele). For this study, Simon and flanker data of 34 participants who also filled out a Bilingual Language Profile for Standard German and the Swabian dialect were analysed.
Note: Participant 108 was removed since he wasn’t bidialectal.
The main points are summarised here:
For the main analyses, we re-expressed the individual response latencies (ms) as response speeds (stimuli/s). For instance, a response latency of 333 ms corresponds to 3 stimuli per second. Re-expressing latencies as speeds results in a more normal-ish distribution of responses, and response speeds are also readily interpretable. For follow-up analyses (Chapter “A multiverse analysis”), we also considered latency-based analyses (both raw latencies and log-transformed latencies).
For the main analyses, we did not exclude incorrect responses (of which there are very few) or responses that are too fast or too slow as no exclusion criteria were formulated beforehand. For follow-up analyses (Chapter “A multiverse analysis”), we also considered excluding incorrect responses, responses faster than a given threshold, or responses too far away from the participant’s or overall mean.
Similarly, we have a choice in how to express the predictor variable (language dominance); see the paper for a discussion of this choice.
We didn’t consider education level and Raven performance as covariates since these may be post-treatment variables: if bidialectalism affects executive function, it may directly or indirectly affect Raven and education, too.
Please note that the possible predictor variables are correlated (item 5) as are the flanker and Simon effects (Chapter “Correlation between flanker and Simon effects”). As a result, the different analyses are not independent from one another: If two different analyses produce similar results, this doesn’t carry the same weight as when two independent experiments produce a similar result. This is important to keep in mind when interpreting the results.
We also conducted a multiverse analysis. For this, we automatically analysed the data according to all possible sensible combinations of choices we have (how to transform the data, whether to exclude outliers, etc.). You find the results at the bottom of the page. The take-home point here is that slightly different, but sensible choices in the analysis can yield substantially different results (here: p-values).
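To make these analytical degrees of freedom concrete, the sketch below shows one way such a grid of choices could be set up in R. The choice levels, the helper prepare_data() and the cut-offs (200 ms, 250 ms) are illustrative assumptions, not the exact grid used in the chapter “A multiverse analysis”; dat refers to the trial-level dataset read in below.
# Illustrative grid of analysis choices (hypothetical levels; the actual
# grid is described in the chapter "A multiverse analysis")
choices <- expand.grid(
  outcome = c("speed", "rt", "logrt"),  # transformation of the latencies
  accOnly = c(TRUE, FALSE),             # exclude incorrect responses?
  minRT   = c(0, 200, 250),             # lower RT cut-off in ms (0 = no cut-off)
  stringsAsFactors = FALSE)
# Prepare the trial-level data according to one combination of choices;
# the model of interest is then refit on each prepared dataset and the
# resulting p-values are compared.
prepare_data <- function(dat, outcome, accOnly, minRT) {
  d <- subset(dat, RT >= minRT)
  if (accOnly) d <- subset(d, Accuracy == 1)
  d$Outcome <- switch(outcome,
                      speed = 1000/d$RT,
                      rt    = d$RT,
                      logrt = log10(d$RT))
  d
}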
This code empties the workspace, overrides some R defaults (nothing too important) and loads the packages used in the analysis. To install the packages, run install.packages("packageName").
# EMPTYING WORKSPACE!!
rm(list = ls())
# R settings
options(digits = 4, # fewer digits in output
show.signif.stars = FALSE, # no significance stars
na.action = na.fail) # explicit error for missing data
# Load packages
# see bottom of page for versions and sources
library(ggplot2) # for plotting
library(cowplot) # ..
library(dplyr) # for working with dataframes
library(tidyr) # ..
library(magrittr) # for writing more transparent code
library(lme4) # for mixed-effects models
library(mgcv) # for generalised additive models
library(pbkrtest) # for bootstrapping mixed-effects models
# Set default plotting theme
theme_set(theme_cowplot(10))
Software versions:
devtools::session_info()
Session info -----------------------------------------------------------------------------------------------------------------------------
setting value
version R version 3.3.2 (2016-10-31)
system x86_64, linux-gnu
ui RStudio (1.0.143)
language en_US
collate en_US.UTF-8
tz Europe/Brussels
date 2017-07-03
Packages ---------------------------------------------------------------------------------------------------------------------------------
package * version date source
assertthat 0.2.0 2017-04-11 cran (@0.2.0)
base * 3.3.2 2016-11-01 local
colorspace 1.3-2 2016-12-14 cran (@1.3-2)
cowplot * 0.7.0 2016-10-28 CRAN (R 3.3.2)
datasets * 3.3.2 2016-11-01 local
DBI 0.6-1 2017-04-01 cran (@0.6-1)
devtools 1.13.1 2017-05-13 CRAN (R 3.3.2)
digest 0.6.12 2017-01-27 cran (@0.6.12)
dplyr * 0.5.0 2016-06-24 CRAN (R 3.3.1)
ggplot2 * 2.2.1.9000 2017-05-09 Github (hadley/ggplot2@f4398b6)
graphics * 3.3.2 2016-11-01 local
grDevices * 3.3.2 2016-11-01 local
grid 3.3.2 2016-11-01 local
gtable 0.2.0 2016-02-26 CRAN (R 3.3.0)
knitr 1.16 2017-05-18 CRAN (R 3.3.2)
lattice 0.20-35 2017-03-25 CRAN (R 3.3.2)
lazyeval 0.2.0 2016-06-12 CRAN (R 3.3.1)
lme4 * 1.1-14 2017-05-23 Github (lme4/lme4@245b16e)
magrittr * 1.5 2014-11-22 CRAN (R 3.3.1)
MASS 7.3-45 2015-11-10 CRAN (R 3.2.5)
Matrix * 1.2-8 2017-01-20 CRAN (R 3.3.2)
memoise 1.1.0 2017-04-21 CRAN (R 3.3.2)
methods * 3.3.2 2016-11-01 local
mgcv * 1.8-16 2016-11-07 CRAN (R 3.3.2)
minqa 1.2.4 2014-10-09 CRAN (R 3.3.0)
munsell 0.4.3 2016-02-13 CRAN (R 3.3.0)
nlme * 3.1-131 2017-02-06 CRAN (R 3.3.2)
nloptr 1.0.4 2014-08-04 CRAN (R 3.3.0)
parallel 3.3.2 2016-11-01 local
pbkrtest * 0.4-7 2017-03-15 cran (@0.4-7)
plyr 1.8.4 2016-06-08 cran (@1.8.4)
R6 2.2.1 2017-05-10 CRAN (R 3.3.2)
Rcpp 0.12.11 2017-05-22 cran (@0.12.11)
rlang 0.1.1 2017-05-18 cran (@0.1.1)
scales 0.4.1 2016-11-09 cran (@0.4.1)
splines 3.3.2 2016-11-01 local
stats * 3.3.2 2016-11-01 local
tibble 1.3.3 2017-05-28 CRAN (R 3.3.2)
tidyr * 0.6.3 2017-05-15 CRAN (R 3.3.2)
tools 3.3.2 2016-11-01 local
utils * 3.3.2 2016-11-01 local
withr 1.0.2 2016-06-20 cran (@1.0.2)
Read in the Simon and flanker data.
# Read in flanker/Simon data
dat <- read.csv("bidialectalism_ef.csv")
# Recode Subject as factor
dat$Subject <- factor(dat$Subject)
# Remove subject 108 (not bidialectal)
dat <- dat %>%
filter(Subject != "108")
summary(dat)
Subject SessionDate Block Task Trial PresentationFlanker.ACC PresentationFlanker.RT
100 : 384 05-14-2015:1920 Min. :1.0 Flanker:6528 Min. : 9 Min. :0 Min. : 232
101 : 384 06-16-2015:1536 1st Qu.:1.0 Simon :6528 1st Qu.: 60 1st Qu.:1 1st Qu.: 370
102 : 384 07-02-2015:1536 Median :1.5 Median :108 Median :1 Median : 416
103 : 384 05-12-2015:1152 Mean :1.5 Mean :108 Mean :1 Mean : 442
104 : 384 05-13-2015:1152 3rd Qu.:2.0 3rd Qu.:156 3rd Qu.:1 3rd Qu.: 480
105 : 384 06-09-2015:1152 Max. :2.0 Max. :216 Max. :1 Max. :1901
(Other):10752 (Other) :4608 NA's :6528 NA's :6528
PresentationSimon.ACC PresentationSimon.RT Status TrialsFlanker.Cycle TrialsFlanker.Sample TrialsSimon.Sample RT
Min. :0 Min. : 212 congruent :6528 Min. :1 Min. : 1 Min. : 1 Min. : 212
1st Qu.:1 1st Qu.: 335 incongruent:6528 1st Qu.:1 1st Qu.: 49 1st Qu.: 49 1st Qu.: 352
Median :1 Median : 385 Median :1 Median : 96 Median : 96 Median : 401
Mean :1 Mean : 417 Mean :1 Mean : 96 Mean : 96 Mean : 430
3rd Qu.:1 3rd Qu.: 453 3rd Qu.:1 3rd Qu.:144 3rd Qu.:144 3rd Qu.: 468
Max. :1 Max. :5851 Max. :1 Max. :192 Max. :192 Max. :5851
NA's :6528 NA's :6528 NA's :6528 NA's :6528 NA's :6528
Accuracy
Min. :0.000
1st Qu.:1.000
Median :1.000
Mean :0.983
3rd Qu.:1.000
Max. :1.000
Brief explanation of the variables:

Subject: Subject ID.
SessionDate: When were the data collected? (mm-dd-yyyy)
Block: The Simon and flanker tasks were presented in a random order. If Block == 1, the row contains data for the task that was presented first; if Block == 2, for the task that was presented second.
Trial: Trial number. This does not start at 1 since training trials were removed from the dataset. Furthermore, some participants needed to redo the training trials, so that their first trial is not 9 (as it is for other participants). If anyone wants to include trial number in the analyses: it is better to use TrialsFlanker.Sample/TrialsSimon.Sample instead.
PresentationFlanker.ACC: Accuracy on flanker trials. 0 = incorrect; 1 = correct; NA = no flanker trial.
PresentationFlanker.RT: Response latency on flanker trials in milliseconds. NA = no flanker trial.
PresentationSimon.ACC: Accuracy on Simon trials. 0 = incorrect; 1 = correct; NA = no Simon trial.
PresentationSimon.RT: Response latency on Simon trials in milliseconds. NA = no Simon trial.
Status: Was the stimulus a congruent or an incongruent stimulus?
TrialsFlanker.Cycle: Superfluous.
TrialsFlanker.Sample: Trial number for the flanker stimuli.
TrialsSimon.Sample: Trial number for the Simon stimuli.
RT: Response latency in milliseconds regardless of the task.
Accuracy: Accuracy (0 vs. 1) regardless of the task.

The following code adds the variable TrialNumber, i.e., the trial number regardless of the task, for convenience:
dat$TrialNumber <- ifelse(dat$Task == "Flanker",
dat$TrialsFlanker.Sample,
dat$TrialsSimon.Sample)
There was a self-paced break between the 64th and 65th as well as between the 128th and 129th trial. The following code adds this information:
dat$BlockInTask <- cut(dat$TrialNumber, c(0, 64, 128, 192))
As could be expected, the response latencies were right skewed.
ggplot(dat,
aes(x = RT)) +
geom_histogram(colour = "lightgrey", fill = "red3", binwidth = 100) +
xlab("Response latency in ms") +
ylab("Number of observations") +
facet_wrap(~ Task + Status, scales = "free_x")
Dotcharts (RTs sorted from low to high, by task and participant):
ggplot(dat,
aes(x = RT,
y = rank(RT))) +
geom_point(pch = 1, size = 0.3) +
facet_grid(Subject ~ Task, scales = "free") +
theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank())
Across all tasks and conditions, only 0.54 % of the response latencies were faster than 250 ms, and 5.9 % were faster than 300 ms. The table below shows the percentage of response latencies under 250 ms per task per condition.
dat %>%
group_by(Task, Status) %>%
summarise(PercentUnder250ms = (mean(RT < 250) * 100) %>% signif(2))
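For completeness, the overall percentages quoted above can be computed directly, e.g.:
# Overall percentages of fast responses, across tasks and conditions
mean(dat$RT < 250) * 100  # roughly 0.54
mean(dat$RT < 300) * 100  # roughly 5.9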
In light of the (anticipated) positive skew, it makes sense to transform the response latencies to response speeds (inverse transformation). Instead of answering the question “How many milliseconds does it take to respond to a stimulus?”, this answers the (equivalent) question “How many stimuli can be responded to in one second?”. Note that higher values now mean faster responses.
# Inverse transformation * 1000 (milliseconds to seconds)
dat$Speed <- 1000/dat$RT
ggplot(dat,
aes(x = Speed)) +
geom_histogram(colour = "lightgrey", fill = "red3", binwidth = 0.2) +
xlab("Response speed (stimuli/s)") +
ylab("Number of observations") +
facet_wrap(~ Task + Status, scales = "free")
This already looks much better.
We did not decide on a latency/speed-based outlier criterion beforehand, since the idea was to use this dataset to pick such criteria for a larger study. For this reason, we will not exclude any observations at this stage.
ggplot(dat,
aes(x = Speed,
y = rank(Speed))) +
geom_point(pch = 1, size = 0.5) +
facet_grid(Subject ~ Task, scales = "free") +
theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank())
We also tried another common RT transformation, viz. the logarithmic transformation. This retained some of the right skew, however (plots below), and seems to necessitate the exclusion of long outliers:
ggplot(dat,
aes(x = log10(RT))) +
geom_histogram(colour = "lightgrey", fill = "red3", binwidth = 0.1) +
xlab("log10 response speed (log10 ms)") +
ylab("Number of observations") +
facet_wrap(~ Task + Status, scales = "free")
Overall, 98.3 % of the responses provided were accurate. The table below shows their breakdown by task and condition.
dat %>%
group_by(Task, Status) %>%
summarise(`Percentage of accurate responses` = (mean(Accuracy) * 100) %>% signif(3))
Or per participant:
dat %>%
group_by(Task, Status, Subject) %>%
summarise(`Accuracy` = (mean(Accuracy) * 100)) %>%
group_by(Task, Status) %>%
summarise(`Mean` = mean(Accuracy),
`SD` = sd(Accuracy),
`Minimum` = min(Accuracy),
`Maximum` = max(Accuracy))
For now, we will not exclude incorrect responses from the dataset, but we will later check whether there is any suggestion of a speed–accuracy trade-off.
The following code calculates mean and median response speeds as well as the proportion of correct responses by congruence status and task for each participant. Then these summaries are used to compute Simon and flanker effects as both difference (\(\textrm{speed congruent} - \textrm{speed incongruent}\)) and ratio scores (\(\frac{\textrm{speed congruent}}{\textrm{speed incongruent}}\)).
See source code.
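Since that code is not shown on this page, here is a rough reconstruction of what the per-participant aggregation could look like, assuming the column names used below (meanSpeedCongruent, meanSpeedDiff, AccuracyCongruent, etc.). It is a sketch of the general approach (median speeds are omitted), not the authors’ exact code.
# Sketch of the per-participant aggregation (reconstruction, not the original code)
speedSummary <- dat %>%
  group_by(Subject, Task, Status) %>%
  summarise(meanSpeed = mean(Speed)) %>%
  ungroup() %>%
  spread(Status, meanSpeed) %>%
  rename(meanSpeedCongruent = congruent,
         meanSpeedIncongruent = incongruent)
accSummary <- dat %>%
  group_by(Subject, Task, Status) %>%
  summarise(Accuracy = mean(Accuracy)) %>%
  ungroup() %>%
  spread(Status, Accuracy) %>%
  rename(AccuracyCongruent = congruent,
         AccuracyIncongruent = incongruent)
perPart <- speedSummary %>%
  left_join(accSummary, by = c("Subject", "Task")) %>%
  mutate(meanSpeedDiff  = meanSpeedCongruent - meanSpeedIncongruent,
         meanSpeedRatio = meanSpeedCongruent / meanSpeedIncongruent) %>%
  ungroup()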
Below we have plotted the mean response speed in the congruent condition per participant versus the difference in mean response speed between the congruent and the incongruent condition per participant for both the flanker and Simon tasks. The smaller the difference (\(y\)-axis), the smaller the flanker/Simon effect, which is suggestive of more cognitive control.
Note that there may be some trend towards greater flanker/Simon effects for participants who responded faster in the congruent condition (the trend lines go up). A straightforward explanation for this is regression to the mean: fast responders in the congruent condition are also fast responders in the incongruent condition, but the prediction is imperfect. Note, also, that there is considerable uncertainty about this trend (the 95% confidence bands are wide).
The dashed lines mark the zero-difference line. Dots above this line indicate participants with flanker/Simon effects that go in the direction expected. For the flanker task, 33 out of 34 participants showed the expected positive effect; for the Simon task, 28 out of 34 did.
ggplot(perPart,
aes(x = meanSpeedCongruent,
y = meanSpeedDiff)) +
geom_point(pch = 1) +
xlab("Mean response speed (stimuli/s) in congruent condition") +
ylab("Difference in mean response speeds\n(congruent - incongruent)") +
facet_wrap(~ Task, scales = "free") +
geom_smooth(method = "lm") +
geom_hline(yintercept = 0, lty = 2)
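The counts of participants with effects in the expected direction can be verified with a short summary along these lines:
# Number of participants with a positive (expected) effect per task;
# this should reproduce the 33/34 (flanker) and 28/34 (Simon) reported above
perPart %>%
  group_by(Task) %>%
  summarise(PositiveEffect = sum(meanSpeedDiff > 0),
            N = n())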
Summary table:
perPart %>%
group_by(Task) %>%
summarise(mean(meanSpeedCongruent),
sd(meanSpeedCongruent),
min(meanSpeedCongruent),
max(meanSpeedCongruent),
mean(meanSpeedIncongruent),
sd(meanSpeedIncongruent),
min(meanSpeedIncongruent),
max(meanSpeedIncongruent),
mean(meanSpeedDiff),
sd(meanSpeedDiff),
min(meanSpeedDiff),
max(meanSpeedDiff))
Accuracy is above 90% for all participants in both tasks. Scatterplots of accuracy vs. mean response speed do not suggest a speed–accuracy trade-off.
ggplot(perPart,
aes(x = (AccuracyCongruent + AccuracyIncongruent)/2,
y = (meanSpeedIncongruent + meanSpeedCongruent)/2)) +
geom_point(pch = 1) +
geom_smooth(method = "lm") +
xlab("Accuracy") +
ylab("Mean response speed\n(stimuli/s)") +
facet_wrap(~ Task, scales = "free")
ggplot(perPart,
aes(x = AccuracyCongruent,
y = meanSpeedCongruent)) +
geom_point(pch = 1) +
geom_smooth(method = "lm") +
xlab("Accuracy congruent") +
ylab("Congruent response speed\n(stimuli/s)") +
facet_wrap(~ Task, scales = "free")
ggplot(perPart,
aes(x = AccuracyIncongruent,
y = meanSpeedIncongruent)) +
geom_point(pch = 1) +
geom_smooth(method = "lm") +
xlab("Accuracy incongruent") +
ylab("Incongruent response speed\n(stimuli/s)") +
facet_wrap(~ Task, scales = "free")
The scatterplot below shows how the participants’ Simon and flanker effects are correlated.
ggplot(perPart %>%
select(Subject, Task, meanSpeedDiff) %>%
spread(Task, meanSpeedDiff),
aes(x = Simon, y = Flanker)) +
geom_point(pch = 1) +
geom_smooth(method = "gam", formula = y ~ s(x)) +
xlab("Simon effect\n(mean speed difference congruent-incongruent, stimuli/s)") +
ylab("Flanker effect\n(mean speed difference congruent-incongruent, stimuli/s)")
The correlation between the Simon and flanker effects expressed as mean speed differences is r = 0.46.
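For reference, the reported correlation can be computed from the same per-participant effects, e.g.:
# Correlation between the flanker and Simon effects (mean speed differences);
# r is about 0.46, as reported above
perPart %>%
  select(Subject, Task, meanSpeedDiff) %>%
  spread(Task, meanSpeedDiff) %>%
  summarise(r = cor(Flanker, Simon))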
Here we calculate the dominance values on the basis of the questionnaire data. Missing/non-stated data were marked as NA, and we used a ceiling of 20 years for the first two questions (When did you start learning language X? From what age onwards did you feel comfortable in language X?).
# Read in raw questionnaire data
questionnaire <- read.csv("bidialectalism_q.csv")
questionnaire$Subject <- factor(questionnaire$Subject)
# Summary of all variables
summary(questionnaire)
Subject valid Age Sex Education X1_DE_AoA X1_SW_AoA X2_Age_DE X2_Age_SW
100 : 1 Min. :0.000 Min. :16.0 female:15 Min. :1.00 Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.00
101 : 1 1st Qu.:1.000 1st Qu.:19.0 male :20 1st Qu.:2.00 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 6.0 1st Qu.: 3.00
102 : 1 Median :1.000 Median :22.0 Median :3.00 Median : 3.00 Median : 1.00 Median :12.0 Median : 5.00
103 : 1 Mean :0.971 Mean :22.5 Mean :3.09 Mean : 3.43 Mean : 2.52 Mean :12.1 Mean : 4.67
104 : 1 3rd Qu.:1.000 3rd Qu.:26.0 3rd Qu.:4.00 3rd Qu.: 6.00 3rd Qu.: 2.00 3rd Qu.:17.5 3rd Qu.: 6.00
105 : 1 Max. :1.000 Max. :29.0 Max. :5.00 Max. :12.00 Max. :16.00 Max. :24.0 Max. :10.00
(Other):29 NA's :1 NA's :8
X4_Country_DE X4_Country_SW X5_Fam_DE X5_Fam_SW X6_Job_DE X6_Job_SW X7a_DE_SpeakFri X7b_SW_SpeakFri
Min. :16.0 Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.00 Min. : 0.00 Min. :0.000 Min. :0.000
1st Qu.:19.0 1st Qu.:17.5 1st Qu.: 0.0 1st Qu.:17.0 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.:0.025 1st Qu.:0.325
Median :22.0 Median :22.0 Median :16.0 Median :22.0 Median : 1.00 Median : 0.00 Median :0.250 Median :0.700
Mean :22.4 Mean :21.0 Mean :11.5 Mean :18.8 Mean : 2.54 Mean : 2.03 Mean :0.336 Mean :0.616
3rd Qu.:26.0 3rd Qu.:26.0 3rd Qu.:21.5 3rd Qu.:26.0 3rd Qu.: 3.50 3rd Qu.: 2.50 3rd Qu.:0.550 3rd Qu.:0.975
Max. :29.0 Max. :29.0 Max. :28.0 Max. :29.0 Max. :12.00 Max. :12.00 Max. :0.980 Max. :1.000
X8a_DE_SpeakFam X8b_SW_SpeakFam X9a_DE_Job.Uni X9b_SW_Job.Uni X10a_DE_SpeakSelf X10b_SW_SpeakSelf X11a_DE_Count X11b_SW_Count
Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
1st Qu.:0.000 1st Qu.:0.600 1st Qu.:0.175 1st Qu.:0.100 1st Qu.:0.000 1st Qu.:0.335 1st Qu.:0.100 1st Qu.:0.025
Median :0.100 Median :0.900 Median :0.500 Median :0.350 Median :0.300 Median :0.600 Median :0.500 Median :0.400
Mean :0.256 Mean :0.744 Mean :0.460 Mean :0.424 Mean :0.292 Mean :0.604 Mean :0.465 Mean :0.434
3rd Qu.:0.400 3rd Qu.:1.000 3rd Qu.:0.725 3rd Qu.:0.800 3rd Qu.:0.475 3rd Qu.:1.000 3rd Qu.:0.800 3rd Qu.:0.850
Max. :1.000 Max. :1.000 Max. :1.000 Max. :1.000 Max. :0.980 Max. :1.000 Max. :1.000 Max. :1.000
X12_DE_Speak X12_SW_Speak X13_DE_Under X13_SW_Under Raven X3_Years_DE X3_Years_SW X14_DE_Read X14_SW_Read
Min. :1.00 Min. :0.00 Min. :1.00 Min. :0.00 Min. : 5.0 Min. : 8.0 Min. : 0 Min. :2.00 Min. :0.00
1st Qu.:3.50 1st Qu.:4.00 1st Qu.:6.00 1st Qu.:5.00 1st Qu.:11.0 1st Qu.:11.0 1st Qu.: 0 1st Qu.:6.00 1st Qu.:3.00
Median :5.00 Median :5.00 Median :6.00 Median :6.00 Median :13.0 Median :12.0 Median : 0 Median :6.00 Median :4.00
Mean :4.36 Mean :4.66 Mean :5.77 Mean :5.09 Mean :12.7 Mean :12.5 Mean : 1 Mean :5.69 Mean :3.97
3rd Qu.:5.00 3rd Qu.:6.00 3rd Qu.:6.00 3rd Qu.:6.00 3rd Qu.:15.0 3rd Qu.:13.0 3rd Qu.: 0 3rd Qu.:6.00 3rd Qu.:5.00
Max. :6.00 Max. :6.00 Max. :6.00 Max. :6.00 Max. :17.0 Max. :18.0 Max. :12 Max. :6.00 Max. :6.00
NA's :2
X15_DE_Write X15_SW_Write
Min. :2.00 Min. :0.0
1st Qu.:4.50 1st Qu.:2.5
Median :5.00 Median :4.0
Mean :5.13 Mean :3.6
3rd Qu.:6.00 3rd Qu.:5.0
Max. :6.00 Max. :6.0
Note that there are a couple of missing values:

X1_SW_AoA. Age of acquisition of Swabian: one participant never learnt Swabian and doesn’t qualify as a bidialectal. Remove his data.
questionnaire <- questionnaire %>%
filter(!is.na(X1_SW_AoA))
X2_Age_SW. Question: “Ab welchem Alter haben Sie sich in den folgenden Sprachen sicher gefühlt?” (“From what age onwards did you feel comfortable in the following languages?”). One option would be to impute missing values with the group mean, but that wouldn’t seem to make much sense, as it would mean that the participants felt comfortable in Swabian before having acquired it:
questionnaire %>% select(X1_SW_AoA, X2_Age_SW) %>% filter(is.na(X2_Age_SW))
Using the maximum possible response (20) makes more sense to me.
questionnaire$X2_Age_SW2 <- questionnaire$X2_Age_SW
# replace by maximum (20)
questionnaire$X2_Age_SW2[is.na(questionnaire$X2_Age_SW2)] <- 20
X3_Years_SW. Question: “Wie viele Jahre hatten Sie Unterricht in den folgenden Sprachen? (Grundschule, weiterführende Schule, Hochschule)?” (“For how many years did you receive schooling in the following languages? (primary school, secondary school, higher education)”). In all likelihood, this would be 0 for Swabian.
questionnaire$X3_Years_SW2 <- questionnaire$X3_Years_SW
questionnaire$X3_Years_SW2[is.na(questionnaire$X3_Years_SW2)] <- 0
Note, however, that 3 participants provided non-zero responses, suggesting that they may have interpreted the question differently from other participants: schooling in Swabian just doesn’t exist.
Here we replace non-zero values with 0 for this question.
questionnaire$X3_Years_SW2[questionnaire$X3_Years_SW2 > 0] <- 0
X4_Country_DE, X4_Country_SW, X5_Fam_DE, X5_Fam_SW. Values above 20 are capped at 20 (the ceiling used for the time-based questions).
questionnaire$X4_Country_DE[questionnaire$X4_Country_DE > 20] <- 20
questionnaire$X4_Country_SW[questionnaire$X4_Country_SW > 20] <- 20
questionnaire$X5_Fam_DE[questionnaire$X5_Fam_DE > 20] <- 20
questionnaire$X5_Fam_SW[questionnaire$X5_Fam_SW > 20] <- 20
summary(questionnaire)
Subject valid Age Sex Education X1_DE_AoA X1_SW_AoA X2_Age_DE X2_Age_SW
100 : 1 Min. :1 Min. :16.0 female:15 Min. :1.00 Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0.00
101 : 1 1st Qu.:1 1st Qu.:19.0 male :19 1st Qu.:2.00 1st Qu.: 0.25 1st Qu.: 0.00 1st Qu.: 6.25 1st Qu.: 3.00
102 : 1 Median :1 Median :22.0 Median :3.00 Median : 3.00 Median : 1.00 Median :12.00 Median : 5.00
103 : 1 Mean :1 Mean :22.6 Mean :3.09 Mean : 3.53 Mean : 2.52 Mean :12.27 Mean : 4.67
104 : 1 3rd Qu.:1 3rd Qu.:26.0 3rd Qu.:4.00 3rd Qu.: 6.00 3rd Qu.: 2.00 3rd Qu.:17.75 3rd Qu.: 6.00
105 : 1 Max. :1 Max. :29.0 Max. :5.00 Max. :12.00 Max. :16.00 Max. :24.00 Max. :10.00
(Other):28 NA's :7
X4_Country_DE X4_Country_SW X5_Fam_DE X5_Fam_SW X6_Job_DE X6_Job_SW X7a_DE_SpeakFri X7b_SW_SpeakFri
Min. :16.0 Min. : 0.0 Min. : 0.0 Min. : 0.0 Min. : 0.00 Min. : 0.00 Min. :0.0000 Min. :0.000
1st Qu.:19.0 1st Qu.:18.0 1st Qu.: 0.0 1st Qu.:17.0 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.:0.0125 1st Qu.:0.362
Median :20.0 Median :20.0 Median :16.0 Median :20.0 Median : 1.50 Median : 0.00 Median :0.2250 Median :0.700
Mean :19.2 Mean :18.2 Mean :10.1 Mean :16.1 Mean : 2.62 Mean : 2.09 Mean :0.3174 Mean :0.634
3rd Qu.:20.0 3rd Qu.:20.0 3rd Qu.:19.8 3rd Qu.:20.0 3rd Qu.: 3.75 3rd Qu.: 2.75 3rd Qu.:0.5000 3rd Qu.:0.988
Max. :20.0 Max. :20.0 Max. :20.0 Max. :20.0 Max. :12.00 Max. :12.00 Max. :0.9000 Max. :1.000
X8a_DE_SpeakFam X8b_SW_SpeakFam X9a_DE_Job.Uni X9b_SW_Job.Uni X10a_DE_SpeakSelf X10b_SW_SpeakSelf X11a_DE_Count X11b_SW_Count
Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.0000
1st Qu.:0.000 1st Qu.:0.625 1st Qu.:0.163 1st Qu.:0.100 1st Qu.:0.000 1st Qu.:0.378 1st Qu.:0.100 1st Qu.:0.0625
Median :0.075 Median :0.925 Median :0.450 Median :0.375 Median :0.300 Median :0.600 Median :0.500 Median :0.4000
Mean :0.234 Mean :0.766 Mean :0.446 Mean :0.437 Mean :0.272 Mean :0.622 Mean :0.464 Mean :0.4465
3rd Qu.:0.375 3rd Qu.:1.000 3rd Qu.:0.700 3rd Qu.:0.800 3rd Qu.:0.438 3rd Qu.:1.000 3rd Qu.:0.800 3rd Qu.:0.8750
Max. :1.000 Max. :1.000 Max. :1.000 Max. :1.000 Max. :0.900 Max. :1.000 Max. :1.000 Max. :1.0000
X12_DE_Speak X12_SW_Speak X13_DE_Under X13_SW_Under Raven X3_Years_DE X3_Years_SW X14_DE_Read X14_SW_Read
Min. :1.00 Min. :1.00 Min. :1.00 Min. :2.00 Min. : 5.0 Min. : 8.0 Min. : 0.00 Min. :2.00 Min. :0.00
1st Qu.:3.25 1st Qu.:4.25 1st Qu.:6.00 1st Qu.:5.00 1st Qu.:11.0 1st Qu.:11.0 1st Qu.: 0.00 1st Qu.:6.00 1st Qu.:3.25
Median :5.00 Median :5.00 Median :6.00 Median :6.00 Median :12.5 Median :12.0 Median : 0.00 Median :6.00 Median :4.00
Mean :4.32 Mean :4.79 Mean :5.76 Mean :5.24 Mean :12.6 Mean :12.5 Mean : 1.03 Mean :5.68 Mean :4.00
3rd Qu.:5.00 3rd Qu.:6.00 3rd Qu.:6.00 3rd Qu.:6.00 3rd Qu.:15.0 3rd Qu.:13.0 3rd Qu.: 0.00 3rd Qu.:6.00 3rd Qu.:5.00
Max. :6.00 Max. :6.00 Max. :6.00 Max. :6.00 Max. :17.0 Max. :18.0 Max. :12.00 Max. :6.00 Max. :6.00
NA's :2
X15_DE_Write X15_SW_Write X2_Age_SW2 X3_Years_SW2
Min. :2.00 Min. :0.00 Min. : 0.00 Min. :0
1st Qu.:4.25 1st Qu.:2.25 1st Qu.: 4.00 1st Qu.:0
Median :5.00 Median :4.00 Median : 5.00 Median :0
Mean :5.10 Mean :3.59 Mean : 7.82 Mean :0
3rd Qu.:6.00 3rd Qu.:5.00 3rd Qu.: 9.50 3rd Qu.:0
Max. :6.00 Max. :6.00 Max. :20.00 Max. :0
Now compute the scores for the different dimensions using the protocol proposed at https://sites.la.utexas.edu/bilingual/scoring-and-interpreting-the-results/. The ProficiencyGermanOral and ProficiencySwabianOral variables don’t take the participants’ reading and writing skills into consideration. For the corresponding TotalGermanOral and TotalSwabianOral variables, the proficiency component was weighted doubly to compensate for this.
questionnaire <- questionnaire %>%
mutate(HistoryGerman = (20 - X1_DE_AoA) + (20 - X2_Age_DE) + X3_Years_DE + X4_Country_DE + X5_Fam_DE + X6_Job_DE) %>%
mutate(HistorySwabian = (20 - X1_SW_AoA) + (20 - X2_Age_SW2) + X3_Years_SW2 + X4_Country_SW + X5_Fam_SW + X6_Job_SW) %>%
mutate(UseGerman = 10 * (X7a_DE_SpeakFri + X8a_DE_SpeakFam + X9a_DE_Job.Uni + X10a_DE_SpeakSelf + X11a_DE_Count)) %>%
mutate(UseSwabian = 10 * (X7b_SW_SpeakFri + X8b_SW_SpeakFam + X9b_SW_Job.Uni + X10b_SW_SpeakSelf + X11b_SW_Count)) %>%
mutate(ProficiencyGerman = X12_DE_Speak + X13_DE_Under + X14_DE_Read + X15_DE_Write) %>%
mutate(ProficiencySwabian = X12_SW_Speak + X13_SW_Under + X14_SW_Read + X15_SW_Write) %>%
mutate(ProficiencyGermanOral = X12_DE_Speak + X13_DE_Under) %>%
mutate(ProficiencySwabianOral = X12_SW_Speak + X13_SW_Under) %>%
mutate(TotalGerman = 0.454*HistoryGerman + 1.09*UseGerman + 2.27*ProficiencyGerman) %>%
mutate(TotalSwabian = 0.454*HistorySwabian + 1.09*UseSwabian + 2.27*ProficiencySwabian) %>%
mutate(TotalGermanOral = 0.454*HistoryGerman + 1.09*UseGerman + 4.54*ProficiencyGermanOral) %>%
mutate(TotalSwabianOral = 0.454*HistorySwabian + 1.09*UseSwabian + 4.54*ProficiencySwabian) %>%
mutate(DominanceSwabian = TotalSwabian - TotalGerman) %>%
mutate(DominanceSwabianOral = TotalSwabianOral - TotalGermanOral) %>%
mutate(DominanceSwabianUse = UseSwabian - UseGerman)
Dominance of Swabian, i.e., the participants’ overall Swabian score minus their overall German score:
ggplot(questionnaire,
aes(x = DominanceSwabian)) +
geom_histogram(colour = "lightgrey", fill = "red3", binwidth = 25) +
scale_y_continuous(breaks = c(0, 3, 6, 9)) +
xlab("Dominance of Swabian") +
ylab("Number of participants")
Dominance of Swabian, taking only into account oral – not written – proficiency:
ggplot(questionnaire,
aes(x = DominanceSwabianOral)) +
geom_histogram(colour = "lightgrey", fill = "red3", binwidth = 25) +
scale_y_continuous(breaks = c(0, 3, 6, 9)) +
xlab("Dominance of Swabian (oral)") +
ylab("Number of participants")
Dominance of Swabian, considering only the use dimension, not the other dimensions:
ggplot(questionnaire,
aes(x = DominanceSwabianUse)) +
geom_histogram(colour = "lightgrey", fill = "red3", bins = 12) +
scale_y_continuous(breaks = c(0, 3, 6, 9)) +
xlab("Dominance of Swabian (use)") +
ylab("Number of participants")
Total Swabian score, i.e., not taking into account the participants’ German scores:
ggplot(questionnaire,
aes(x = TotalSwabian)) +
geom_histogram(colour = "lightgrey", fill = "red3", bins = 10) +
scale_y_continuous(breaks = c(0, 3, 6, 9)) +
xlab("Total Swabian score") +
ylab("Number of participants")
Total Swabian score, oral skills only:
ggplot(questionnaire,
aes(x = TotalSwabianOral)) +
geom_histogram(colour = "lightgrey", fill = "red3", bins = 10) +
scale_y_continuous(breaks = c(0, 3, 6, 9)) +
xlab("Total Swabian score (oral only)") +
ylab("Number of participants")
ggplot(questionnaire,
aes(x = Raven)) +
geom_histogram(colour = "lightgrey", fill = "red3", binwidth = 2) +
scale_x_continuous(breaks = c(5, 9, 13, 17)) +
xlab("Raven score (out of 18") +
ylab("Number of participants")
# Age
questionnaire %>%
summarise(mean = mean(Age),
sd = sd(Age),
min = min(Age),
max = max(Age))
# Raven
questionnaire %>%
summarise(mean = mean(Raven),
sd = sd(Raven),
min = min(Raven),
max = max(Raven))
# AoA St. German
questionnaire %>%
summarise(mean = mean(X1_DE_AoA),
sd = sd(X1_DE_AoA),
min = min(X1_DE_AoA),
max = max(X1_DE_AoA))
# AoA Swabian
questionnaire %>%
summarise(mean = mean(X1_SW_AoA),
sd = sd(X1_SW_AoA),
min = min(X1_SW_AoA),
max = max(X1_SW_AoA))
# Proficiency St. German
questionnaire %>%
mutate(StandardGerman = (X12_DE_Speak + X13_DE_Under + X14_DE_Read + X15_DE_Write)/4) %>%
summarise(mean = mean(StandardGerman),
sd = sd(StandardGerman),
min = min(StandardGerman),
max = max(StandardGerman))
# Proficiency Schwäbisch
questionnaire %>%
mutate(Swabian = (X12_SW_Speak + X13_SW_Under)/2) %>%
summarise(mean = mean(Swabian),
sd = sd(Swabian),
min = min(Swabian),
max = max(Swabian))
# Use St. German
questionnaire %>%
mutate(StandardGerman = (X7a_DE_SpeakFri + X8a_DE_SpeakFam + X9a_DE_Job.Uni)/3) %>%
summarise(mean = mean(StandardGerman),
sd = sd(StandardGerman),
min = min(StandardGerman),
max = max(StandardGerman))
# Use Swabian
questionnaire %>%
mutate(Swabian = (X7b_SW_SpeakFri + X8b_SW_SpeakFam + X9b_SW_Job.Uni)/3) %>%
summarise(mean = mean(Swabian),
sd = sd(Swabian),
min = min(Swabian),
max = max(Swabian))
The expectation at the outset of the study was that the Simon/flanker effects should be lowest for balanced bidialectals, i.e., participants whose dominance values are closest to zero.
# Combine datasets
combined <- perPart %>%
select(Subject, Task, meanSpeedDiff) %>%
full_join(., questionnaire, by = "Subject")
# Plot with DominanceSwabian
p1 <- ggplot(combined,
aes(x = DominanceSwabian,
y = meanSpeedDiff)) +
geom_point(pch = 1) +
geom_smooth(method = "gam", formula = y ~ s(x, bs = "tp"),
fill = "grey70",
colour = "black") +
facet_wrap(~ Task, scales = "free_y") +
geom_vline(xintercept = 0, lty = 2, colour = "black") +
geom_hline(yintercept = 0, lty = 2, colour = "black") +
xlab("Dominance of Swabian") +
ylab("Difference in mean speed\n(stimuli/s)")
print(p1)
# Flanker
FlankerPerPart <- perPart %>%
full_join(., questionnaire, by = "Subject") %>%
filter(Task == "Flanker")
# GAM: allow DominanceSwabian to have a nonlinear effect
# This corresponds to the scatterplot smoother above
flanker.gam <- gam(meanSpeedDiff ~ s(DominanceSwabian, bs = "tp"),
data = FlankerPerPart)
summary(flanker.gam)
Family: gaussian
Link function: identity
Formula:
meanSpeedDiff ~ s(DominanceSwabian, bs = "tp")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.2292 0.0181 12.7 5.2e-14
Approximate significance of smooth terms:
edf Ref.df F p-value
s(DominanceSwabian) 1.04 1.09 4.7 0.032
R-sq.(adj) = 0.116 Deviance explained = 14.4%
GCV = 0.011832 Scale est. = 0.011121 n = 34
# edf = 1.04 means that the effect of DominanceSwabian is linear,
# as the scatterplot shows. This is not what was predicted.
# Refit in linear model:
flanker.lm <- lm(meanSpeedDiff ~ DominanceSwabian, data = FlankerPerPart)
summary(flanker.lm)
Call:
lm(formula = meanSpeedDiff ~ DominanceSwabian, data = FlankerPerPart)
Residuals:
Min 1Q Median 3Q Max
-0.22226 -0.06279 0.00021 0.05844 0.28014
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.232797 0.018164 12.8 3.8e-14
DominanceSwabian -0.000836 0.000363 -2.3 0.028
Residual standard error: 0.106 on 32 degrees of freedom
Multiple R-squared: 0.142, Adjusted R-squared: 0.115
F-statistic: 5.29 on 1 and 32 DF, p-value: 0.0281
# Visual model check:
par(mfrow = c(2, 2))
plot(flanker.lm)
par(mfrow = c(1, 1))
# All of this looks fine.
Conclusion: no evidence for a non-linear trend. Linear trend: \(\hat{\beta} = -0.0008 \pm 0.0004\), \(t(32) = 2.3\), \(p = 0.03\).
# Simon
SimonPerPart <- perPart %>%
full_join(., questionnaire, by = "Subject") %>%
filter(Task == "Simon")
# GAM: allow DominanceSwabian to have a nonlinear effect
# This corresponds to the scatterplot smoother above
simon.gam <- gam(meanSpeedDiff ~ s(DominanceSwabian, bs = "tp"),
data = SimonPerPart)
summary(simon.gam)
Family: gaussian
Link function: identity
Formula:
meanSpeedDiff ~ s(DominanceSwabian, bs = "tp")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.1334 0.0193 6.92 7.8e-08
Approximate significance of smooth terms:
edf Ref.df F p-value
s(DominanceSwabian) 1 1 5.74 0.022
R-sq.(adj) = 0.126 Deviance explained = 15.2%
GCV = 0.013432 Scale est. = 0.012642 n = 34
# edf = 1 suggests that the effect of Swabian dominance
# is linear.
simon.lm <- lm(meanSpeedDiff ~ DominanceSwabian,
data = SimonPerPart)
summary(simon.lm)
Call:
lm(formula = meanSpeedDiff ~ DominanceSwabian, data = SimonPerPart)
Residuals:
Min 1Q Median 3Q Max
-0.20956 -0.06982 0.00704 0.07276 0.31447
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.137374 0.019353 7.1 4.7e-08
DominanceSwabian -0.000928 0.000387 -2.4 0.023
Residual standard error: 0.112 on 32 degrees of freedom
Multiple R-squared: 0.152, Adjusted R-squared: 0.126
F-statistic: 5.74 on 1 and 32 DF, p-value: 0.0226
Conclusion: no evidence for a non-linear trend. Linear trend: \(\hat{\beta} = -0.0009 \pm 0.0004\), \(t(32) = 2.4\), \(p = 0.02\).
Another way of putting this is to say that the participants with the lowest absolute dominance values should have the lowest flanker/Simon effects.
# Plot with DominanceSwabian
p2 <- ggplot(combined,
aes(x = abs(DominanceSwabian),
y = meanSpeedDiff)) +
geom_point(pch = 1) +
geom_smooth(method = "gam", formula = y ~ s(x, bs = "tp"),
fill = "grey70",
colour = "black") +
facet_wrap(~ Task, scales = "free_y") +
geom_vline(xintercept = 0, lty = 2, colour = "black") +
geom_hline(yintercept = 0, lty = 2, colour = "black") +
xlab("Absolute dominance") +
ylab("Difference in mean speed\n(stimuli/s)")
print(p2)
Analysis1 <- plot_grid(p1, p2,
ncol = 1, align = "h")
save_plot("Analysis1_RawData.svg", Analysis1, base_height = 4, base_width = 6)
# GAM: allow abs(DominanceSwabian) to have a nonlinear effect
# This corresponds to the scatterplot smoother above
flanker.gam <- gam(meanSpeedDiff ~ s(abs(DominanceSwabian), bs = "tp"),
data = FlankerPerPart)
summary(flanker.gam)
Family: gaussian
Link function: identity
Formula:
meanSpeedDiff ~ s(abs(DominanceSwabian), bs = "tp")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.2292 0.0195 11.7 3.9e-13
Approximate significance of smooth terms:
edf Ref.df F p-value
s(abs(DominanceSwabian)) 1 1 0.03 0.85
R-sq.(adj) = -0.0302 Deviance explained = 0.106%
GCV = 0.013775 Scale est. = 0.012965 n = 34
# edf = 1 means that the effect of abs(DominanceSwabian) is linear,
# as the scatterplot shows.
# Refit in linear model:
flanker.lm <- lm(meanSpeedDiff ~ abs(DominanceSwabian), data = FlankerPerPart)
summary(flanker.lm)
Call:
lm(formula = meanSpeedDiff ~ abs(DominanceSwabian), data = FlankerPerPart)
Residuals:
Min 1Q Median 3Q Max
-0.24621 -0.08791 -0.00312 0.05304 0.26211
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.234583 0.034945 6.71 1.4e-07
abs(DominanceSwabian) -0.000129 0.000699 -0.18 0.85
Residual standard error: 0.114 on 32 degrees of freedom
Multiple R-squared: 0.00106, Adjusted R-squared: -0.0302
F-statistic: 0.0341 on 1 and 32 DF, p-value: 0.855
# Visual model check:
par(mfrow = c(2, 2))
plot(flanker.lm)
par(mfrow = c(1, 1))
# All of this looks fine.
Conclusion: no evidence for a linear trend. Linear trend: \(\hat{\beta} = -0.0001 \pm 0.0007\), \(t(32) = 0.2\), \(p = 0.85\).
# GAM: allow abs(DominanceSwabian) to have a nonlinear effect
# This corresponds to the scatterplot smoother above
simon.gam <- gam(meanSpeedDiff ~ s(abs(DominanceSwabian), bs = "tp"),
data = SimonPerPart)
summary(simon.gam)
Family: gaussian
Link function: identity
Formula:
meanSpeedDiff ~ s(abs(DominanceSwabian), bs = "tp")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.1334 0.0207 6.43 3.1e-07
Approximate significance of smooth terms:
edf Ref.df F p-value
s(abs(DominanceSwabian)) 1 1 0.63 0.43
R-sq.(adj) = -0.0114 Deviance explained = 1.92%
GCV = 0.015537 Scale est. = 0.014623 n = 34
# edf = 1 means that the effect of abs(DominanceSwabian) is linear,
# as the scatterplot shows.
# Refit in linear model:
simon.lm <- lm(meanSpeedDiff ~ abs(DominanceSwabian), data = SimonPerPart)
summary(simon.lm)
Call:
lm(formula = meanSpeedDiff ~ abs(DominanceSwabian), data = SimonPerPart)
Residuals:
Min 1Q Median 3Q Max
-0.22380 -0.07440 -0.00828 0.07814 0.24256
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.109038 0.037112 2.94 0.0061
abs(DominanceSwabian) 0.000588 0.000743 0.79 0.4340
Residual standard error: 0.121 on 32 degrees of freedom
Multiple R-squared: 0.0192, Adjusted R-squared: -0.0114
F-statistic: 0.628 on 1 and 32 DF, p-value: 0.434
# Visual model check:
par(mfrow = c(2, 2))
plot(simon.lm)
par(mfrow = c(1, 1))
# All of this looks fine.
Conclusion: no evidence for a linear trend. Linear trend: \(\hat{\beta} = 0.0006 \pm 0.0007\), \(t(32) = 0.8\), \(p = 0.43\).
There is no indication of any effect consistent with our a priori hypothesis here.
# Plot with TotalSwabian
p3 <- ggplot(combined,
aes(x = TotalSwabian,
y = meanSpeedDiff)) +
geom_point(pch = 1) +
geom_smooth(method = "gam", formula = y ~ s(x, bs = "tp"),
fill = "grey70",
colour = "black") +
facet_wrap(~ Task, scales = "free_y") +
geom_vline(xintercept = 0, lty = 2, colour = "black") +
geom_hline(yintercept = 0, lty = 2, colour = "black") +
xlab("Total Swabian score") +
ylab("Difference in mean speed\n(stimuli/s)")
print(p3)
save_plot("Analysis2_RawData.svg", p3, base_height = 2, base_width = 6)
# GAM: allow TotalSwabian to have a nonlinear effect
# This corresponds to the scatterplot smoother above
flanker.gam <- gam(meanSpeedDiff ~ s(TotalSwabian, bs = "tp"),
data = FlankerPerPart)
summary(flanker.gam)
Family: gaussian
Link function: identity
Formula:
meanSpeedDiff ~ s(TotalSwabian, bs = "tp")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.2292 0.0187 12.2 1.3e-13
Approximate significance of smooth terms:
edf Ref.df F p-value
s(TotalSwabian) 1 1 2.89 0.099
R-sq.(adj) = 0.0541 Deviance explained = 8.28%
GCV = 0.012648 Scale est. = 0.011904 n = 34
# edf = 1 means that the effect of TotalSwabian is linear,
# as the scatterplot shows.
# Refit in linear model:
flanker.lm <- lm(meanSpeedDiff ~ TotalSwabian, data = FlankerPerPart)
summary(flanker.lm)
Call:
lm(formula = meanSpeedDiff ~ TotalSwabian, data = FlankerPerPart)
Residuals:
Min 1Q Median 3Q Max
-0.23424 -0.08612 0.00749 0.05810 0.28041
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.326822 0.060387 5.41 6e-06
TotalSwabian -0.000960 0.000565 -1.70 0.099
Residual standard error: 0.109 on 32 degrees of freedom
Multiple R-squared: 0.0828, Adjusted R-squared: 0.0541
F-statistic: 2.89 on 1 and 32 DF, p-value: 0.0989
# Visual model check:
par(mfrow = c(2, 2))
plot(flanker.lm)
par(mfrow = c(1, 1))
# All of this looks fine.
Linear trend: \(\hat{\beta} = -0.0010 \pm 0.0006\), \(t(32) = 1.7\), \(p = 0.10\).
# GAM: allow TotalSwabian to have a nonlinear effect
# This corresponds to the scatterplot smoother above
simon.gam <- gam(meanSpeedDiff ~ s(TotalSwabian, bs = "tp"),
data = SimonPerPart)
summary(simon.gam)
Family: gaussian
Link function: identity
Formula:
meanSpeedDiff ~ s(TotalSwabian, bs = "tp")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.1334 0.0195 6.85 9.5e-08
Approximate significance of smooth terms:
edf Ref.df F p-value
s(TotalSwabian) 1 1 4.98 0.033
R-sq.(adj) = 0.108 Deviance explained = 13.5%
GCV = 0.013709 Scale est. = 0.012903 n = 34
# edf = 1 means that the effect of TotalSwabian is linear,
# as the scatterplot shows.
# Refit in linear model:
simon.lm <- lm(meanSpeedDiff ~ TotalSwabian, data = SimonPerPart)
summary(simon.lm)
Call:
lm(formula = meanSpeedDiff ~ TotalSwabian, data = SimonPerPart)
Residuals:
Min 1Q Median 3Q Max
-0.2125 -0.0715 0.0120 0.0699 0.3077
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.266781 0.062868 4.24 0.00018
TotalSwabian -0.001312 0.000588 -2.23 0.03281
Residual standard error: 0.114 on 32 degrees of freedom
Multiple R-squared: 0.135, Adjusted R-squared: 0.108
F-statistic: 4.98 on 1 and 32 DF, p-value: 0.0328
# Visual model check:
par(mfrow = c(2, 2))
plot(simon.lm)
par(mfrow = c(1, 1))
# All of this looks fine.
Linear trend: \(\hat{\beta} = -0.0013 \pm 0.0006\), \(t(32) = 2.2\), \(p = 0.03\).
The models reported above rely on aggregated data. For the sake of completeness, we will also analyse non-aggregated data in mixed-effects models.
# Combine questionnaire and non-aggregated data
dat <- merge(dat, questionnaire, by = "Subject")
# Sum-code 'Status' (congruent vs. incongruent) (0.5 vs. -0.5)
dat$n.Status <- (as.numeric(dat$Status) - 1.5) * -1
xtabs(~ n.Status + Status, dat)
Status
n.Status congruent incongruent
-0.5 0 6528
0.5 6528 0
# Centre DominanceSwabian
dat$c.DominanceSwabian <- dat$DominanceSwabian - mean(dat$DominanceSwabian)
# Compute and centre absolute DominanceSwabian
dat$a.DominanceSwabian <- abs(dat$DominanceSwabian)
dat$ca.DominanceSwabian <- dat$a.DominanceSwabian - mean(dat$a.DominanceSwabian)
# Centre TotalSwabian
dat$c.TotalSwabian <- dat$TotalSwabian - mean(dat$TotalSwabian)
# Split up by task
datFlanker <- dat %>% filter(Task == "Flanker")
datSimon <- dat %>% filter(Task == "Simon")
The minimum requirement is that DominanceSwabian interacts with n.Status, that is to say, that the effect of congruent vs. incongruent trials varies depending on the participants’ bidialectal dominance. A significant interaction in itself would not be in line with our predictions, since these specify that the flanker/Simon effects should be lowest around DominanceSwabian = 0; a significant interaction that suggests, say, that the flanker/Simon effects grow larger with increasing Swabian dominance would be incompatible with the predictions.
# Fit model in GAMM
flanker.gam <- gam(Speed ~ n.Status +
s(DominanceSwabian, by = factor(n.Status)) +
s(Subject, bs = "re") + # random intercept for subject
s(n.Status, Subject, bs = "re"), # random slope of n.Congruent by subject.
# This is equivalent to saying that the Flanker effect
# varies between participants
data = datFlanker)
plot(flanker.gam)
The DominanceSwabian effects are linear, so we’ll model them linearly using lmer().
flanker.mer <- lmer(Speed ~ n.Status + c.DominanceSwabian +
c.DominanceSwabian:n.Status +
(1 + n.Status | Subject),
data = datFlanker)
AIC(flanker.mer)
[1] 5988
summary(flanker.mer)
Linear mixed model fit by REML ['lmerMod']
Formula: Speed ~ n.Status + c.DominanceSwabian + c.DominanceSwabian:n.Status + (1 + n.Status | Subject)
Data: datFlanker
REML criterion at convergence: 5972
Scaled residuals:
Min 1Q Median 3Q Max
-4.563 -0.573 0.036 0.623 4.988
Random effects:
Groups Name Variance Std.Dev. Corr
Subject (Intercept) 0.0779 0.2791
n.Status 0.0082 0.0906 -0.06
Residual 0.1410 0.3755
Number of obs: 6528, groups: Subject, 34
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.379575 0.048091 49.5
n.Status 0.229235 0.018098 12.7
c.DominanceSwabian -0.000882 0.000966 -0.9
n.Status:c.DominanceSwabian -0.000836 0.000363 -2.3
Correlation of Fixed Effects:
(Intr) n.Stts c.DmnS
n.Status -0.048
c.DmnncSwbn 0.000 0.000
n.Stts:c.DS 0.000 0.000 -0.048
# Without random correlation?
flanker.mer2 <- lmer(Speed ~ n.Status + c.DominanceSwabian +
c.DominanceSwabian:n.Status +
(1 + n.Status || Subject),
data = datFlanker)
AIC(flanker.mer2)
[1] 5987
summary(flanker.mer2)
Linear mixed model fit by REML ['lmerMod']
Formula: Speed ~ n.Status + c.DominanceSwabian + c.DominanceSwabian:n.Status + ((1 | Subject) + (0 + n.Status | Subject))
Data: datFlanker
REML criterion at convergence: 5972
Scaled residuals:
Min 1Q Median 3Q Max
-4.564 -0.574 0.036 0.623 4.989
Random effects:
Groups Name Variance Std.Dev.
Subject (Intercept) 0.0779 0.2791
Subject.1 n.Status 0.0082 0.0906
Residual 0.1410 0.3755
Number of obs: 6528, groups: Subject, 34
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.379575 0.048091 49.5
n.Status 0.229235 0.018098 12.7
c.DominanceSwabian -0.000882 0.000966 -0.9
n.Status:c.DominanceSwabian -0.000836 0.000363 -2.3
Correlation of Fixed Effects:
(Intr) n.Stts c.DmnS
n.Status 0.000
c.DmnncSwbn 0.000 0.000
n.Stts:c.DS 0.000 0.000 0.000
flanker.mer2, which pegs the correlation between the random intercepts and slopes at 0, seems to be a slightly better model, though it doesn’t matter much.
The source code for this document explores whether this model is correctly specified (distribution of random effects and residuals). In conclusion, all seems to be in order.
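Those checks are not displayed on this page; a minimal sketch of the kind of diagnostics meant here (the specifics are an assumption) could look like this:
# Sketch of possible model checks (not the exact code from the source document)
par(mfrow = c(1, 2))
qqnorm(resid(flanker.mer2), main = "Residuals"); qqline(resid(flanker.mer2))
plot(fitted(flanker.mer2), resid(flanker.mer2),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. fitted")  # rough check of homoskedasticity
par(mfrow = c(1, 1))
# The by-subject random effects can be inspected with, e.g.,
# lattice::qqmath(ranef(flanker.mer2, condVar = TRUE))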
Here we test the significance of the interaction between n.Status and c.DominanceSwabian.
# 1. Normal approximation
summary(flanker.mer2)
Linear mixed model fit by REML ['lmerMod']
Formula: Speed ~ n.Status + c.DominanceSwabian + c.DominanceSwabian:n.Status + ((1 | Subject) + (0 + n.Status | Subject))
Data: datFlanker
REML criterion at convergence: 5972
Scaled residuals:
Min 1Q Median 3Q Max
-4.564 -0.574 0.036 0.623 4.989
Random effects:
Groups Name Variance Std.Dev.
Subject (Intercept) 0.0779 0.2791
Subject.1 n.Status 0.0082 0.0906
Residual 0.1410 0.3755
Number of obs: 6528, groups: Subject, 34
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.379575 0.048091 49.5
n.Status 0.229235 0.018098 12.7
c.DominanceSwabian -0.000882 0.000966 -0.9
n.Status:c.DominanceSwabian -0.000836 0.000363 -2.3
Correlation of Fixed Effects:
(Intr) n.Stts c.DmnS
n.Status 0.000
c.DmnncSwbn 0.000 0.000
n.Stts:c.DS 0.000 0.000 0.000
# |t| = 2.3 corresponds to roughly p = 0.02, if we assume
# the t-distribution is well approximated by a normal distribution.
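# (For illustration, this approximation can be computed directly:
#  2 * pnorm(-abs(2.3)) is about 0.021.)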
# 2. Likelihood ratio test
# Fit null model
flanker.mer20 <- lmer(Speed ~ n.Status + c.DominanceSwabian +
(1 + n.Status || Subject),
data = datFlanker)
anova(flanker.mer20, flanker.mer2)
refitting model(s) with ML (instead of REML)
Data: datFlanker
Models:
flanker.mer20: Speed ~ n.Status + c.DominanceSwabian + ((1 | Subject) + (0 +
flanker.mer20: n.Status | Subject))
flanker.mer2: Speed ~ n.Status + c.DominanceSwabian + c.DominanceSwabian:n.Status +
flanker.mer2: ((1 | Subject) + (0 + n.Status | Subject))
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
flanker.mer20 6 5953 5994 -2971 5941
flanker.mer2 7 5950 5997 -2968 5936 5.2 1 0.023
# LRT: X²(1) = 5.2, p = 0.023
# 3. Last--and better--solution: Parametric bootstrapping:
#PBmodcomp(flanker.mer2, flanker.mer20, nsim = 1000)
# p =~ 0.04; the p-value will vary from run to run
As pointed out above, the mere presence of a significant interaction doesn’t mean the data are consistent with our hypothesis. For this reason, we’ll visualise the modelled flanker effect (speed for congruent trials - speed for incongruent trials) according to the participants’ Swabian dominance values.
(Since the GAMM above suggested the interaction to be linear, it is impossible that the significant interaction above is indeed consistent with our prediction, but this will be more obvious once plotted.)
# Data frame to fill with predictions
newdat <- expand.grid(n.Status = c(-0.5, 0.5),
c.DominanceSwabian = seq(min(datFlanker$c.DominanceSwabian),
max(datFlanker$c.DominanceSwabian)))
# Fill with predictions, ignore random effects
newdat$Prediction <- predict(flanker.mer2, newdat, re.form = NA)
# Add original variables
newdat$Status <- ifelse(newdat$n.Status == -0.5, "incongruent", "congruent")
newdat$DominanceSwabian <- newdat$c.DominanceSwabian + mean(datFlanker$DominanceSwabian)
# Plot effect of DominanceSwabian
ggplot(newdat,
aes(x = DominanceSwabian,
y = Prediction,
linetype = Status)) +
geom_line() +
xlab("Swabian dominance") +
ylab("Predicted response speed\n(stimuli/s)")
# Compute difference between congruent and incongruent for each Dominance value
newdatDiffs <- newdat %>%
select(Status, Prediction, DominanceSwabian) %>%
spread(., Status, Prediction) %>%
mutate(Difference = congruent - incongruent)
# Difference in response speeds:
ggplot(newdatDiffs,
aes(x = DominanceSwabian,
y = Difference)) +
geom_line() +
xlab("Swabian dominance") +
ylab("Flanker effect\n(response speed difference in stimuli/s)")
That is, as in the simpler regression models on the aggregated flanker data, we find that the flanker effect is lowest for high Swabian dominance, not for balanced bidialectalism.
# Fit model in GAMM
simon.gam <- gam(Speed ~ n.Status +
s(DominanceSwabian, by = factor(n.Status)) +
s(Subject, bs = "re") + # random intercept for subject
s(n.Status, Subject, bs = "re"), # random slope of n.Congruent by subject.
# This is equivalent to saying that the Simon effect
# varies between participants
data = datSimon)
plot(simon.gam)
The DominanceSwabian effects are linear, so we’ll model them linearly using lmer().
simon.mer <- lmer(Speed ~ n.Status + c.DominanceSwabian +
c.DominanceSwabian:n.Status +
(1 + n.Status | Subject),
data = datSimon)
AIC(simon.mer)
[1] 10137
summary(simon.mer)
Linear mixed model fit by REML ['lmerMod']
Formula: Speed ~ n.Status + c.DominanceSwabian + c.DominanceSwabian:n.Status + (1 + n.Status | Subject)
Data: datSimon
REML criterion at convergence: 10122
Scaled residuals:
Min 1Q Median 3Q Max
-4.497 -0.597 0.013 0.621 4.021
Random effects:
Groups Name Variance Std.Dev. Corr
Subject (Intercept) 0.09350 0.3058
n.Status 0.00707 0.0841 0.06
Residual 0.26753 0.5172
Number of obs: 6528, groups: Subject, 34
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.589867 0.052831 49.0
n.Status 0.133422 0.019282 6.9
c.DominanceSwabian -0.001638 0.001061 -1.5
n.Status:c.DominanceSwabian -0.000928 0.000387 -2.4
Correlation of Fixed Effects:
(Intr) n.Stts c.DmnS
n.Status 0.044
c.DmnncSwbn 0.000 0.000
n.Stts:c.DS 0.000 0.000 0.044
# Without random correlation?
simon.mer2 <- lmer(Speed ~ n.Status + c.DominanceSwabian +
c.DominanceSwabian:n.Status +
(1 + n.Status || Subject),
data = datSimon)
AIC(simon.mer2)
[1] 10136
summary(simon.mer2)
Linear mixed model fit by REML ['lmerMod']
Formula: Speed ~ n.Status + c.DominanceSwabian + c.DominanceSwabian:n.Status + ((1 | Subject) + (0 + n.Status | Subject))
Data: datSimon
REML criterion at convergence: 10122
Scaled residuals:
Min 1Q Median 3Q Max
-4.499 -0.597 0.014 0.623 4.020
Random effects:
Groups Name Variance Std.Dev.
Subject (Intercept) 0.09350 0.3058
Subject.1 n.Status 0.00707 0.0841
Residual 0.26753 0.5172
Number of obs: 6528, groups: Subject, 34
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.589867 0.052831 49.0
n.Status 0.133422 0.019282 6.9
c.DominanceSwabian -0.001638 0.001061 -1.5
n.Status:c.DominanceSwabian -0.000928 0.000387 -2.4
Correlation of Fixed Effects:
(Intr) n.Stts c.DmnS
n.Status 0.000
c.DmnncSwbn 0.000 0.000
n.Stts:c.DS 0.000 0.000 0.000
simon.mer2, which pegs the correlation between the random intercepts and slopes at 0, seems to be a slightly better model, though it doesn’t matter much.
The source code for this document explores whether this model is correctly specified (distribution of random effects and residuals). In conclusion, all seems to be in order.
Here we test the significance of the interaction between n.Status and c.DominanceSwabian.
# 1. Normal approximation
summary(simon.mer2)
Linear mixed model fit by REML ['lmerMod']
Formula: Speed ~ n.Status + c.DominanceSwabian + c.DominanceSwabian:n.Status + ((1 | Subject) + (0 + n.Status | Subject))
Data: datSimon
REML criterion at convergence: 10122
Scaled residuals:
Min 1Q Median 3Q Max
-4.499 -0.597 0.014 0.623 4.020
Random effects:
Groups Name Variance Std.Dev.
Subject (Intercept) 0.09350 0.3058
Subject.1 n.Status 0.00707 0.0841
Residual 0.26753 0.5172
Number of obs: 6528, groups: Subject, 34
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.589867 0.052831 49.0
n.Status 0.133422 0.019282 6.9
c.DominanceSwabian -0.001638 0.001061 -1.5
n.Status:c.DominanceSwabian -0.000928 0.000387 -2.4
Correlation of Fixed Effects:
(Intr) n.Stts c.DmnS
n.Status 0.000
c.DmnncSwbn 0.000 0.000
n.Stts:c.DS 0.000 0.000 0.000
# |t| = 2.4 corresponds to roughly p = 0.016, if we assume
# the t-distribution is well approximated by a normal distribution.
# 2. Likelihood ratio test
# Fit null model
simon.mer20 <- lmer(Speed ~ n.Status + c.DominanceSwabian +
(1 + n.Status || Subject),
data = datSimon)
anova(simon.mer20, simon.mer2)
refitting model(s) with ML (instead of REML)
Data: datSimon
Models:
simon.mer20: Speed ~ n.Status + c.DominanceSwabian + ((1 | Subject) + (0 +
simon.mer20: n.Status | Subject))
simon.mer2: Speed ~ n.Status + c.DominanceSwabian + c.DominanceSwabian:n.Status +
simon.mer2: ((1 | Subject) + (0 + n.Status | Subject))
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
simon.mer20 6 10103 10144 -5046 10091
simon.mer2 7 10100 10147 -5043 10086 5.61 1 0.018
# LRT: X²(1) = 5.6, p = 0.018
# 3. Last--and better--solution: Parametric bootstrapping:
#PBmodcomp(simon.mer2, simon.mer20, nsim = 1000)
# p =~ 0.020; the p-value will vary from run to run
As pointed out above, the mere presence of a significant interaction doesn’t mean the data are consistent with our hypothesis. For this reason, we’ll visualise the modelled Simon effect (speed for congruent trials - speed for incongruent trials) according to the participants’ Swabian dominance values.
(Since the GAMM above suggested the interaction to be linear, it is impossible that the significant interaction above is indeed consistent with our prediction, but this will be more obvious once plotted.)
# Data frame to fill with predictions
newdat <- expand.grid(n.Status = c(-0.5, 0.5),
c.DominanceSwabian = seq(min(datSimon$c.DominanceSwabian),
max(datSimon$c.DominanceSwabian)))
# Fill with predictions, ignore random effects
newdat$Prediction <- predict(simon.mer2, newdat, re.form = NA)
# Add original variables
newdat$Status <- ifelse(newdat$n.Status == -0.5, "incongruent", "congruent")
newdat$DominanceSwabian <- newdat$c.DominanceSwabian + mean(datSimon$DominanceSwabian)
# Plot effect of DominanceSwabian
ggplot(newdat,
aes(x = DominanceSwabian,
y = Prediction,
linetype = Status)) +
geom_line() +
xlab("Swabian dominance") +
ylab("Predicted response speed\n(stimuli/s)")
# Compute difference between congruent and incongruent for each Dominance value
newdatDiffs <- newdat %>%
select(Status, Prediction, DominanceSwabian) %>%
spread(., Status, Prediction) %>%
mutate(Difference = congruent - incongruent)
# Difference in response speeds:
ggplot(newdatDiffs,
aes(x = DominanceSwabian,
y = Difference)) +
geom_line() +
xlab("Swabian dominance") +
ylab("Simon effect\n(response speed difference in stimuli/s)")
That is, as in the simpler regression models on the aggregated Simon data, we find that the Simon effect is lowest for high Swabian dominance, not for balanced bidialectalism.
The minimum requirement is that a.DominanceSwabian interacts with n.Status, that is to say, that the effect of congruent vs. incongruent trials varies depending on the participants’ bidialectal dominance. A significant interaction in itself would not be in line with our predictions, since these specify that the flanker/Simon effects should be lowest around a.DominanceSwabian = 0.
# Fit model in GAM
flanker.gam <- gam(Speed ~ n.Status +
s(a.DominanceSwabian, by = factor(n.Status)) +
s(Subject, bs = "re") + # random intercept for subject
s(n.Status, Subject, bs = "re"), # random slope of n.Congruent by subject.
# This is equivalent to saying that the Flanker effect
# varies between participants
data = datFlanker)
plot(flanker.gam)
The a.DominanceSwabian effects are linear, so we’ll model them linearly using lmer().
flanker.mer <- lmer(Speed ~ n.Status + ca.DominanceSwabian +
ca.DominanceSwabian:n.Status +
(1 + n.Status | Subject),
data = datFlanker)
AIC(flanker.mer)
[1] 5989
summary(flanker.mer)
Linear mixed model fit by REML ['lmerMod']
Formula: Speed ~ n.Status + ca.DominanceSwabian + ca.DominanceSwabian:n.Status + (1 + n.Status | Subject)
Data: datFlanker
REML criterion at convergence: 5973
Scaled residuals:
Min 1Q Median 3Q Max
-4.527 -0.572 0.035 0.625 4.978
Random effects:
Groups Name Variance Std.Dev. Corr
Subject (Intercept) 0.0729 0.270
n.Status 0.0100 0.100 0.01
Residual 0.1410 0.375
Number of obs: 6528, groups: Subject, 34
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.379575 0.046550 51.1
n.Status 0.229235 0.019528 11.7
ca.DominanceSwabian -0.002908 0.001667 -1.7
n.Status:ca.DominanceSwabian -0.000129 0.000699 -0.2
Correlation of Fixed Effects:
(Intr) n.Stts c.DmnS
n.Status 0.006
c.DmnncSwbn 0.000 0.000
n.Stts:c.DS 0.000 0.000 0.006
# Without random correlation?
flanker.mer2 <- lmer(Speed ~ n.Status + ca.DominanceSwabian +
ca.DominanceSwabian:n.Status +
(1 + n.Status || Subject),
data = datFlanker)
AIC(flanker.mer2)
[1] 5987
summary(flanker.mer2)
Linear mixed model fit by REML ['lmerMod']
Formula: Speed ~ n.Status + ca.DominanceSwabian + ca.DominanceSwabian:n.Status + ((1 | Subject) + (0 + n.Status | Subject))
Data: datFlanker
REML criterion at convergence: 5973
Scaled residuals:
Min 1Q Median 3Q Max
-4.527 -0.572 0.035 0.625 4.977
Random effects:
Groups Name Variance Std.Dev.
Subject (Intercept) 0.0729 0.270
Subject.1 n.Status 0.0100 0.100
Residual 0.1410 0.375
Number of obs: 6528, groups: Subject, 34
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.379575 0.046550 51.1
n.Status 0.229235 0.019528 11.7
ca.DominanceSwabian -0.002908 0.001667 -1.7
n.Status:ca.DominanceSwabian -0.000129 0.000699 -0.2
Correlation of Fixed Effects:
(Intr) n.Stts c.DmnS
n.Status 0.000
c.DmnncSwbn 0.000 0.000
n.Stts:c.DS 0.000 0.000 0.000
flanker.mer2, which fixes the correlation between the random intercepts and slopes at 0, seems to be a slightly better model (lower AIC), though the difference is negligible.
The source code for this document explores whether this model is correctly specified (distribution of random effects and residuals). In conclusion, all seems to be in order.
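The kind of diagnostics referred to could look roughly as follows (an illustrative sketch, not the authors’ exact code):
# plot(flanker.mer2)                                        # fitted values vs. residuals
# qqnorm(resid(flanker.mer2)); qqline(resid(flanker.mer2))  # normality of residuals
# lattice::dotplot(ranef(flanker.mer2, condVar = TRUE))     # per-subject random effects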
Here we test the significance of the interaction between n.Status and ca.DominanceSwabian.
# 1. Normal approximation
summary(flanker.mer2)
Linear mixed model fit by REML ['lmerMod']
Formula: Speed ~ n.Status + ca.DominanceSwabian + ca.DominanceSwabian:n.Status + ((1 | Subject) + (0 + n.Status | Subject))
Data: datFlanker
REML criterion at convergence: 5973
Scaled residuals:
Min 1Q Median 3Q Max
-4.527 -0.572 0.035 0.625 4.977
Random effects:
Groups Name Variance Std.Dev.
Subject (Intercept) 0.0729 0.270
Subject.1 n.Status 0.0100 0.100
Residual 0.1410 0.375
Number of obs: 6528, groups: Subject, 34
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.379575 0.046550 51.1
n.Status 0.229235 0.019528 11.7
ca.DominanceSwabian -0.002908 0.001667 -1.7
n.Status:ca.DominanceSwabian -0.000129 0.000699 -0.2
Correlation of Fixed Effects:
(Intr) n.Stts c.DmnS
n.Status 0.000
c.DmnncSwbn 0.000 0.000
n.Stts:c.DS 0.000 0.000 0.000
# |t| = 0.2 corresponds to roughly p = 0.84, if we assume
# the t-distribution is well approximated by a normal distribution.
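# For illustration, the same two-sided p-value can be computed directly:
# 2 * pnorm(-abs(-0.2))  # roughly 0.84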
# 2. Likelihood ratio test
# Fit null model
flanker.mer20 <- lmer(Speed ~ n.Status + ca.DominanceSwabian +
(1 + n.Status || Subject),
data = datFlanker)
anova(flanker.mer20, flanker.mer2)
refitting model(s) with ML (instead of REML)
Data: datFlanker
Models:
flanker.mer20: Speed ~ n.Status + ca.DominanceSwabian + ((1 | Subject) + (0 +
flanker.mer20: n.Status | Subject))
flanker.mer2: Speed ~ n.Status + ca.DominanceSwabian + ca.DominanceSwabian:n.Status +
flanker.mer2: ((1 | Subject) + (0 + n.Status | Subject))
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
flanker.mer20 6 5951 5992 -2969 5939
flanker.mer2 7 5953 6000 -2969 5939 0.04 1 0.85
# LRT: X²(1) = 0.04, p = 0.85
# 3. With these p-values, parametric bootstrapping is superfluous.
We find no significant interaction between n.Status and ca.DominanceSwabian and hence no evidence for our prediction.
# Fit model in GAM
simon.gam <- gam(Speed ~ n.Status +
s(a.DominanceSwabian, by = factor(n.Status)) +
s(Subject, bs = "re") + # random intercept for subject
s(Subject, n.Status, bs = "re"), # random slope of n.Status by subject.
# This is equivalent to saying that the Simon effect
# varies between participants
data = datSimon)
plot(simon.gam)
The estimated a.DominanceSwabian effects are essentially linear, so we’ll model them linearly using lmer().
simon.mer <- lmer(Speed ~ n.Status + ca.DominanceSwabian +
ca.DominanceSwabian:n.Status +
(1 + n.Status | Subject),
data = datSimon)
AIC(simon.mer)
[1] 10138
summary(simon.mer)
Linear mixed model fit by REML ['lmerMod']
Formula: Speed ~ n.Status + ca.DominanceSwabian + ca.DominanceSwabian:n.Status + (1 + n.Status | Subject)
Data: datSimon
REML criterion at convergence: 10122
Scaled residuals:
Min 1Q Median 3Q Max
-4.505 -0.595 0.014 0.621 4.037
Random effects:
Groups Name Variance Std.Dev. Corr
Subject (Intercept) 0.09198 0.3033
n.Status 0.00905 0.0951 0.25
Residual 0.26753 0.5172
Number of obs: 6528, groups: Subject, 34
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.589867 0.052404 49.4
n.Status 0.133422 0.020738 6.4
ca.DominanceSwabian -0.003221 0.001876 -1.7
n.Status:ca.DominanceSwabian 0.000588 0.000743 0.8
Correlation of Fixed Effects:
(Intr) n.Stts c.DmnS
n.Status 0.192
c.DmnncSwbn 0.000 0.000
n.Stts:c.DS 0.000 0.000 0.192
# Without random correlation?
simon.mer2 <- lmer(Speed ~ n.Status + ca.DominanceSwabian +
ca.DominanceSwabian:n.Status +
(1 + n.Status || Subject),
data = datSimon)
AIC(simon.mer2)
[1] 10137
summary(simon.mer2)
Linear mixed model fit by REML ['lmerMod']
Formula: Speed ~ n.Status + ca.DominanceSwabian + ca.DominanceSwabian:n.Status + ((1 | Subject) + (0 + n.Status | Subject))
Data: datSimon
REML criterion at convergence: 10123
Scaled residuals:
Min 1Q Median 3Q Max
-4.510 -0.594 0.014 0.623 4.029
Random effects:
Groups Name Variance Std.Dev.
Subject (Intercept) 0.09198 0.3033
Subject.1 n.Status 0.00905 0.0951
Residual 0.26753 0.5172
Number of obs: 6528, groups: Subject, 34
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.589867 0.052404 49.4
n.Status 0.133422 0.020738 6.4
ca.DominanceSwabian -0.003221 0.001876 -1.7
n.Status:ca.DominanceSwabian 0.000588 0.000743 0.8
Correlation of Fixed Effects:
(Intr) n.Stts c.DmnS
n.Status 0.000
c.DmnncSwbn 0.000 0.000
n.Stts:c.DS 0.000 0.000 0.000
simon.mer2, which fixes the correlation between the random intercepts and slopes at 0, seems to be a slightly better model (lower AIC), though the difference is negligible.
The source code for this document explores whether this model is correctly specified (distribution of random effects and residuals). In conclusion, all seems to be in order.
Here we test the significance of the interaction between n.Status and ca.DominanceSwabian.
# 1. Normal approximation
summary(simon.mer2)
Linear mixed model fit by REML ['lmerMod']
Formula: Speed ~ n.Status + ca.DominanceSwabian + ca.DominanceSwabian:n.Status + ((1 | Subject) + (0 + n.Status | Subject))
Data: datSimon
REML criterion at convergence: 10123
Scaled residuals:
Min 1Q Median 3Q Max
-4.510 -0.594 0.014 0.623 4.029
Random effects:
Groups Name Variance Std.Dev.
Subject (Intercept) 0.09198 0.3033
Subject.1 n.Status 0.00905 0.0951
Residual 0.26753 0.5172
Number of obs: 6528, groups: Subject, 34
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.589867 0.052404 49.4
n.Status 0.133422 0.020738 6.4
ca.DominanceSwabian -0.003221 0.001876 -1.7
n.Status:ca.DominanceSwabian 0.000588 0.000743 0.8
Correlation of Fixed Effects:
(Intr) n.Stts c.DmnS
n.Status 0.000
c.DmnncSwbn 0.000 0.000
n.Stts:c.DS 0.000 0.000 0.000
# |t| = 0.8 corresponds to roughly p = 0.42, if we assume
# the t-distribution is well approximated by a normal distribution.
# 2. Likelihood ratio test
# Fit null model
simon.mer20 <- lmer(Speed ~ n.Status + ca.DominanceSwabian +
(1 + n.Status || Subject),
data = datSimon)
anova(simon.mer20, simon.mer2)
refitting model(s) with ML (instead of REML)
Data: datSimon
Models:
simon.mer20: Speed ~ n.Status + ca.DominanceSwabian + ((1 | Subject) + (0 +
simon.mer20: n.Status | Subject))
simon.mer2: Speed ~ n.Status + ca.DominanceSwabian + ca.DominanceSwabian:n.Status +
simon.mer2: ((1 | Subject) + (0 + n.Status | Subject))
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
simon.mer20 6 10103 10143 -5045 10091
simon.mer2 7 10104 10151 -5045 10090 0.66 1 0.42
# LRT: X²(1) = 0.66, p = 0.42
# 3. Parametric bootstrapping: superfluous
We find no reliable interaction between n.Status and ca.DominanceSwabian and hence no evidence for our prediction.
# Fit model in GAM
flanker.gam <- gam(Speed ~ n.Status +
s(TotalSwabian, by = factor(n.Status)) +
s(Subject, bs = "re") + # random intercept for subject
s(Subject, n.Status, bs = "re"), # random slope of n.Status by subject.
# This is equivalent to saying that the Flanker effect
# varies between participants
data = datFlanker)
plot(flanker.gam)
The estimated TotalSwabian effects are essentially linear, so we’ll model them linearly using lmer().
flanker.mer <- lmer(Speed ~ n.Status + c.TotalSwabian +
c.TotalSwabian:n.Status +
(1 + n.Status | Subject),
data = datFlanker)
AIC(flanker.mer)
[1] 5990
summary(flanker.mer)
Linear mixed model fit by REML ['lmerMod']
Formula: Speed ~ n.Status + c.TotalSwabian + c.TotalSwabian:n.Status + (1 + n.Status | Subject)
Data: datFlanker
REML criterion at convergence: 5974
Scaled residuals:
Min 1Q Median 3Q Max
-4.555 -0.574 0.036 0.625 4.986
Random effects:
Groups Name Variance Std.Dev. Corr
Subject (Intercept) 0.07959 0.2821
n.Status 0.00897 0.0947 0.00
Residual 0.14097 0.3755
Number of obs: 6528, groups: Subject, 34
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.379575 0.048605 49.0
n.Status 0.229235 0.018712 12.3
c.TotalSwabian -0.000555 0.001467 -0.4
n.Status:c.TotalSwabian -0.000960 0.000565 -1.7
Correlation of Fixed Effects:
(Intr) n.Stts c.TtlS
n.Status -0.004
c.TotalSwbn 0.000 0.000
n.Stts:c.TS 0.000 0.000 -0.004
# Without random correlation?
flanker.mer2 <- lmer(Speed ~ n.Status + c.TotalSwabian +
c.TotalSwabian:n.Status +
(1 + n.Status || Subject),
data = datFlanker)
AIC(flanker.mer2)
[1] 5988
summary(flanker.mer2)
Linear mixed model fit by REML ['lmerMod']
Formula: Speed ~ n.Status + c.TotalSwabian + c.TotalSwabian:n.Status + ((1 | Subject) + (0 + n.Status | Subject))
Data: datFlanker
REML criterion at convergence: 5974
Scaled residuals:
Min 1Q Median 3Q Max
-4.555 -0.573 0.036 0.625 4.986
Random effects:
Groups Name Variance Std.Dev.
Subject (Intercept) 0.07959 0.2821
Subject.1 n.Status 0.00897 0.0947
Residual 0.14097 0.3755
Number of obs: 6528, groups: Subject, 34
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.379575 0.048605 49.0
n.Status 0.229235 0.018712 12.3
c.TotalSwabian -0.000555 0.001467 -0.4
n.Status:c.TotalSwabian -0.000960 0.000565 -1.7
Correlation of Fixed Effects:
(Intr) n.Stts c.TtlS
n.Status 0.000
c.TotalSwbn 0.000 0.000
n.Stts:c.TS 0.000 0.000 0.000
flanker.mer2, which fixes the correlation between the random intercepts and slopes at 0, seems to be a slightly better model (lower AIC), though the difference is negligible.
The source code for this document explores whether this model is correctly specified (distribution of random effects and residuals). In conclusion, all seems to be in order.
Here we test the significance of the interaction between n.Status and c.TotalSwabian.
# 1. Normal approximation
summary(flanker.mer2)
Linear mixed model fit by REML ['lmerMod']
Formula: Speed ~ n.Status + c.TotalSwabian + c.TotalSwabian:n.Status + ((1 | Subject) + (0 + n.Status | Subject))
Data: datFlanker
REML criterion at convergence: 5974
Scaled residuals:
Min 1Q Median 3Q Max
-4.555 -0.573 0.036 0.625 4.986
Random effects:
Groups Name Variance Std.Dev.
Subject (Intercept) 0.07959 0.2821
Subject.1 n.Status 0.00897 0.0947
Residual 0.14097 0.3755
Number of obs: 6528, groups: Subject, 34
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.379575 0.048605 49.0
n.Status 0.229235 0.018712 12.3
c.TotalSwabian -0.000555 0.001467 -0.4
n.Status:c.TotalSwabian -0.000960 0.000565 -1.7
Correlation of Fixed Effects:
(Intr) n.Stts c.TtlS
n.Status 0.000
c.TotalSwbn 0.000 0.000
n.Stts:c.TS 0.000 0.000 0.000
# |t| = 1.7 corresponds to roughly p = 0.09, if we assume
# the t-distribution is well approximated by a normal distribution.
# 2. Likelihood ratio test
# Fit null model
flanker.mer20 <- lmer(Speed ~ n.Status + c.TotalSwabian +
(1 + n.Status || Subject),
data = datFlanker)
anova(flanker.mer20, flanker.mer2)
refitting model(s) with ML (instead of REML)
Data: datFlanker
Models:
flanker.mer20: Speed ~ n.Status + c.TotalSwabian + ((1 | Subject) + (0 + n.Status |
flanker.mer20: Subject))
flanker.mer2: Speed ~ n.Status + c.TotalSwabian + c.TotalSwabian:n.Status +
flanker.mer2: ((1 | Subject) + (0 + n.Status | Subject))
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
flanker.mer20 6 5954 5995 -2971 5942
flanker.mer2 7 5953 6000 -2969 5939 2.94 1 0.086
# LRT: X²(1) = 2.9, p = 0.086
# 3. Last--and better--solution: Parametric bootstrapping:
#PBmodcomp(flanker.mer2, flanker.mer20, nsim = 1000)
# p =~ 0.099; the p-value will vary from run to run
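# To make the bootstrapped p-value reproducible, a seed could be set first, e.g.:
# set.seed(1)
# PBmodcomp(flanker.mer2, flanker.mer20, nsim = 1000)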
We find no significant interaction between n.Status and c.TotalSwabian and hence no evidence for our prediction.
# Fit model in GAM
simon.gam <- gam(Speed ~ n.Status +
s(TotalSwabian, by = factor(n.Status)) +
s(Subject, bs = "re") + # random intercept for subject
s(Subject, n.Status, bs = "re"), # random slope of n.Status by subject.
# This is equivalent to saying that the Simon effect
# varies between participants
data = datSimon)
plot(simon.gam)
The estimated TotalSwabian effects are essentially linear, so we’ll model them linearly using lmer().
simon.mer <- lmer(Speed ~ n.Status + c.TotalSwabian +
c.TotalSwabian:n.Status +
(1 + n.Status | Subject),
data = datSimon)
AIC(simon.mer)
[1] 10138
summary(simon.mer)
Linear mixed model fit by REML ['lmerMod']
Formula: Speed ~ n.Status + c.TotalSwabian + c.TotalSwabian:n.Status + (1 + n.Status | Subject)
Data: datSimon
REML criterion at convergence: 10122
Scaled residuals:
Min 1Q Median 3Q Max
-4.497 -0.597 0.014 0.620 4.033
Random effects:
Groups Name Variance Std.Dev. Corr
Subject (Intercept) 0.09758 0.3124
n.Status 0.00733 0.0856 0.11
Residual 0.26753 0.5172
Number of obs: 6528, groups: Subject, 34
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.589867 0.053953 48.0
n.Status 0.133422 0.019480 6.8
c.TotalSwabian -0.001602 0.001629 -1.0
n.Status:c.TotalSwabian -0.001312 0.000588 -2.2
Correlation of Fixed Effects:
(Intr) n.Stts c.TtlS
n.Status 0.086
c.TotalSwbn 0.000 0.000
n.Stts:c.TS 0.000 0.000 0.086
# Without random correlation?
simon.mer2 <- lmer(Speed ~ n.Status + c.TotalSwabian +
c.TotalSwabian:n.Status +
(1 + n.Status || Subject),
data = datSimon)
AIC(simon.mer2)
[1] 10136
summary(simon.mer2)
Linear mixed model fit by REML ['lmerMod']
Formula: Speed ~ n.Status + c.TotalSwabian + c.TotalSwabian:n.Status + ((1 | Subject) + (0 + n.Status | Subject))
Data: datSimon
REML criterion at convergence: 10122
Scaled residuals:
Min 1Q Median 3Q Max
-4.501 -0.597 0.013 0.625 4.031
Random effects:
Groups Name Variance Std.Dev.
Subject (Intercept) 0.09758 0.3124
Subject.1 n.Status 0.00733 0.0856
Residual 0.26753 0.5172
Number of obs: 6528, groups: Subject, 34
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.589867 0.053953 48.0
n.Status 0.133422 0.019480 6.8
c.TotalSwabian -0.001602 0.001629 -1.0
n.Status:c.TotalSwabian -0.001312 0.000588 -2.2
Correlation of Fixed Effects:
(Intr) n.Stts c.TtlS
n.Status 0.000
c.TotalSwbn 0.000 0.000
n.Stts:c.TS 0.000 0.000 0.000
simon.mer2, which fixes the correlation between the random intercepts and slopes at 0, seems to be a slightly better model (lower AIC), though the difference is negligible.
The source code for this document explores whether this model is correctly specified (distribution of random effects and residuals). In conclusion, all seems to be in order.
Here we test the significance of the interaction between n.Status and c.TotalSwabian.
# 1. Normal approximation
summary(simon.mer2)
Linear mixed model fit by REML ['lmerMod']
Formula: Speed ~ n.Status + c.TotalSwabian + c.TotalSwabian:n.Status + ((1 | Subject) + (0 + n.Status | Subject))
Data: datSimon
REML criterion at convergence: 10122
Scaled residuals:
Min 1Q Median 3Q Max
-4.501 -0.597 0.013 0.625 4.031
Random effects:
Groups Name Variance Std.Dev.
Subject (Intercept) 0.09758 0.3124
Subject.1 n.Status 0.00733 0.0856
Residual 0.26753 0.5172
Number of obs: 6528, groups: Subject, 34
Fixed effects:
Estimate Std. Error t value
(Intercept) 2.589867 0.053953 48.0
n.Status 0.133422 0.019480 6.8
c.TotalSwabian -0.001602 0.001629 -1.0
n.Status:c.TotalSwabian -0.001312 0.000588 -2.2
Correlation of Fixed Effects:
(Intr) n.Stts c.TtlS
n.Status 0.000
c.TotalSwbn 0.000 0.000
n.Stts:c.TS 0.000 0.000 0.000
# |t| = 2.2 corresponds to roughly p = 0.028, if we assume
# the t-distribution is well approximated by a normal distribution.
# 2. Likelihood ratio test
# Fit null model
simon.mer20 <- lmer(Speed ~ n.Status + c.TotalSwabian +
(1 + n.Status || Subject),
data = datSimon)
anova(simon.mer20, simon.mer2)
refitting model(s) with ML (instead of REML)
Data: datSimon
Models:
simon.mer20: Speed ~ n.Status + c.TotalSwabian + ((1 | Subject) + (0 + n.Status |
simon.mer20: Subject))
simon.mer2: Speed ~ n.Status + c.TotalSwabian + c.TotalSwabian:n.Status +
simon.mer2: ((1 | Subject) + (0 + n.Status | Subject))
Df AIC BIC logLik deviance Chisq Chi Df Pr(>Chisq)
simon.mer20 6 10105 10145 -5046 10093
simon.mer2 7 10102 10149 -5044 10088 4.92 1 0.027
# LRT: X²(1) = 4.9, p = 0.027
# 3. Last--and better--solution: Parametric bootstrapping:
#PBmodcomp(simon.mer2, simon.mer20, nsim = 1000)
# p =~ 0.030; the p-value will vary from run to run
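Unlike in the analyses above, the n.Status:c.TotalSwabian interaction for the Simon data does reach significance (p ≈ 0.03). To visualise its direction, the model predictions can be plotted in the same way as for the dominance-based Simon model earlier in this document. A sketch, assuming (as in that earlier code) that n.Status is coded -0.5 for incongruent and 0.5 for congruent trials and that c.TotalSwabian is TotalSwabian centred at its sample mean; newdatTotal is a new name introduced here:
# Grid of predictor values (fixed effects only)
newdatTotal <- expand.grid(
  n.Status = c(-0.5, 0.5),
  c.TotalSwabian = seq(min(datSimon$TotalSwabian),
                       max(datSimon$TotalSwabian),
                       length.out = 100) - mean(datSimon$TotalSwabian))
# Population-level predictions from the TotalSwabian Simon model
newdatTotal$Prediction <- predict(simon.mer2, newdatTotal, re.form = NA)
# Recover the original variables for plotting
newdatTotal$Status <- ifelse(newdatTotal$n.Status == -0.5, "incongruent", "congruent")
newdatTotal$TotalSwabian <- newdatTotal$c.TotalSwabian + mean(datSimon$TotalSwabian)
# Plot predicted speeds by congruency status and total Swabian score
ggplot(newdatTotal,
       aes(x = TotalSwabian,
           y = Prediction,
           linetype = Status)) +
  geom_line() +
  xlab("Total Swabian score") +
  ylab("Predicted response speed\n(stimuli/s)")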
For the analyses reported above, we took a number of more or less arbitrary decisions (our decisions in bold):
Transformation: Should the analysis be based on the untransformed (raw) latencies, on inverse-transformed latencies (speed), or on log-transformed latencies (log)? (3 possibilities)
ExclusionAccuracy: Should latencies for incorrect responses be excluded? (no vs. yes; 2 possibilities)
ExclusionLatency: Should there be a lower threshold for response latencies, and if so, what should the threshold be? (none, 200 ms, 250 ms or 300 ms, to name but 4 possibilities)
ExclusionSDs: Should responses be excluded because they are more than a set number of standard deviations removed from the participant’s mean, and if so, what should this number of standard deviations be? The means and standard deviations are computed for each participant on the transformed latency data (i.e., raw, speed or log); any responses more than ExclusionSDs standard deviations away from the participant’s mean are dropped. (none or 2.5 SDs, to name but 2 possibilities)
ExclusionSDsOverall: Should responses be excluded because they are more than a set number of standard deviations removed from the overall mean, and if so, what should this number of standard deviations be? The means and standard deviations are computed on the transformed latency data (i.e., raw, speed or log); any responses more than ExclusionSDsOverall standard deviations away from the overall mean are dropped. (none or 2.5 SDs, to name but 2 possibilities)
DV: Which variable should be used as the dependent variable: the difference between the condition means (mean difference) or the ratio between the condition means (mean ratio)? (to name but 2 possibilities)
IV: Which variable should be used as the independent variable: raw dominance, absolute dominance, raw dominance considering only oral skills, absolute dominance considering only oral skills, raw dominance of use only, absolute dominance of use only, the total Swabian score, or the total Swabian score considering only oral skills? (8 possibilities that we considered defensible)
Task: Should the Simon or the flanker data be analysed? (2 possibilities)
Combining all of these possibilities yields 3 × 2 × 4 × 2 × 2 × 2 × 8 × 2 = 3,072 alternative ways of analysing these data.
# Matrix that lists all analysis parameters
alternatives <- expand.grid(data = "data",
Task = c("Simon", "Flanker"),
Transformation = c("raw", "speed", "log"),
ExclusionAccuracy = c("yes", "no"),
ExclusionLatency = c(0, 200, 250, 300),
ExclusionSDs = c(NA, 2.5),
ExclusionSDsOverall = c(NA, 2.5),
DV = c("mean difference", "mean ratio"),
IV = c("raw dominance", "absolute dominance",
"raw dominance, oral", "absolute dominance, oral",
"raw use", "absolute use",
"total Swabian", "total Swabian, oral"))
nrow(alternatives)
[1] 3072
allAnalyses <-
function(data,
Task,
Transformation,
ExclusionAccuracy = "no",
ExclusionLatency = 0,
ExclusionSDs = NA, # NA rather than NULL so the is.na() checks below also work for the defaults
ExclusionSDsOverall = NA,
DV,
IV) {
require("dplyr")
require("tidyr")
require("magrittr")
# Select task
if (Task %in% c("Simon", "Flanker")) {
if (Task == "Simon") {
dat <- subset(data, Task == "Simon")
} else if (Task == "Flanker") {
dat <- subset(data, Task == "Flanker")
}
} else {
stop("Task not 'Simon' or 'Flanker'.")
}
# Transform data
if (!(Transformation %in% c("raw", "speed", "log"))) {
stop("Transformation not 'raw', 'speed' or 'log'.")
} else if (Transformation == "raw") {
dat$Response <- dat[, "RT"]
} else if (Transformation == "speed") {
dat$Response <- 1000 / dat[, "RT"]
} else if (Transformation == "log") {
dat$Response <- log10(dat[, "RT"])
# Note that the base of the logarithm is immaterial.
}
# Exclude incorrect responses
# Track how many were excluded
excludedAccuracy <- 0
if (!(ExclusionAccuracy %in% c("yes", "no"))) {
stop("ExclusionAccuracy needs to be either yes or no.")
} else if (ExclusionAccuracy == "yes") {
excludedAccuracy <- sum(dat$Accuracy == "0")
dat <- dat[dat$Accuracy == "1",]
}
# Exclude overly fast responses
# Track how many were excluded
excludedLatency <- 0
if (!missing(ExclusionLatency)) {
if (ExclusionLatency < 0) {
stop("Set ExclusionLatency to a positive (or zero) numeric variable.")
} else {
excludedLatency <- sum(dat$RT < ExclusionLatency)
dat <- dat[dat$RT >= ExclusionLatency, ] # keep responses at or above the threshold, matching the count above
}
}
# Exclude responses (not reaction times) more than x SDs away from the participants' mean
# Track how many were excluded
excludedOutliers <- 0
if (!is.na(ExclusionSDs)) {
if (ExclusionSDs < 0) {
stop("Set ExclusionSDs to a positive (or zero) numeric variable.")
} else {
# Compute mean and sd per participant
perParticipant <- dat %>%
group_by_(., "Subject") %>%
summarise_(meanResponse = "mean(Response)",
sdResponse = "sd(Response)")
# Add perParticipant to dat
dat <- left_join(dat, perParticipant, by = "Subject")
# Check how many responses are more than x sds away from participant mean
# dat$Low <- dat$meanResponse - ExclusionSDs * dat$sdResponse
dat$Outlier <-
(dat$Response < dat$meanResponse - ExclusionSDs * dat$sdResponse) |
(dat$Response > dat$meanResponse + ExclusionSDs * dat$sdResponse)
excludedOutliers <- sum(dat$Outlier)
# Remove outliers
dat <- dat %>% filter(Outlier == FALSE)
}
} else if (is.na(ExclusionSDs)) {
dat <- dat
}
# Exclude responses (not reaction times) more than x SDs away from the overall mean
# Track how many were excluded
excludedOutliersOverall <- 0
if (!is.na(ExclusionSDsOverall)) {
if (ExclusionSDsOverall < 0) {
stop("Set ExclusionSDsOverall to a positive (or zero) numeric variable.")
} else {
# Check how many responses are more than x sds away from the overall mean
OutlierOverall <-
(dat$Response < mean(dat$Response) - ExclusionSDsOverall * sd(dat$Response)) |
(dat$Response > mean(dat$Response) + ExclusionSDsOverall * sd(dat$Response))
excludedOutliersOverall <- sum(OutlierOverall)
# Remove outliers
dat <- dat[OutlierOverall == FALSE, ]
}
} else if (is.na(ExclusionSDsOverall)) {
dat <- dat
}
# Select dependent variable
if (!(DV %in% c(
"mean difference",
"mean ratio",
"median difference",
"median ratio"
))) {
stop(
"The dependent variable (DV) should be 'mean difference', 'mean ratio', 'median difference' or 'mean ratio'.
Please note that 'no aggregation' is not currently supported."
)
} else if (DV == "mean difference") {
dat_DV <- dat %>%
group_by_("Subject", "Status") %>%
summarise_(average = "mean(Response)") %>% # mean response
spread_("Status", "average") %>%
mutate_(DV = quote(congruent - incongruent)) # difference score
# For raw latencies: How much _SLOWER_ are participants in the congruent condition?
# We'd expect this to be a negative number for most participants
# (i.e. they're actually faster in the congruent condition).
# Lower numbers (= more negative numbers) indicate larger Simon effects.
# Prediction: This number should be LARGER for more balanced bilinguals. (sign: 1)
# For speeds: How much _FASTER_ are participants in the congruent condition?
# We'd expect this to be positive number for most participants.
# Larger numbers indicate larger Simon effects.
# Prediction: This number should be SMALLER for more balanced bilinguals. (sign: -1)
# For logs: Same as for raw latencies. (sign: 1)
} else if (DV == "mean ratio") {
dat_DV <- dat %>%
group_by_("Subject", "Status") %>%
summarise_(average = "mean(Response)") %>% # mean response
spread_("Status", "average") %>%
mutate_(DV = quote(congruent / incongruent)) # ratio score
# For raw latencies: Relative to the incongruent condition, how slow are participants in the congruent condition?
# This number will be less than 1 for most participants.
# Lower numbers (more towards 0) indicate larger Simon effects.
# Prediction: This number should be LARGER for more balanced bilinguals. (sign: 1)
# For speeds: Relative to the incongruent condition, how fast are participants in the congruent condition?
# This number will be more than 1 for most participants.
# Larger numbers indicate larger Simon effects.
# Prediction: This number should be SMALLER for more balanced bilinguals. (sign: -1)
# For logs: Same as for raw latencies. (sign: 1)
# We don't consider analyses on medians in this multiverse analysis:
} else if (DV == "median difference") {
dat_DV <- dat %>%
group_by_("Subject", "Status") %>%
summarise_(average = ~ median(Response)) %>% # median response
spread_("Status", "average") %>%
mutate_(DV = quote(congruent - incongruent)) # difference score
# Predictions: see mean difference.
} else if (DV == "median ratio") {
dat_DV <- dat %>%
group_by_("Subject", "Status") %>%
summarise_(average = ~ median(Response)) %>% # median response
spread_("Status", "average") %>%
mutate_(DV = quote(congruent / incongruent)) # ratio score
# Predictions: see mean ratio.
}
# Select predictor
if (!(
IV %in% c(
"raw dominance",
"absolute dominance",
"raw dominance, oral",
"absolute dominance, oral",
"raw use",
"absolute use",
"total Swabian",
"total Swabian, oral"
)
)) {
stop(
"The independent variable (IV) should be 'raw dominance', 'absolute dominance',
'raw dominance, oral', 'absolute dominance, oral',
'raw use', or 'absolute use'."
)
} else if (IV == "raw dominance") {
dat_IV <- dat %>%
select_("Subject", "DominanceSwabian") %>%
distinct_() %>%
rename_(IV = "DominanceSwabian")
} else if (IV == "absolute dominance") {
dat_IV <- dat %>%
select_("Subject", "DominanceSwabian") %>%
distinct_() %>%
mutate_(IV = ~ abs(DominanceSwabian))
} else if (IV == "raw dominance, oral") {
dat_IV <- dat %>%
select_("Subject", "DominanceSwabianOral") %>%
distinct_() %>%
mutate_(IV = "DominanceSwabianOral")
} else if (IV == "absolute dominance, oral") {
dat_IV <- dat %>%
select_("Subject", "DominanceSwabianOral") %>%
distinct_() %>%
mutate_(IV = ~ abs(DominanceSwabianOral))
} else if (IV == "raw use") {
dat_IV <- dat %>%
select_("Subject", "DominanceSwabianUse") %>%
distinct_() %>%
rename_(IV = "DominanceSwabianUse")
} else if (IV == "absolute use") {
dat_IV <- dat %>%
select_("Subject", "DominanceSwabianUse") %>%
distinct_() %>%
mutate_(IV = ~ abs(DominanceSwabianUse))
} else if (IV == "total Swabian") {
dat_IV <- dat %>%
select_("Subject", "TotalSwabian") %>%
distinct_() %>%
rename_(IV = "TotalSwabian")
} else if (IV == "total Swabian, oral") {
dat_IV <- dat %>%
select_("Subject", "TotalSwabianOral") %>%
distinct_() %>%
rename_(IV = "TotalSwabianOral")
}
# Combine datasets
dat_combined <- left_join(dat_DV, as.tbl(dat_IV), by = "Subject")
# Run analysis; simple regression model.
mod.lm <- lm(DV ~ IV, dat_combined)
return(
list(
excludedAccuracy = excludedAccuracy,
excludedLatency = excludedLatency,
excludedOutliers = excludedOutliers,
excludedOutliersOverall = excludedOutliersOverall,
pValue = summary(mod.lm)$coef[2, 4],
direction = sign(coef(mod.lm)[2])
)
)
}
# See if this works:
allAnalyses(data = dat,
Task = "Flanker",
ExclusionAccuracy = "yes",
Transformation = "log",
ExclusionLatency = 250,
ExclusionSDs = 2.5,
ExclusionSDsOverall = 2.5,
DV = "mean difference",
IV = "raw dominance")
$excludedAccuracy
[1] 78
$excludedLatency
[1] 6
$excludedOutliers
[1] 156
$excludedOutliersOverall
[1] 121
$pValue
[1] 0.05935
$direction
IV
1
# Some parameter settings need to be converted to character strings.
Transformations <- as.character(alternatives$Transformation)
Tasks <- as.character(alternatives$Task)
Accuracies <- as.character(alternatives$ExclusionAccuracy)
DVs <- as.character(alternatives$DV)
IVs <- as.character(alternatives$IV)
# Prepare dataframe for the multiverse analysis
multiverse <- data.frame(Task = alternatives$Task,
Transformation = alternatives$Transformation,
ExclusionAccuracy = alternatives$ExclusionAccuracy,
ExclusionLatency = alternatives$ExclusionLatency,
ExclusionSDs = alternatives$ExclusionSDs,
ExclusionSDsOverall = alternatives$ExclusionSDsOverall,
DV = alternatives$DV,
IV = alternatives$IV,
excludedAccuracy = NA,
excludedLatency = NA,
excludedOutliers = NA,
excludedOutliersOverall = NA,
pValue = NA,
direction = NA)
multiverse
# Loop through entire grid.
# It must be possible to do this more elegantly, but mapply() causes difficulties.
for (i in 1:nrow(alternatives)) {
results <- allAnalyses(data = dat,
Task = Tasks[i],
Transformation = Transformations[i],
ExclusionAccuracy = Accuracies[i],
ExclusionLatency = alternatives$ExclusionLatency[i],
ExclusionSDs = alternatives$ExclusionSDs[i],
ExclusionSDsOverall = alternatives$ExclusionSDsOverall[i],
DV = DVs[i],
IV = IVs[i])
multiverse$excludedAccuracy[i] <- results$excludedAccuracy
multiverse$excludedLatency[i] <- results$excludedLatency
multiverse$excludedOutliers[i] <- results$excludedOutliers
multiverse$excludedOutliersOverall[i] <- results$excludedOutliersOverall
multiverse$pValue[i] <- results$pValue
multiverse$direction[i] <- results$direction
}
# summary(multiverse)
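For reference, the same grid could also be processed without an explicit loop; below is an untested sketch using lapply() that collects the results into a new data frame (results_list and multiverse2 are names introduced here, not objects from the original analysis):
# results_list <- lapply(seq_len(nrow(alternatives)), function(i) {
#   allAnalyses(data = dat,
#               Task = Tasks[i],
#               Transformation = Transformations[i],
#               ExclusionAccuracy = Accuracies[i],
#               ExclusionLatency = alternatives$ExclusionLatency[i],
#               ExclusionSDs = alternatives$ExclusionSDs[i],
#               ExclusionSDsOverall = alternatives$ExclusionSDsOverall[i],
#               DV = DVs[i],
#               IV = IVs[i])
# })
# multiverse2 <- cbind(alternatives[, -1],  # drop the constant 'data' column
#                      do.call(rbind, lapply(results_list, as.data.frame)))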
We’re going to add a couple of variables to the multiverse dataframe:
# Was the sign of the regression coefficient,
# regardless of its significance,
# in the direction expected under the overall "bidialectalism > better executive function" hypothesis?
multiverse$DirectionExpected <- "no"
multiverse[multiverse$Transformation %in% c("raw", "log") & multiverse$direction == 1, ]$DirectionExpected <- "yes"
multiverse[multiverse$Transformation == "speed" & multiverse$direction == -1, ]$DirectionExpected <- "yes"
multiverse$DirectionExpected <- factor(multiverse$DirectionExpected)
# Was the regression coefficient significant?
multiverse$Significant <- multiverse$pValue < 0.05
We deem a handful of combinations of analytical choices to be internally inconsistent. Specifically, we don’t see how it would make sense to take the ratio of two means of logarithmically transformed values: since log(a) - log(b) = log(a/b), differences of mean log-latencies already express ratios, whereas a ratio of mean log-latencies has no natural interpretation and even depends on the arbitrary unit of measurement (ms vs. s). We’ll remove these combinations from consideration:
multiverse_sensible <- multiverse %>%
filter(!(Transformation == "log" & DV == "mean ratio"))
This leaves 2,560 of the original 3,072 analyses.
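As a quick check (not shown in the original output), removing the 512 log × mean-ratio combinations from the 3,072 analyses should indeed leave 2,560 rows:
nrow(multiverse_sensible)  # 2560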
theme_set(theme_cowplot(12))
p_multi <- ggplot(multiverse_sensible,
aes(x = pValue,
fill = Significant)) +
geom_histogram(colour = "black", binwidth = 0.10) +
geom_vline(xintercept = 0.05, lty = 2, colour = "black") +
facet_grid(Task + Transformation ~ IV) +
scale_x_log10(breaks = c(0.01, 0.05, 0.2, 1)) +
scale_fill_manual(values = c("white", "red3")) +
xlab("p-value") +
ylab("no. observations") +
theme(legend.position = "none")
print(p_multi)
save_plot("MultiverseAnalysis.svg", p_multi, base_width = 14, base_height = 8)
Breakdown of significant results per panel:
multiverse_sensible %>%
group_by(Task, Transformation, IV) %>%
summarise(`Proportion significant` = mean(Significant),
`Proportion expected` = mean(DirectionExpected == "yes"),
`Median p-value` = median(pValue)) %>%
arrange(desc(`Proportion significant`))
Aggregate by Task and IV only:
multiverse_sensible %>%
group_by(Task, IV) %>%
summarise(`Percentage significant` = mean(Significant)*100,
`Percentage expected` = mean(DirectionExpected == "yes")*100,
`Median p-value` = median(pValue)) %>%
arrange(desc(`Percentage significant`),
`Median p-value`)
A clickable list of all analyses sorted from the lowest p-value to the highest:
multiverse_sensible %>%
arrange(pValue)
Follow-up question: What causes the variability within the raw use cells?
# Excluding observations based on deviation from overall mean?
multiverse_sensible %>%
filter(IV == "raw use") %>%
ggplot(.,
aes(x = pValue,
fill = Significant)) +
geom_histogram(colour = "black", binwidth = 0.10) +
geom_vline(xintercept = 0.05, lty = 2, colour = "black") +
facet_grid(Task + Transformation ~ ExclusionSDsOverall) +
scale_x_log10(breaks = c(0.01, 0.05, 0.2, 1)) +
scale_fill_manual(values = c("white", "red3")) +
xlab("p-value") +
ylab("no. observations") +
theme(legend.position = "none")
# Excluding observations based on deviation from participant mean?
multiverse_sensible %>%
filter(IV == "raw use") %>%
ggplot(.,
aes(x = pValue,
fill = Significant)) +
geom_histogram(colour = "black", binwidth = 0.10) +
geom_vline(xintercept = 0.05, lty = 2, colour = "black") +
facet_grid(Task + Transformation ~ ExclusionSDs) +
scale_x_log10(breaks = c(0.01, 0.05, 0.2, 1)) +
scale_fill_manual(values = c("white", "red3")) +
xlab("p-value") +
ylab("no. observations") +
theme(legend.position = "none")
# Expression of DV
multiverse_sensible %>%
filter(IV == "raw use") %>%
ggplot(.,
aes(x = pValue,
fill = Significant)) +
geom_histogram(colour = "black", binwidth = 0.10) +
geom_vline(xintercept = 0.05, lty = 2, colour = "black") +
facet_grid(Task + Transformation ~ DV) +
scale_x_log10(breaks = c(0.01, 0.05, 0.2, 1)) +
scale_fill_manual(values = c("white", "red3")) +
xlab("p-value") +
ylab("no. observations") +
theme(legend.position = "none")
# Exclusion of incorrect responses
multiverse_sensible %>%
filter(IV == "raw use") %>%
ggplot(.,
aes(x = pValue,
fill = Significant)) +
geom_histogram(colour = "black", binwidth = 0.10) +
geom_vline(xintercept = 0.05, lty = 2, colour = "black") +
facet_grid(Task + Transformation ~ ExclusionAccuracy) +
scale_x_log10(breaks = c(0.01, 0.05, 0.2, 1)) +
scale_fill_manual(values = c("white", "red3")) +
xlab("p-value") +
ylab("no. observations") +
theme(legend.position = "none")