Rasch gone mixed: A mixed model approach to the Implicit Association Test



Introduction
The idea that people's attitudes include components of which they are aware (i.e., explicit or direct) and components of which they are not completely aware and that cannot be controlled (i.e., implicit or indirect) has now been widely accepted (e.g., Meissner, Grigutsch, Koranyi, Müller, & Rothermund, 2019). Among the measures aimed at capturing the implicit components of attitudes, the Implicit Association Test (IAT; Greenwald, McGhee, & Schwartz, 1998) is one of the most studied, and it is used in an increasingly wide and varied range of fields of application (see Epifania, Robusto, & Anselmi, 2020, for a review). By appropriately changing the labels of the attitude objects under investigation and leaving its structure unaltered, the IAT can be easily adapted for the investigation of different topics, ranging from personality and self-esteem (e.g., Van Tuijl et al., 2016; Vecchione et al., 2016) to emotions (Riediger, Wrzus, & Wagner, 2014), addiction behaviors (e.g., Tatnell, Loxton, Modecki, & Hamilton, 2019), and perception (e.g., Wu, Lu, van Dijk, Li, & Schnall, 2018). Given the IAT's resistance to self-presentation strategies (Egloff & Schmukle, 2002; Greenwald, Poehlman, Uhlmann, & Banaji, 2009), it finds its main application in social cognition, where it is used for the implicit assessment of attitudes towards different social groups (e.g., Anselmi, Vianello, & Robusto, 2011; Anselmi, Voci, Vianello, & Robusto, 2015), even in sensitive social contexts like hospitals (e.g., Zeidan et al., 2019). Despite its broad use, the meaning of the effect obtained from the IAT remains unclear. For example, in an IAT for the investigation of the implicit attitudes towards Black and White people (i.e., Race IAT), the implicit preference for White people over Black people could be due to a favoritism for White people, a derogation of Black people, or both. The aim of this contribution is to help shed light on the meaning of the IAT effect by considering the information that can be retrieved from the random variability at the levels of the stimuli and the respondents.
The IAT is based on the speed and accuracy with which prototypical exemplars of two contrasting target categories (e.g., White and Black people in a Race IAT) and exemplars of two evaluative categories (Good and Bad) are sorted into the category to which they belong by means of two response keys. The categorization task takes place in two contrasting associative conditions, under the assumption that respondents would have a better performance, in terms of faster response times and higher accuracy, when the task is compatible with their automatically activated association. In one associative condition, the labels Good and White are displayed on the same side of the screen, and exemplars belonging to these categories are sorted with the same response key. The labels Bad and Black are displayed on the opposite side of the screen, and their exemplars are mapped with the same response key. In the contrasting associative condition, the labels White and Black switch their locations on the sides of the screen. Good and Black share the same side of the screen, and are mapped with the same response key. Bad and White are displayed on the opposite side of the screen, and are mapped with the other response key. The so-called IAT effect results from the difference in respondents' performance between the two conditions. The strength and direction of the IAT effect are usually expressed by the D score (Greenwald, Nosek, & Banaji, 2003), which results from the standardization of the difference in the average response time between the two conditions. The effect size measure proposed by Greenwald et al. (2003) is the most commonly used. Other authors have introduced modifications to the D score algorithm to either obtain more robust scores (Richetin, Costantini, Perugini, & Schönbrodt, 2015) or to fairly compare the IAT with other implicit measures (Epifania, Anselmi, & Robusto, 2020b). The D score provides general information on the implicit constructs under investigation, but it cannot inform about the automatic associations that mostly contribute to the IAT effect. Sticking with the Race IAT example, it would not be possible to discern whether the result is mostly due to an in-group favoritism, an out-group derogation, or both. Moreover, since the D score is obtained by averaging across all trials in each associative condition, it cannot account for the dependency between the single IAT observations due to the random variability at the levels of both the stimuli and the respondents.
As such, it might result in inflated scores (Brauer & Curtin, 2017; Judd, Westfall, & Kenny, 2012; Wolsiefer, Westfall, & Judd, 2017), leading to inaccurate inferences on the implicit attitudes under investigation. Additionally, by overlooking the variability related to the stimuli, the information that can be gathered from each single stimulus and from the stimulus categories is completely neglected (Wolsiefer et al., 2017).
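The standardization behind the D score can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the full algorithm of Greenwald et al. (2003): it ignores error penalties and block structure, and the function name `d_score`, the column names, and the toy latencies are hypothetical.

```r
# Simplified D score for one respondent (illustrative only).
# `latency` is in milliseconds; `condition` holds the assumed labels "WGBB" and "BGWB".
d_score <- function(latency, condition) {
  mean_wgbb <- mean(latency[condition == "WGBB"])
  mean_bgwb <- mean(latency[condition == "BGWB"])
  pooled_sd <- sd(latency)  # standard deviation over the trials of both conditions
  (mean_bgwb - mean_wgbb) / pooled_sd
}

# Toy data: slower responses in the BGWB condition.
toy <- data.frame(
  condition = rep(c("WGBB", "BGWB"), each = 4),
  latency   = c(600, 650, 700, 650, 800, 850, 900, 850)
)
d <- d_score(toy$latency, toy$condition)
```

Because every trial is collapsed into two condition means, the score carries no information about which stimuli or stimulus categories produced the difference.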
Different models have been proposed for getting a better understanding of the IAT effect. Some of these models, like the Quad Model (Conrey, Gawronski, Sherman, Hugenberg, & Groom, 2005) or the ReAL Model (Meissner & Rothermund, 2013), consider only the accuracy responses, while other models, like the Diffusion Model (DM; Klauer, Voss, Schmitz, & Teige-Mocigemba, 2007) or the Discrimination-Association Model (DAM; Stefanutti, Robusto, Vianello, & Anselmi, 2013), simultaneously account for both accuracy and time responses. These models provide useful information at either the sample level (Quad Model and ReAL Model) or the respondent level (DAM and DM). DM and DAM also inform about the stimuli, but the information is provided at the level of stimuli categories and not at that of individual stimuli. The functioning of the stimuli is indeed a vital component for the correct functioning of the IAT itself (e.g., Bluemke & Friese, 2006). As such, fine-grained information at the stimuli level would allow for testing the functioning of the individual stimuli, resulting in better functioning IATs. Furthermore, investigating the contribution of each stimulus to the IAT effect would help shed light on the meaning of the implicit measure itself.
The Rasch modeling (Rasch, 1960) of the IAT data provides such a fine-grained analysis at the level of each stimulus. It allows for disentangling the automatic associations that mostly contribute to the IAT effect and for providing a better understanding of the measure itself.
For instance, by applying the Rasch model to the IAT discretized response times, Anselmi et al. (2011) found that positive words were the stimuli that mostly contributed to the IAT effect. By drawing on this result, the authors suggested that the implicit preference for White people over Black people that is often observed in White respondents could be an expression of in-group favoritism rather than out-group derogation. Despite the interesting insights provided by the Rasch modeling of the IAT data, its application comes with some limitations. Firstly, the discretization of the response times may result in a large loss of information that can be avoided by considering the response times in their continuous nature. Additionally, the Rasch model in its typical form cannot account for the non-independence of the IAT observations. This potentially results in biased parameter estimates and might lead to an incorrect estimation of the importance of the effect of the IAT associative conditions (Judd et al., 2012; Judd, Westfall, & Kenny, 2017; McCullagh & Nelder, 1989). Finally, for the application of the Rasch model to the IAT, it was assumed that the difficulty of the two associative conditions did not differ across respondents, hence the respondents' individual differences were neglected.
Linear Mixed Effects Models (LMMs) can easily handle all the above-mentioned issues, while providing a Rasch parametrization of the data. LMMs also allow for treating the response times in their continuous nature, potentially avoiding the loss of information related to their discretization. To better understand the IAT effect and the meaning of the IAT measure, while addressing the issues related to its sources of random variation, in the present work: (i) Generalized LMMs (GLMMs) have been applied to IAT accuracy responses to obtain Rasch model parameter estimates; (ii) LMMs have been applied to IAT log-time responses to obtain log-normal model parameter estimates; and (iii) the relationship between the classic measure of the IAT effect (i.e., the D score) and the estimates of the model parameters obtained via the GLMM and the LMM has been investigated.
In the following section, the use of the Rasch model and the log-normal model for the analysis of IAT data is described, as well as the meaning of the resulting parameters. The application of these models to a Race IAT is then presented. Some final remarks conclude the argumentation.

Models specification
The IAT accuracy and log-time responses can be modeled in a similar fashion by means of the Rasch model (Rasch, 1960) and the log-normal model (van der Linden, 2006), respectively.
In the Rasch model, the probability of observing a correct response (i.e., the stimulus is sorted into the correct category) can be expressed as a function of the respondent's ability θ (i.e., the ability of the respondent to correctly categorize the stimuli) and the stimulus easiness b (i.e., the characteristics of the stimulus that make it more recognizable as a prototypical exemplar of its own category). The higher the value of θ, the higher the respondent's ability to perform the task, and, hence, the higher the proportion of stimuli correctly categorized. The higher the value of b, the easier the stimulus is: it is easily sorted into the category to which it belongs and hence can be considered a prototypical exemplar of that category. The interplay between these two parameters determines the probability of a correct response. The estimates of the Rasch model parameters can be obtained by applying GLMMs to IAT accuracy responses.
In GLMMs, the natural link function (g) between the linear combination of predictors and the observed values y is the logit (McCullagh & Nelder, 1989). The inverse of the link function g (i.e., g⁻¹) takes on a form that can be equated to the Rasch model (see De Boeck et al., 2011; Doran, Bates, Bliese, & Dowling, 2007; Gelman & Hill, 2007, for the mathematical proofs).
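As a sketch of this correspondence, the Rasch model with an easiness parametrization can be written as follows. The subscripts p (respondent) and s (stimulus) follow the θ p and b s notation used later in the text; the response indicator x_{ps} is notation assumed here for illustration.

```latex
\begin{align}
P(x_{ps} = 1 \mid \theta_p, b_s) &= \frac{\exp(\theta_p + b_s)}{1 + \exp(\theta_p + b_s)}, \\
g\bigl(P(x_{ps} = 1)\bigr) &= \log\frac{P(x_{ps} = 1)}{1 - P(x_{ps} = 1)} = \theta_p + b_s .
\end{align}
```

The first line is the inverse logit applied to the linear predictor, and the second line shows that the log-odds of a correct response are additive in ability and easiness, which is why both parameters can be read off the random effects of a logit GLMM.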
The log-normal model (van der Linden, 2006) allows for using the response times in their continuous nature by considering the normal distribution of the log-time responses. Consequently, the loss of information due to the discretization of the response times is avoided.
According to this model, the log-time response of a respondent can be expressed as a function of the respondent's speed τ (i.e., the speed with which the respondent categorizes the stimuli) and the stimulus time intensity δ (i.e., the characteristics of the stimulus that determine the time it requires for getting a response). The lower the value of τ, the higher the respondent's speed. Likewise, the lower the value of δ, the lower the time the stimulus requires for getting a response. The time intensity parameter δ informs about the representativeness of the stimulus of its own category: The less time a stimulus needs to be categorized, the more representative of its own category it is.
The interplay between these two parameters determines the log-time responses. The estimates of the log-normal model parameters can be obtained by applying LMMs to the IAT log-response times. In LMMs, the link between the predictors and the observed variable is the identity link: The linear predictor is on the same scale as the dependent variable, which is assumed to be normally distributed. The Best Linear Unbiased Predictors (BLUPs) are used for obtaining the Rasch model and log-normal model estimates from the fitted (G)LMMs (De Boeck et al., 2011; Doran et al., 2007). BLUPs are the conditional modes of each level of the random effect, and they are not parameters of the model per se. They express the deviation of each level of the random effect from the estimated fixed effect. When added to the fixed effect of the IAT associative conditions, they result in the condition-specific estimates of either each respondent's parameters or each stimulus's parameters.
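Under the description above, the log-normal model and the construction of condition-specific estimates can be sketched as follows. The additive sign convention is an assumption that matches the text (lower τ means a faster respondent); van der Linden (2006) parametrizes speed with the opposite sign. The symbols t_{ps}, β_c, u_{sc}, and u_{pc} are notation introduced here for illustration.

```latex
\begin{align}
\log t_{ps} &= \delta_s + \tau_p + \varepsilon_{ps}, \qquad \varepsilon_{ps} \sim N(0, \sigma^2), \\
\hat{\delta}_{sc} &= \hat{\beta}_c + \hat{u}_{sc}, \qquad
\hat{\tau}_{pc} = \hat{\beta}_c + \hat{u}_{pc} .
\end{align}
```

Here β_c is the fixed effect of associative condition c, and u_{sc} and u_{pc} are the BLUPs of stimulus s and respondent p in that condition, so each condition-specific estimate is the fixed effect plus the corresponding deviation.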
In (G)LMMs, the effect of the IAT associative condition on respondents' performance can be investigated by specifying the between-conditions and within-respondents variability, or, in other words, by specifying the random slopes of the respondents in the associative conditions. This results in condition-specific respondents' parameters. By specifying the between-conditions and within-stimuli variability (i.e., the random slopes of the stimuli in the associative conditions), it is possible to obtain condition-specific estimates of the stimuli parameters. This allows for investigating the contribution of the stimuli to the IAT effect.
Three meaningful models for the analysis of the IAT accuracy responses were specified (left panel of Table 1), as well as three meaningful models for the analysis of the IAT log-time responses (right panel of Table 1). The random structures specified for the GLMMs and the LMMs were identical. The difference between the specification of the GLMMs and the LMMs lies in the assumption made on the distribution of the error term. In the former case, it is assumed to follow a logistic distribution (i.e., ε_i ∼ L(0, σ_i²), where L denotes the logistic distribution as in Doran et al., 2007); in the latter case, it is assumed to follow a normal distribution (i.e., ε_i ∼ N(0, σ_i²)). The fixed intercept is set at 0, so that the estimated fixed effects of the IAT associative conditions represent the expected log-odds of a correct response or the average log-response time in each associative condition for the Rasch model and the log-normal model, respectively.
Model 1 is considered as the Null Model. The random structure specification of Model 1 (i.e., stimuli random intercepts and respondents' random intercepts) results in the estimation of overall stimuli parameters and overall respondents' parameters. These parameters inform about the across-conditions performance of the respondents and the across-conditions functioning of the stimuli. This model should be preferred when a low between-conditions variability is observed at both the respondents' and the stimuli levels. The lack of between-conditions variability already indicates that there is no IAT effect on either respondents' performance or stimuli characteristics. Since both respondents and stimuli are specified as random intercepts, their estimates are centered around 0.
The random structure specification of Model 2 (i.e., stimuli random slopes in the associative conditions and respondents' random intercept) results in the estimation of condition-specific stimuli parameters and overall respondents' parameters. This model is the best-fitting one when a high within-stimuli between-conditions variability is observed, along with a low within-respondents between-conditions variability. This model allows for testing whether the functioning of the stimuli differs between conditions. If a stimulus shows a higher b_s (or a lower δ_s) parameter in one condition than in the other, it means that it was easier (or required less time) to be categorized in the former condition than in the latter. Moreover, the differential measure between the condition-specific stimuli estimates informs about the bias due to the associative conditions, which in turn provides information about the contribution of each stimulus to the IAT effect. Since the fixed intercept is set at 0 and respondents are specified as random intercepts, their estimates are centered around 0, that is, the mean of the distribution of respondents' estimates.
The random structure specification of Model 3 (i.e., respondents' random slopes in the associative conditions and stimuli random intercept) results in the estimation of condition-specific respondents' parameters and overall stimuli parameters. This model is the best-fitting one when a high within-respondents between-conditions variability is observed, along with a low within-stimuli between-conditions variability. The estimates of the condition-specific respondents' parameters, either θ_p or τ_p, express if and how the accuracy or speed performance of each respondent was affected by the IAT associative condition. By computing the difference between respondents' condition-specific estimates, a measure of the bias due to the associative conditions can be obtained, allowing for testing whether there is an effect of the condition on respondents' performance. Since the fixed intercept is set at 0 and stimuli are specified as random intercepts, their estimates are centered around 0, that is, the mean of the distribution of the stimuli estimates.
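The three random structures described above can be sketched in lme4 syntax as follows. This is an illustration on simulated data, not the code reported in the Appendix: the data frame `dat`, its column names, and all simulated effect sizes are assumptions made only so that the example runs.

```r
library(lme4)

set.seed(1)
# Hypothetical long-format data: one row per respondent x stimulus x condition.
dat <- expand.grid(
  subject   = factor(1:30),
  stimuli   = factor(1:16),
  condition = factor(c("WGBB", "BGWB"))
)
n <- nrow(dat)
is_bgwb <- dat$condition == "BGWB"

# Simulated effects, used only so that the sketch runs (not the study's data).
subj_eff  <- rnorm(30, sd = 0.5)[as.integer(dat$subject)]
stim_eff  <- rnorm(16, sd = 0.4)[as.integer(dat$stimuli)]
stim_bgwb <- rnorm(16, sd = 0.4)[as.integer(dat$stimuli)] * is_bgwb
subj_bgwb <- rnorm(30, sd = 0.2)[as.integer(dat$subject)] * is_bgwb

dat$correct  <- rbinom(n, 1, plogis(2.5 - 1 * is_bgwb + subj_eff + stim_eff + stim_bgwb))
dat$log_time <- rnorm(n, mean = -0.4 + 0.15 * is_bgwb + 0.3 * subj_eff +
                        0.1 * stim_eff + subj_bgwb, sd = 0.3)

# Random structures of the three models; the fixed intercept is set at 0.
re_model1 <- "(1 | subject) + (1 | stimuli)"              # overall estimates
re_model2 <- "(1 | subject) + (0 + condition | stimuli)"  # condition-specific stimuli
re_model3 <- "(0 + condition | subject) + (1 | stimuli)"  # condition-specific respondents

# Accuracy model with the Model 2 structure and log-time model with the Model 3 structure.
A2 <- glmer(as.formula(paste("correct ~ 0 + condition +", re_model2)),
            data = dat, family = binomial)
T3 <- lmer(as.formula(paste("log_time ~ 0 + condition +", re_model3)), data = dat)

# Condition-specific stimulus easiness: fixed effect of the condition + BLUP.
blup_stim <- as.matrix(ranef(A2)$stimuli)
b_stim <- sweep(blup_stim, 2, fixef(A2)[colnames(blup_stim)], "+")
```

With small simulated data sets, lme4 may report singular fits or convergence warnings; these do not affect the point of the sketch, which is how the formula terms map onto the three random structures.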
Response times must be log-transformed for the application of the log-normal model and for obtaining its estimates. From now on, the models applied to IAT accuracy responses will be identified with the letter "A", while the models applied to IAT log-time responses will be identified with the letter "T". The R code used for estimating these models is reported in the Appendix. Outfit statistics were used to evaluate the fit of the data to the model chosen after model comparison. Outfit statistics ranging between 0.50 and 2.00 (Linacre, 2002) express a good fit of the data to the model. The most problematic values are those above 2.00, which indicate variability in the data that is not explained by the model (i.e., underfit). Outfit statistics below 0.50 indicate overfit (i.e., the data show less variability than that expected by the model) and will not be considered as problematic as those indicating underfit.
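For the accuracy models, the Outfit statistic is the unweighted mean of the squared standardized residuals. A minimal sketch follows, assuming that the model-implied probabilities are already available; the function name `outfit` and the simulated values of θ and b are hypothetical.

```r
# Outfit (unweighted mean square) from observed responses x and model probabilities p.
outfit <- function(x, p) mean((x - p)^2 / (p * (1 - p)))

set.seed(2)
theta <- rnorm(50)               # hypothetical respondent abilities
b     <- c(0.5, 1.0, 1.5, 2.0)   # hypothetical stimulus easiness values
p <- plogis(outer(theta, b, "+"))                      # respondents x stimuli probabilities
x <- matrix(rbinom(length(p), 1, p), nrow = nrow(p))   # simulated accuracy responses

# One Outfit value per stimulus, classified with the cut-offs given in the text.
stim_outfit <- sapply(seq_along(b), function(s) outfit(x[, s], p[, s]))
flag <- ifelse(stim_outfit > 2, "underfit",
               ifelse(stim_outfit < 0.5, "overfit", "ok"))
```

The same function applied to the rows of `x` and `p` would give respondent-level Outfit values.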

Method
The above-mentioned models were applied to a Race IAT. Models were fitted with the lme4 package (Bates, Mächler, Bolker, & Walker, 2015) in R (Version 3.5.1; R Core Team, 2018) and the implicitMeasures package (Epifania, Anselmi, & Robusto, 2020c).
Participants. Sixty-five university students (F = 49.23%, Age = 24.95 ± 2.09 years) voluntarily took part in the study. Participants were informed about the confidentiality of the data and asked for their consent to take part in the study. Most of them (84.62%) identified themselves as belonging to the Mediterranean ethnic group. A sensitivity power analysis was run with G*Power (Faul, Erdfelder, Buchner, & Lang, 2009) to understand whether the sample size ensured 80% power to detect an effect size f² of at least 0.15 at p < .05. The sensitivity power analysis was run specifically for the investigation of the relationship between the model parameter estimates and the classic IAT score, and it indicated that the sample size was adequate for this aim.
Participants were instructed to be as accurate and fast as they could.
Data cleaning and D score. Exclusion criteria based on both latency and accuracy responses were applied (Greenwald et al., 2003; Nosek, Banaji, & Greenwald, 2002). The algorithm D1 in Greenwald et al. (2003) was used for computing the D score. The difference was computed between the average response time in the BGWB and the WGBB condition: Positive scores stood for a possible preference for White people over Black people. In applying the LMMs, the latencies at the incorrect responses were used.

Results
No participants or trials were eliminated based on the response time exclusion criteria. Three participants were excluded because of the accuracy deletion criterion (Nosek et al., 2002). The final sample was composed of 62 participants (F = 48.39%, Age = 24.92 ± 2.11 years).
Model A2 was chosen. This model provided overall participants' ability parameters θ_p and condition-specific stimuli easiness parameters (b_WGBB and b_BGWB). Results from Model A2 indicated a higher probability of a correct response in the WGBB condition (log-odds = 3.45, SE = 0.12) than in the BGWB condition (log-odds = 2.07, SE = 0.11). Between-participants variability was 0.17. Between-stimuli variability in the WGBB condition (σ² = 0.08) was lower than that in the BGWB condition (σ² = 0.15). The correlation between stimuli variability in the two conditions was moderate (r = .34).
The Outfit statistics of the respondents ranged between 0.04 and 1.85 (M = 0.92 ± 0.33).
Seven respondents showed Outfit statistics below 0.50, and they were retained in the analysis.
All stimuli showed appropriate Outfit values in condition BGWB (M = 0.92 ± 0.12, Min = 0.69, Max = 1.08). Outfit statistics in condition WGBB (M = 0.94 ± 0.40, Min = 0.25, Max = 1.71) highlighted four stimuli with Outfit values below 0.50, but they were retained in the analysis. Stimuli easiness parameters for each condition resulting from Model A2 are reported in Table 2.
Table 2 note. "wf": White female face, "wm": White male face, "bf": Black female face, "bm": Black male face; WGBB: White-Good/Black-Bad condition; BGWB: Black-Good/White-Bad condition. Rows are ordered by decreasing values of b_WGBB − b_BGWB. The units of parameter b are the log-odds. The units of parameter δ are the log-seconds.
Overall, IAT stimuli tended to be easy stimuli. Stimuli tended to be easier in the WGBB condition than in the BGWB condition, where they showed a higher easiness variability. On average, object stimuli in the WGBB condition were the easiest stimuli, while negative words tended to be the least easy stimuli in the BGWB condition, immediately followed by positive words in the same condition. The difference in stimuli easiness estimates is reported in Table 2 as well. Object stimuli showed the lowest average easiness difference, while attribute stimuli, particularly Good exemplars, showed the highest average difference between conditions. The difference in the easiness estimates between the two associative conditions allowed for the identification of the stimuli of each category that gave the highest or the lowest contribution to the IAT effect. The stimuli giving the highest contribution to the IAT effect were joy and happiness (category Good), evil and horrible (category Bad), wm3 and wf3 (category White), and bm2 and bf2 (category Black). The stimuli giving the lowest contribution to the IAT effect were love and glory (category Good), annoying and pain (category Bad), wf1 and wm1 (category White), and bm3 and bf3 (category Black).
Log-normal models. The between-stimuli variability was particularly low (σ = 0.003), while the between-participants variability was slightly higher in the BGWB condition (σ = 0.05) than in the WGBB one (σ = 0.02). The correlation between respondents' variability in the two conditions was strong (r = .63).

Stimuli time intensity parameters δ_s are reported in Table 2. The stimuli time intensity estimates are obtained by adding each stimulus BLUP to the fixed intercept. Since the fixed intercept is set at 0, the time intensity estimates are centered around 0. The lower the value of δ_s, the lower the amount of time the stimulus needs for getting a response. Attribute stimuli required more time to get a response than object stimuli. Black male faces required less time for getting a response than Black female faces. This pattern was not observed for White people's faces. Three of the positive attribute stimuli (pleasure, glory, laughter) showed higher time intensity estimates than the other stimuli belonging to the same category. Three negative attributes (failure, annoying, pain) also showed higher time intensity estimates than the other negative attributes. Object stimuli tended to have similar time intensity estimates.

Relationship between model estimates and typical D score
A speed-differential measure was computed as the difference between speed parameters in the BGWB condition and the WGBB condition. Negative values indicated a respondent faster in the BGWB condition than in the WGBB condition. Pearson's correlations were computed between participants' ability, condition-specific speed parameters, and speed-differential. Participants' ability weakly and positively correlated with speed in the BGWB condition (r = .13, p = .32), and it weakly and negatively correlated with the speed-differential (r = −.14, p = .28), although these correlations were not significant. Ability moderately correlated with the speed estimate in the WGBB condition (r = .32, p = .01). The D score was regressed on participants' ability and the speed-differential. Backward deletion was used to investigate the linear combination of predictors accounting for the highest proportion of explained variance. Backward deletion kept both predictors in the model, which accounted for about 78% of the total variance (Adjusted R² = .78, F(2, 59) = 106.3, p < .001). Speed-differential strongly and positively predicted the D score (β = 1.93, t(59) = 13.88, p < .001). Ability negatively predicted the D score (β = −0.18, t(59) = −2.48, p = .012).
To better understand the specific contribution of the speed in each associative condition, a model including the linear combination of the ability estimate, the speed estimate in the WGBB condition, and the speed estimate in the BGWB condition was specified as well. Backward deletion kept all predictors in the model, which accounted for almost 80% of the total variance (Adjusted R² = .79, F(3, 58) = 76.46, p < .001). The speed estimate in the WGBB condition negatively predicted the D score (B = −2.22, t(58) = −11.43, p < .001), while the speed estimate in the BGWB condition positively predicted it (B = 1.92, t(58) = 14.16, p < .001).
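The two regression models described in this section can be written out as a sketch. The coefficient symbols β_0 to β_3, the error term e_p, and the shorthand Δτ_p for the speed-differential are notation introduced here for illustration.

```latex
\begin{align}
\Delta\tau_p &= \tau_p^{\mathrm{BGWB}} - \tau_p^{\mathrm{WGBB}}, \\
D_p &= \beta_0 + \beta_1 \theta_p + \beta_2 \, \Delta\tau_p + e_p, \\
D_p &= \beta_0 + \beta_1 \theta_p + \beta_2 \, \tau_p^{\mathrm{WGBB}} + \beta_3 \, \tau_p^{\mathrm{BGWB}} + e_p .
\end{align}
```

The second model constrains the two condition-specific speed estimates to enter with equal and opposite weights, whereas the third lets each condition carry its own coefficient, which is what makes their separate contributions to the D score visible.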

Final remarks
The application of the (G)LMMs to IAT data proved to be an effective modeling framework for obtaining the estimates of the Rasch model and the log-normal model parameters while accounting for the non-independence of the IAT observations.
The fine-grained analysis at the stimuli level allowed for a deeper understanding of the meaning of the IAT measure, for example by making it possible to investigate the stimuli that were not representative of their category or did not contribute to the IAT effect. Specifically, these models provided detailed information about how much each stimulus is representative of its own category. According to Nosek et al. (2005), a valid IAT measure can be obtained by using as few as two stimuli to represent each category. The information at the stimuli level provided by these models allows for exploiting the most representative and prototypical exemplars of each category. For instance, it was possible to identify two stimuli for each category providing the highest information (e.g., the words joy and happiness for the category Good).
Based on these results, it is possible to design new IATs that maximize the information while reducing the number of stimuli representing each category and, consequently, the number of trials. However, the estimates provided by the Rasch model and the log-normal model were not considered together. As such, the information they provide should be interpreted with caution. This issue can be addressed by using a hierarchical approach like the one in van der Linden (2007).
The representativeness of the stimuli can be pretested in a sample drawn from the population of interest. Even though this procedure is valid, it should be repeated every time the IAT is used on samples drawn from different populations. One of the advantages of Rasch modeling is that the estimates obtained on the stimuli are independent from the sample from which they were estimated. As such, stimuli parameter estimates can provide information on stimuli functioning that can be generalized to samples (drawn from the same population) other than the one from which they were obtained. Besides, by using this approach, it is possible to add new stimuli and test their functioning independently from the functioning of the old stimuli.
The information at the stimuli level can also be used for understanding the associations mostly driving the IAT effect. In this case, the evaluative dimensions Good and Bad were the stimuli categories showing the highest difference between the associative conditions. Both stimuli categories were easier in the WGBB condition than in the BGWB condition, meaning that the Good stimuli were more easily sorted when their category shared the response key with the White category than when it shared the response key with the Black category. Similarly, Bad stimuli were more easily sorted when their category shared the response key with the Black category than when it shared the response key with the White category. This result is in line with the positive primacy effect found by Anselmi et al. (2011), and it also highlights the contribution of the negative evaluative dimension in influencing the IAT effect. Given that the IAT effect appears to be mostly driven by evaluative dimensions, this result is in contrast with what has been found by Klauer et al. (2007), according to whom attitudes influence the performance at the IAT through the categorization of the object stimuli.
These models also resulted in detailed information on respondents' accuracy and speed performance. Understanding how respondents behave during the IAT administration is crucial for getting a deeper comprehension of its measure and of the factors that might influence it. Respondents' accuracy performance was not affected by the IAT associative conditions, while their speed performance was. Consequently, the IAT effect seems to be mostly due to a slowdown of the respondents, while the accuracy performance remains unaltered. This result can be interpreted by considering the speed-accuracy trade-off (Klauer et al., 2007). Indeed, respondents tend to slow down to maintain the accuracy unaltered in the condition that is against their automatically activated associations.
Not surprisingly, the D score was strongly related to the speed parameters, both speed-differential and condition-specific speed estimates, while the contribution of ability was negligible. By using a differential measure to predict the D score, it is not possible to understand the actual weight of each associative condition in determining the final score. Conversely, when the condition-specific estimates were used to predict the D score, it was possible to isolate and highlight the higher contribution of the speed estimate pertaining to the WGBB condition compared with that pertaining to the BGWB condition. This result is consistent with those obtained from the stimuli easiness estimates.
Given their flexibility, these models can be used for modeling data from other implicit measures similar to the IAT, such as the Single Category IAT (SC-IAT; Karpinski & Steinman, 2006) or the Go/No-Go Association Task (GNAT; Nosek & Banaji, 2001). The SC-IAT results from a slight modification of the IAT procedure and is based on speed and accuracy of stimuli categorization. Consequently, both the accuracy and the log-normal models can be used for modeling its responses. By contrast, the GNAT is solely based on accuracy responses. Given that the accuracy and the log-time models do not rely on each other to be applied, it is possible to use only the accuracy models for obtaining the estimates of the Rasch model parameters on the GNAT accuracy responses. Moreover, the IAT can be used together with the SC-IAT (Epifania, Anselmi, & Robusto, 2020b; Karpinski & Steinman, 2006). This modeling approach allows for extending the random structure of the (G)LMMs to include multiple implicit measures in one comprehensive model.
Since the aim of the study was to investigate the effect of the IAT associative condition on respondents' performance or stimuli functioning within a Rasch approach, no other predictors were entered in the models. However, given the flexibility of these models, it is possible to include other fixed effects for the investigation of the effect of different features of the stimuli (e.g., whether it is a word or an image) or of different characteristics of the respondents.
In this study, we did not investigate and compare the relationship between explicit measures of attitudes, behavioral outcomes, estimates obtained through Rasch and log-normal models, and D score. It can be speculated that, since the estimates obtained from the (G)LMMs are not influenced by unwanted error variance due to the non-independence of the observations, they may be more reliable than the D score, hence allowing for a better inference of the construct under investigation. Therefore, they may result in a better prediction of behavioral outcomes, as well as showing stronger relations with explicit evaluations tapping the same construct. Future studies should address this issue.
Rasch analysis based on small samples, such as that used in this study, should be used for exploratory purposes with extreme caution (Chen et al., 2014). Nonetheless, when LMMs are employed, it is not the sample size per se that matters, but the number of observations for each unit of analysis, in this case, the respondents. There were 120 observations for each respondent, which should have ensured reliable estimates for the respondents.
This work highlighted how a simple approach can lead to a thorough and detailed analysis of the IAT data within a Rasch framework. The fine-grained analysis at the stimuli, participants, and associative condition levels provided by these models may lead to new insights into the functioning and meaning of the IAT.
For both accuracy and log-time responses, in Model 2 (Table 1) the estimates of the stimuli are centered at 0 (argument (1|stimuli)), while in Model 3 (Table 1) the estimates of the respondents are centered at 0 (argument (1|subject)). In Model 1, the Null model, both stimuli and respondents are centered around 0.
The Rasch and log-normal estimates were obtained by means of the lme4 package (Bates et al., 2015) in R. The lme4 package can be installed and loaded with the following code:

    install.packages("lme4")  # install the package
    library(lme4)             # load the package for the estimation of the models

A Accuracy models specification
The specification of the accuracy models is described model by model.
Model 1: The between-subjects variability is specified as random intercepts centered around 0 (i.e., (1|subject)). The between-stimuli variability is specified as random intercepts (i.e., (1|stimuli)) as well.
Model 2: The between-subjects variability is specified as random intercepts centered around 0 (i.e., (1|subject)). The within-stimuli between-conditions variability is specified as the random slopes of the stimuli in the conditions (i.e., (0 + condition|stimuli)).
Model 3: The between-stimuli variability is specified as random intercepts centered around 0 (i.e., (1|stimuli)). The within-subjects between-conditions variability is specified as the random slopes of the respondents in the conditions (i.e., (0 + condition|subject)).
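The three random structures described above could be written in lme4 roughly as follows. This is a sketch on simulated data rather than the original listing: the response name `correct`, the data frame `dat`, and the no-intercept fixed part `0 + condition` are assumptions, while the random terms and the object names a1, a2, and a3 follow the text.

```r
library(lme4)

# Simulated accuracy data standing in for real IAT responses (hypothetical).
set.seed(1)
n_subj <- 30
n_stim <- 12
dat <- expand.grid(
  subject   = factor(seq_len(n_subj)),
  stimuli   = factor(seq_len(n_stim)),
  condition = factor(c("BGWB", "WGBB")),
  trial     = 1:3
)
theta <- rnorm(n_subj, sd = 0.8)  # respondent deviations (logit scale)
b     <- rnorm(n_stim, sd = 0.5)  # stimulus deviations (logit scale)
eta   <- 1.5 + 0.5 * (dat$condition == "WGBB") +
  theta[as.integer(dat$subject)] + b[as.integer(dat$stimuli)]
dat$correct <- rbinom(nrow(dat), size = 1, prob = plogis(eta))

# Model 1 (null): random intercepts for both stimuli and respondents.
a1 <- glmer(correct ~ 0 + condition + (1 | stimuli) + (1 | subject),
            data = dat, family = binomial)

# Model 2: respondent intercepts, condition-specific random slopes for stimuli.
a2 <- glmer(correct ~ 0 + condition + (0 + condition | stimuli) + (1 | subject),
            data = dat, family = binomial)

# Model 3: stimulus intercepts, condition-specific random slopes for respondents.
a3 <- glmer(correct ~ 0 + condition + (1 | stimuli) + (0 + condition | subject),
            data = dat, family = binomial)
```

Because the simulated data contain no condition-specific variability, lme4 may report a singular fit for a2 or a3 here; this is a property of the toy data, not of the specification.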

A.1.1 Model comparison
Once the three models have been estimated, they can be compared with each other.
    anova(a1, a2, a3)

Since Model a2 and Model a3 have the same degrees of freedom, the χ² statistic obtained from their comparison is meaningless and cannot be used as a means for choosing the best fitting model. Comparative fit indexes should be used instead. The function anova() is used only for the convenience of having the comparative fit indexes, deviance, log-likelihood, and degrees of freedom of all models on the same page.
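The comparative fit indexes can also be collected directly, which avoids reading a χ² test that is not interpretable for a2 versus a3. The sketch below uses simulated data and assumed variable names (`correct`, `condition`, `dat`); `fit_table` is a hypothetical helper object. Lower AIC and BIC values indicate the better fitting model.

```r
library(lme4)

# Simulated accuracy data standing in for real IAT responses (hypothetical).
set.seed(2)
n_subj <- 30
n_stim <- 12
dat <- expand.grid(
  subject   = factor(seq_len(n_subj)),
  stimuli   = factor(seq_len(n_stim)),
  condition = factor(c("BGWB", "WGBB")),
  trial     = 1:3
)
theta <- rnorm(n_subj, sd = 0.8)
b     <- rnorm(n_stim, sd = 0.5)
eta   <- 1.5 + 0.5 * (dat$condition == "WGBB") +
  theta[as.integer(dat$subject)] + b[as.integer(dat$stimuli)]
dat$correct <- rbinom(nrow(dat), size = 1, prob = plogis(eta))

a1 <- glmer(correct ~ 0 + condition + (1 | stimuli) + (1 | subject),
            data = dat, family = binomial)
a2 <- glmer(correct ~ 0 + condition + (0 + condition | stimuli) + (1 | subject),
            data = dat, family = binomial)
a3 <- glmer(correct ~ 0 + condition + (1 | stimuli) + (0 + condition | subject),
            data = dat, family = binomial)

# One row per model: number of parameters, AIC, BIC, and -2 * log-likelihood.
fits <- list(a1 = a1, a2 = a2, a3 = a3)
fit_table <- data.frame(
  model    = names(fits),
  df       = sapply(fits, function(m) attr(logLik(m), "df")),
  AIC      = sapply(fits, AIC),
  BIC      = sapply(fits, BIC),
  deviance = sapply(fits, function(m) -2 * as.numeric(logLik(m)))
)
fit_table <- fit_table[order(fit_table$AIC), ]  # best fitting model (by AIC) first
print(fit_table)
```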

A.2 Rasch model parameters
Based on the results of the model comparison, the best fitting model can be selected for extracting the estimates of the Rasch model parameters.
Model 1 results in overall respondents' parameters and overall stimuli parameters. Respondents' overall ability parameters can be extracted and stored in a data frame.

Log-time Model 3 can be estimated with the same random structure as accuracy Model 3. For the comparison of the log-time models, the same code as the one used for the comparison of the accuracy models can be used. The names of the models have to be changed accordingly, in this case from a to t.
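A possible rendering of the log-time models and their comparison is sketched below. It is not the original listing: the data are simulated, and the response name `log_time`, the fixed part `0 + condition`, and maximum likelihood estimation (`REML = FALSE`) are assumptions. The object names t1, t2, and t3 follow the a-to-t renaming described above.

```r
library(lme4)

# Simulated log-time data standing in for real IAT latencies (hypothetical).
set.seed(3)
n_subj <- 30
n_stim <- 12
dat <- expand.grid(
  subject   = factor(seq_len(n_subj)),
  stimuli   = factor(seq_len(n_stim)),
  condition = factor(c("BGWB", "WGBB")),
  trial     = 1:3
)
s <- as.integer(dat$subject)
i <- as.integer(dat$stimuli)
k <- as.integer(dat$condition)
mu    <- c(6.4, 6.7)                                     # made-up condition means
tau   <- matrix(rnorm(n_subj * 2, sd = 0.25), ncol = 2)  # respondent-by-condition deviations
delta <- rnorm(n_stim, sd = 0.10)                        # stimulus deviations
dat$log_time <- mu[k] + tau[cbind(s, k)] + delta[i] + rnorm(nrow(dat), sd = 0.30)

# Same random structures as the accuracy models, fitted with lmer().
t1 <- lmer(log_time ~ 0 + condition + (1 | stimuli) + (1 | subject),
           data = dat, REML = FALSE)
t2 <- lmer(log_time ~ 0 + condition + (0 + condition | stimuli) + (1 | subject),
           data = dat, REML = FALSE)
t3 <- lmer(log_time ~ 0 + condition + (1 | stimuli) + (0 + condition | subject),
           data = dat, REML = FALSE)

# As for the accuracy models, only the fit indexes are read for t2 versus t3.
anova(t1, t2, t3)
```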

B.1 Log-normal model parameters
We report the code for extracting the log-normal model parameters for log-time Model 3, assuming it was the best fitting model according to the model comparison. The same code used for extracting the parameters of the accuracy models can be used for extracting the parameters of the log-normal models. The only changes concern the names of the objects containing the models, from a to t, and the names of the new objects created for the parameters (e.g., from easiness to intensity).
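One way to carry out this extraction with standard lme4 accessors is sketched below, under several assumptions: the data are simulated, the variable and object names (`log_time`, `intensity`, `speed_condition`) are illustrative, and adding the fixed condition effect to each respondent's deviation is a choice of this sketch (it reproduces what `coef()` returns), not necessarily the procedure used for the reported estimates.

```r
library(lme4)

# Simulated log-time data standing in for real IAT latencies (hypothetical).
set.seed(4)
n_subj <- 30
n_stim <- 12
dat <- expand.grid(
  subject   = factor(seq_len(n_subj)),
  stimuli   = factor(seq_len(n_stim)),
  condition = factor(c("BGWB", "WGBB")),
  trial     = 1:3
)
s <- as.integer(dat$subject)
i <- as.integer(dat$stimuli)
k <- as.integer(dat$condition)
mu    <- c(6.4, 6.7)
tau   <- matrix(rnorm(n_subj * 2, sd = 0.25), ncol = 2)
delta <- rnorm(n_stim, sd = 0.10)
dat$log_time <- mu[k] + tau[cbind(s, k)] + delta[i] + rnorm(nrow(dat), sd = 0.30)

t3 <- lmer(log_time ~ 0 + condition + (1 | stimuli) + (0 + condition | subject),
           data = dat, REML = FALSE)

fe      <- fixef(t3)          # fixed condition effects
re_subj <- ranef(t3)$subject  # respondent deviations, one column per condition
re_stim <- ranef(t3)$stimuli  # stimulus deviations, centered around 0

# Overall stimulus estimates (Model 3 centers the stimuli at 0).
intensity <- data.frame(
  stimuli   = rownames(re_stim),
  intensity = re_stim[, "(Intercept)"]
)

# Condition-specific respondent estimates on the log-time scale:
# fixed condition effect plus the respondent's deviation in that condition.
speed_condition <- data.frame(
  subject = rownames(re_subj),
  BGWB    = fe[["conditionBGWB"]] + re_subj[, "conditionBGWB"],
  WGBB    = fe[["conditionWGBB"]] + re_subj[, "conditionWGBB"]
)
head(speed_condition)
```

For the accuracy models the same calls apply to a1, a2, or a3, with the resulting objects renamed accordingly (e.g., easiness instead of intensity).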
Respondents' condition-specific parameters can be obtained from the random slopes of the respondents in the conditions.

[...] was used for computing the IAT D score. A free and user-friendly tool for computing the IAT D score is retrievable at http://fisppa.psy.unipd.it/DscoreApp/ (Epifania, Anselmi, & Robusto, 2020a).

Table 1: Rasch model and log-normal model estimates.