The embodiment of emotional words in a second language: An eye-movement study

The hypothesis that word representations are emotionally impoverished in a second language (L2) has variable support. However, this hypothesis has only been tested using tasks that present words in isolation or that require laboratory-specific decisions. Here, we recorded eye movements for 34 bilinguals who read sentences in their L2 with no goal other than comprehension, and compared them to 43 first language readers taken from our prior study. Positive words were read more quickly than neutral words in the L2 across first-pass reading time measures. However, this emotional advantage was absent for negative words for the earliest measures. Moreover, negative words but not positive words were influenced by concreteness, frequency and L2 proficiency in a manner similar to neutral words. Taken together, the findings suggest that only negative words are at risk of emotional disembodiment during L2 reading, perhaps because a positivity bias in L2 experiences ensures that positive words are emotionally grounded.

Emotion is thought to play a foundational role in grounding semantic representations during first language (L1) processing (Kousta, Vigliocco, Vinson, Andrews, & Del Campo, 2011). However, the role of emotion in grounding second language (L2) semantic representations remains an open question: Some studies find that bilinguals process emotional and neutral words (e.g., sex vs. pin, respectively) differently in their L2, similar to what native speakers do in their L1 (e.g., Sutton, Altarriba, Gianico, & Basnight-Brown, 2007), whereas other studies do not (e.g., Degner, Doycheva, & Wentura, 2012). Such discrepancies may arise for a variety of potential reasons, which include whether bilinguals process emotional words differently as a function of L2 proficiency, emotional polarity or comprehension demands. Here, we test these alternatives, with the ultimate goal of determining whether an embodied theoretical approach to language (e.g., Barsalou, 1999) can explain when bilinguals process L2 emotional words like L1 emotional words. We first review the bilingual literature on L2 emotional word processing, and then describe the embodied approach that comprises the theoretical basis of this study.
One source of complexity regarding L2 emotional word processing concerns L2 proficiency. For example, bilinguals who are as proficient in their L2 as their L1 show intact emotional word processing in emotional Stroop tasks (Eilola, Havelka, & Sharma, 2007; Sutton et al., 2007). Conversely, bilinguals who are less proficient in their L2 than their L1 show reduced or no emotional effects in other tasks. For example, bilinguals show reduced skin conductance responses to L2 childhood reprimands and taboo words in emotion rating tasks (Harris, Ayçiçeği, & Gleason, 2003; see also Segalowitz, Trofimovich, Gatbonton, & Sokolovskaya, 2008; Degner et al., 2012, for implicit affect association and affective priming tasks, respectively). The fact that low L2 proficient bilinguals show impaired emotional processing effects implies that bilinguals require additional, and likely direct, experience with L2 words in order to treat them emotionally. It also implies that the automatic cross-language activation that routinely occurs during bilingual language processing, which is highly probable for low L2 proficient bilinguals (e.g., Thierry & Wu, 2007; reviewed in Baum & Titone, 2014; Kroll, Bobb, Misra, & Guo, 2008), is insufficient to produce emotional word processing effects in the L2.
Another source of complexity with respect to L2 emotional word processing is the potential effect of emotional polarity (i.e., negative vs. positive emotionality). Most work focuses exclusively on negative words; thus, few studies have compared negative and positive words to each other while simultaneously taking into account the other linguistic ways that words vary (e.g., frequency). A recent exception is Conrad, Recio, and Jacobs (2011), who compared emotional and non-emotional words using event-related potentials (ERP) gathered during lexical decisions in two groups of bilinguals differing in L2 proficiency. Here, more proficient bilinguals showed an enhanced early posterior negativity and late positive complex in L1 for negative and positive words vs. neutral words, with delayed but similar effects in the L2. Less proficient bilinguals, however, showed ERP modulations in the L2 for positive but not negative words. Thus, negative but not positive emotional words may be treated in an unemotional manner in the L2. In contrast with this work, Segalowitz et al. (2008) and Degner et al. (2012) found impoverished emotional word processing in an implicit affect association task and an affective priming task, respectively, which did not interact with continuous measures of L2 proficiency. Interestingly, however, Degner et al. found that bilinguals who reported frequently using their L2 processed emotional words in their L2 more like they did in their L1, even after controlling for L2 proficiency.
Thus, an open question concerns the amount and kind of L2 experience required for intact emotional word processing. One potentially useful approach for addressing this question is to view emotional effects on word processing as part of a larger class of language embodiment phenomena, which considers how real-world experiences ground semantic representations generally. Consider the words apple and grief, for which people have very different bodily experiences. Apples are tangible objects that may be grasped, eaten or physically manipulated in any number of ways (e.g., sliced, cooked, stewed, etc.), which are all typically classed as sensorimotor experiences. Grief is a less tangible concept that may have indirect physical manifestations that we typically class as emotional experiences (e.g., a lump in one's throat). In embodied approaches to language, words are grounded in whichever type of experience they tend to co-occur with during language use (Zwaan, 2008). Vigliocco and colleagues recently proposed that all words are represented by linguistic information (e.g., word associations), emotional information and sensorimotor information (Kousta et al., 2011). They further stipulated that emotional and sensorimotor information primarily represent abstract and concrete semantics, respectively.
With respect to L2 word processing in bilinguals, an embodied approach to language forces us to consider more deeply the behavioural ecology of bilinguals (Green, 2011), and whether this ecology would afford the embodiment of emotional words in one's L2. Accordingly, bilinguals' L2 words may be "disembodied" (Pavlenko, 2012) if bilinguals do not use their L2 in social contexts that provide opportunities for co-occurrence with emotional experiences. This idea is consistent with work on language choice. For example, bilinguals curse to express anger using their L1 more than their L2 (Dewaele, 2004), and switch to their L1 for emotionally charged interactions with their romantic partners, even when their partners have limited knowledge of the language (Pavlenko, 2005). Bilinguals may therefore experientially lock their L2 words out of emotional contexts, rendering them less salient than L1 words during on-going language processing. Bilinguals might also show differences between negative and positive words to the extent that they experientially lock their L2 out of negative but not positive emotional contexts. For example, Conrad et al. (2011) suggested that bilinguals showed emotional effects for positive but not negative words because a positivity bias ensures that bilinguals use their L2 in emotionally positive contexts. The notion of a positivity bias is supported by work on the Pollyanna hypothesis showing that human communication is centred on emotionally positive exchanges (Boucher & Osgood, 1969). It is also supported by work on emotion regulation showing that positive emotion is more often up-regulated and less often down-regulated than negative emotion (Matsumoto, Yoo, Hirayama, & Petrova, 2005), particularly with respect to interactions with colleagues and strangers vs. family and friends.
An embodied perspective also forces us to consider differences between emotional and sensorimotor embodiment in bilinguals, which are reflected in emotional and concreteness advantages, respectively (Kousta et al., 2011). Accordingly, bilinguals may have difficulty grounding L2 words in emotional experiences specifically but not sensorimotor experiences more generally. To test this idea, we can capitalise on interactions found in the L1 literature between word emotionality and concreteness, which show that emotion facilitates semantic categorisation accuracy for abstract but not concrete words (Newcombe, Campbell, Siakaluk, & Pexman, 2012). Similarly, in eye-movement paradigms, emotion facilitates first-pass reading times for abstract words and not concrete words, and conversely, concreteness facilitates neutral words but not emotional words (Sheikh & Titone, 2013). Thus, if bilinguals have disembodied negative L2 words, those negative words should be facilitated by concreteness, like neutral words, and not by emotionality.
Differences among bilinguals in L2 proficiency may not have predicted emotional effects in previous studies because proficiency does not reflect the kinds of contexts in which bilinguals use their L2. However, L2 proficiency might modulate facilitation by concreteness to the extent that sensorimotor referents of words are independent of context. For example, there is no evidence suggesting that sensorimotor experiences vary as a function of social context, as has been shown for emotional experiences (Matsumoto et al., 2005). Thus, L2 proficiency should modulate concreteness advantages for negative and neutral words. Moreover, L2 proficiency should modulate word frequency effects, which reflect how often words occur in language, to the extent that frequency effects also do not depend on context-specific L2 experiences. Consistent with this conjecture, differences in language use measures among bilinguals predict L2 and L1 frequency effects (e.g., Whitford & Titone, 2012) even though the usage measures do not reflect the context of use. Bilinguals generally also show larger L2 than L1 frequency effects, presumably because bilinguals experience words less frequently in their L2 (e.g., Whitford & Titone, 2012). Thus, bilinguals should show less dependence on word frequency for modulating embodiment effects, in contrast to native speakers (Juhasz, Yap, Dicke, Taylor, & Gullick, 2011; Sheikh & Titone, 2013).

The present study
The work just reviewed leads to the following open questions about bilinguals reading in their L2: (1) Do bilinguals show reduced or eliminated emotional facilitation for negative words, but not positive words, relative to neutral words? (2) Do bilinguals show facilitation of negative words by concreteness, like neutral words? (3) Do differences in L2 proficiency among bilinguals predict facilitation by concreteness and frequency, but not emotionality? We address these questions using eye-movement measures of natural sentence reading (Rayner, 2009). We used semantically neutral sentences in which we embedded English target words that varied on emotional valence, frequency and concreteness.
French-English bilinguals (all French L1) read English sentences presented in their entirety, with no goal other than comprehension. Since embodiment theories are based mostly on data from single word paradigms, they do not help target particular reading time measures for the disembodiment predictions. Thus, we analysed all first-pass fixation time measures from first fixation duration (FFD) to go-past time (GPT) to identify the point in time at which emotional disembodiment manifests in bilinguals for negative words. To obtain more evidence that our bilinguals have disembodied negative words but not positive words, we compared the bilinguals with native English speakers from a previous study that used the same materials (Sheikh & Titone, 2013).

METHOD

Participants
We recruited 34 bilinguals (French L1, English L2, mean age = 24.97, standard deviation [SD] = 5.16) at McGill University who were less proficient in their L2 than their L1 and primarily used their L2 in formal environments rather than informal social environments (see supplementary materials). We report below how we determined sample size, all data exclusions, manipulations and measures in the study.

Materials and design
We used 156 target words grouped into 52 triplets, each consisting of a negative, neutral and positive word. Frequency and concreteness were manipulated, but balanced across emotional categories (Table 1). Low-frequency words were longer than high-frequency words, a difference that was statistically controlled in all analyses. We created three sentences for each triplet, each of which was well formed when combined with any of the triplet members. Differences in sentential fit across emotional categories were avoided, and any potential effects were statistically controlled (see Sheikh & Titone, 2013, where we used the same stimuli and describe their development in depth). Targets and sentence frames were presented once in a given experimental list and the combinations were counterbalanced across participants (Table 2).
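The rotation of targets across sentence frames can be illustrated with a short sketch. It uses the example triplet and frames shown in Table 2; the function name `build_lists` and the `{target}` placeholder are illustrative assumptions rather than part of the original materials.

```python
# Sentence frames with a {target} slot; wording taken from the Table 2 example.
FRAMES = [
    "The art teacher presented the {target} that the students were going to paint.",
    "The news report mentioned the {target} that was discussed at work.",
    "The school paper reviewed the {target} that had caused concern among students.",
]
# One example triplet, in the order the targets appear in List 1 of Table 2.
TRIPLET = ["smoke", "hill", "drink"]


def build_lists(frames, targets):
    """Return one list of sentences per counterbalancing list (hypothetical helper).

    In list k, frame j receives target (j - k) mod n, so every target
    appears in every frame exactly once across the n lists.
    """
    n = len(targets)
    if len(frames) != n:
        raise ValueError("need as many frames as targets")
    return [
        [frames[j].format(target=targets[(j - k) % n]) for j in range(n)]
        for k in range(n)
    ]


lists = build_lists(FRAMES, TRIPLET)
for number, sentences in enumerate(lists, start=1):
    print(f"List {number}")
    for sentence in sentences:
        print("  " + sentence)
```

Under this rotation each participant sees every target and every frame once, while each target is paired with each frame across lists.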
A modified Language Experience and Proficiency Questionnaire (LEAP-Q; Marian, Blumenfeld, & Kaushanskaya, 2007) was also administered. We used the LEAP-Q to verify that all participants were native speakers of French and dominant in their native language, that English was their L2, and to measure L2 proficiency using a 7-point scale that ranged from 1 (beginner) to 7 (near-native).

Apparatus and procedure
Eye movements were recorded using an Eyelink 1000 that sampled eye position every millisecond. The right eye was recorded but viewing was [...]

Table note: (a) The English Lexicon Project (Balota et al., 2007); (b) Kousta, Vinson, and Vigliocco (2009); (c) MRC Psycholinguistic Database (Coltheart, 1981).

RESULTS
We tested our hypotheses using first-pass fixation duration measures, which reflect early stages of lexical processing (rather than later measures that reflect the integration of word meaning into the sentential context) (Rayner, 2009). We analysed FFD (the duration of the first fixation on a word), single fixation duration (SFD; fixation time in cases where the word was fixated exactly once), gaze duration (GD; the sum of the durations of all fixations made during the first pass, before the eyes left the target region) and GPT (the sum of the durations of all fixations on the word from the point when the word is first fixated up until the eyes move past the word to the right). We also analysed the probability of fixating and regressing to a target, which are presented as supplementary material.
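The four fixation time measures can be made concrete with a small sketch that derives them from an ordered fixation record. The `Fixation` type, the function name and the example data are hypothetical. The sketch also assumes the conventional go-past definition, in which fixations on earlier words during a regression count towards GPT, and it treats a word as skipped if the eyes land to its right before fixating it.

```python
from typing import NamedTuple, Optional, Sequence


class Fixation(NamedTuple):
    word: int      # index of the fixated word in the sentence
    duration: int  # fixation duration in ms


def first_pass_measures(fixations: Sequence[Fixation], target: int) -> Optional[dict]:
    """Return FFD, SFD, GD and GPT for the target word, or None if it was
    not fixated during first-pass reading (assumed definitions, see lead-in)."""
    start = None
    for i, fix in enumerate(fixations):
        if fix.word > target:
            return None  # eyes passed the target before fixating it
        if fix.word == target:
            start = i
            break
    if start is None:
        return None

    # First pass: consecutive fixations on the target before the eyes leave it.
    end = start
    while end < len(fixations) and fixations[end].word == target:
        end += 1
    first_pass = [f.duration for f in fixations[start:end]]

    # Go-past: everything from the first fixation on the target until the
    # eyes move to a word to the right of it.
    go_past = 0
    for fix in fixations[start:]:
        if fix.word > target:
            break
        go_past += fix.duration

    return {
        "FFD": first_pass[0],
        "SFD": first_pass[0] if len(first_pass) == 1 else None,
        "GD": sum(first_pass),
        "GPT": go_past,
    }


# Hypothetical trial: word 3 is fixated twice, the eyes regress to word 1,
# return to word 3, and then move on to word 4.
trial = [Fixation(0, 200), Fixation(2, 220), Fixation(3, 250), Fixation(3, 180),
         Fixation(1, 150), Fixation(3, 210), Fixation(4, 230)]
measures = first_pass_measures(trial, target=3)
```

In this hypothetical trial SFD is undefined for word 3 because it received two first-pass fixations, which is why SFD analyses typically rest on fewer observations than FFD or GD.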
The eye-movement measures were analysed in R (R Development Core Team, 2010) with linear mixed models (LMMs), and generalised LMMs for the binary data, using the lme4 package (Bates, Maechler, & Dai, 2009). To test our hypotheses regarding negative and positive valence, we specified predictors for valence (negative vs. neutral vs. positive), frequency (continuous), concreteness (continuous), L2 proficiency (continuous) and their interactions in the fixed-effect structure. The model baseline was set to neutral valence because our hypotheses specifically concern differences between negative and positive words vs. neutral words. All continuous predictors were standardised. We also included a covariate for word length (continuous) in all analyses.
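The predictor coding described above can be sketched as follows. This is only an illustration of standardising a continuous predictor and of treatment coding with neutral valence as the baseline; the helper names and the example values are hypothetical, the use of the sample SD is an assumption, and the models themselves were fit in R with lme4 rather than with this code.

```python
from statistics import mean, stdev


def standardise(values):
    """Centre a continuous predictor on its mean and scale by its sample SD."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


def treatment_code(valence, baseline="neutral",
                   levels=("negative", "neutral", "positive")):
    """Return 0/1 indicators for every valence level except the baseline,
    so each coefficient is a contrast against the baseline level."""
    if valence not in levels:
        raise ValueError(f"unknown valence: {valence}")
    return {level: int(valence == level) for level in levels if level != baseline}


# Hypothetical frequency values for five target words.
frequency = [1.2, 2.5, 3.1, 0.8, 2.0]
z_frequency = standardise(frequency)

print(treatment_code("negative"))  # indicator for negative vs. neutral is set
print(treatment_code("neutral"))   # baseline: both indicators are zero
```

With this coding, the coefficient for each valence indicator estimates the difference from neutral words when the standardised continuous predictors are at their means.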
We maximised the random-effect structure to the extent possible (Barr, Levy, Scheepers, & Tily, 2013). We included random intercepts for participant (subject), word and triplet; and by-subject random slopes for valence (including the slope-intercept correlation) and frequency (including the slope-intercept correlation for SFD, but not the other measures).
We also used likelihood ratio tests to confirm that the highest-order significant effect in each model was justified, to ensure that the data were not overfitted, though the full models are presented in Table 4 for the sake of comparison. The models were fit to log-transformed observations to meet model assumptions, and non-log predicted values were calculated for plotting partial effects. We report the estimated coefficient (b), standard error (SE) [...] Baayen (2008), since we have well over the minimum 100 observations that he indicates should suffice for calculating p values in this manner. As seen below, there was no discrepancy between these p values and the |t| ≥ 1.96 criterion.

Table 2. Target words (between asterisks) were presented within one of three sentence frames in a given list, which were rotated around their triplet's negative, neutral and positive target words across lists so that targets were fully crossed with sentence frames.
Counterbalance list and sentences
List 1
The art teacher presented the *smoke* that the students were going to paint.
The news report mentioned the *hill* that was discussed at work.
The school paper reviewed the *drink* that had caused concern among students.
List 2
The art teacher presented the *drink* that the students were going to paint.
The news report mentioned the *smoke* that was discussed at work.
The school paper reviewed the *hill* that had caused concern among students.
List 3
The art teacher presented the *hill* that the students were going to paint.
The news report mentioned the *drink* that was discussed at work.
The school paper reviewed the *smoke* that had caused concern among students.
Note: Targets were not presented with asterisks during the experiment.

Means and SDs for the eye-movement measures are presented in Table 3. Mean accuracy on the comprehension questions was 81.83% (SD = 4.85). Twenty-six per cent of the trials were discarded because of track losses, lack of fixation on the target words, blinks, or observations shorter than 80 ms. Visually identified outliers were eliminated from the remaining trials.
Setting aside the effect of L2 proficiency, we can see in the LMM outputs (Table 4) that negative and neutral words did not differ for FFD, SFD or GD. However, the GPT model showed a single main effect for negative valence, suggesting that negative words (with sufficient processing time) were eventually read faster than neutral words. In contrast to negative words, significant main effects for positive valence across all measures indicate that positive words were read faster than neutral words. Moreover, in the FFD data, there was an unexpected two-way interaction between positive valence and frequency indicating that the emotional advantage for positive words was even larger for high-frequency words compared with low-frequency words. Additional models with negative valence as the baseline (as opposed to the neutral baseline described above) showed that positive words were read faster than negative words across all measures (ps < .05). Thus, the emotional advantage is reduced for negative words but not positive words, and emotional advantages occur for both high-frequency and low-frequency words.
Regarding the effect of L2 proficiency, there were no interactions for FFD or SFD. However, there were L2 proficiency interactions for GD and GPT. In both cases, this manifested as a four-way interaction between positive valence, frequency, concreteness and proficiency. L2 proficiency never interacted with negative valence. Thus, the effect of proficiency on GD and GPT was identical for negative and neutral words, which manifested in both cases as a three-way interaction between frequency, concreteness and proficiency. We visualised the pattern of effects from the model for GD using the coefficients for these significant interactions in a partial effects plot (Figure 1). We fit separate models to the GD data, split by median concreteness, to follow up on the four-way interaction in Figure 1, and to test our hypothesis that L2 proficiency predicts the concreteness advantage but not the emotional advantage. For words low in concreteness (i.e., abstract words), there was a main effect for positive valence (b = −0.1, SE = 0.03, t = −3.56, p < .001) and no interactions, which indicates that for abstract words, processing was faster for positive compared with neutral words irrespective of frequency and proficiency. This emotional advantage for abstract positive words can be seen in all of the panels in Figure 1 at low concreteness values.
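Because the models were fit to log-transformed fixation times, a coefficient such as b = −0.1 is a difference on the log scale. The sketch below shows one way to read such a coefficient as a proportional change, assuming a natural-log transform (an assumption; the base of the logarithm is not stated above). The function names are hypothetical.

```python
import math


def percent_change(b):
    """Percentage change in fixation time implied by a coefficient b
    on the natural-log scale (assumed transform)."""
    return (math.exp(b) - 1.0) * 100.0


def back_transform(log_prediction):
    """Convert a predicted log fixation time back to milliseconds."""
    return math.exp(log_prediction)


# Main effect of positive valence for abstract words in the GD follow-up model.
b_positive = -0.1
print(f"{percent_change(b_positive):.1f}%")  # prints "-9.5%"
```

Read this way, the positive-valence effect for abstract words corresponds to gaze durations roughly 9.5% shorter than for neutral words, with the other predictors held at their reference values.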
For words high in concreteness (i.e., concrete words), there was an interaction between positive valence, frequency and proficiency (b = −0.08, SE = 0.03, t = −2.86, p < .01). This interaction indicates that frequency and proficiency produced identical effects for concrete negative and neutral words, but different effects for positive words. Specifically, concreteness facilitated processing for negative and neutral low-frequency words, but only at high levels of proficiency. As words became more concrete, the emotional advantage for positive words was reduced. This pattern can be seen in Figure 1 in the bottom right panel at high concreteness values. We also found this pattern of effects in separate models fit to the GPT data split by concreteness. Thus, the results show that concreteness advantages, but not emotional advantages, depend on proficiency, and that concreteness produces identical effects for negative and neutral words, but different effects for positive words.

Figure 1. Frequency, concreteness and proficiency produced identical effects for negative and neutral words, which differed from positive words. (Plot labels: Negative, Neutral, Positive.)

L2 readers vs. L1 readers
Next, we test differences between the bilinguals reading in their L2 (L2 readers) in this study and 43 native speakers of English (L1 readers) who read the same materials in a previous study (Sheikh & Titone, 2013). To compare L2 vs. L1 readers, we tested whether language group interacts with valence, frequency and concreteness. For a complete exposition of the L1 data, see Sheikh and Titone (2013). The most relevant finding in the previous study is that the concreteness advantage was limited to low-frequency neutral words in L1 readers, producing three-way interactions between valence, frequency and concreteness for negative and positive words for the first-pass fixation time measures. Thus, if emotional disembodiment is specific to negative words in L2 readers, a comparison with L1 readers should produce four-way interactions between language group, valence, frequency and concreteness for negative words, but not positive words. As expected, this four-way interaction was significant for negative words for FFD (b = 0.04, SE = 0.02, t = 2.09, p < .05), SFD (b = 0.04, SE = 0.02, t = 1.93, p = .05), GD (b = 0.09, SE = 0.02, t = 4.01, p < .001) and GPT (b = 0.06, SE = 0.02, t = 2.53, p < .05). There were no four-way interactions for positive words (ps > .12).
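The relation between the reported t values and p values can be checked with a short sketch. It treats each t value as a standard normal deviate, which is an assumption made here for illustration (reasonable with many observations) and not necessarily the exact procedure used for the reported p values. The function name is hypothetical.

```python
import math


def two_tailed_p(t):
    """Two-tailed p value for a test statistic under a standard normal
    approximation (assumed here; not confirmed as the authors' method)."""
    return math.erfc(abs(t) / math.sqrt(2.0))


# t values reported for the four-way interaction with negative valence.
reported_t = {"FFD": 2.09, "SFD": 1.93, "GD": 4.01, "GPT": 2.53}

for measure, t in reported_t.items():
    print(f"{measure}: t = {t:.2f}, approximate p = {two_tailed_p(t):.4f}")
```

Under this approximation the SFD statistic falls just short of the |t| ≥ 1.96 criterion, consistent with its p value being reported as equal to .05 rather than below it.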

DISCUSSION
The purpose of the study was to investigate whether bilingual readers exhibit L2 emotional word processing effects, guided by predictions from an embodied approach to L2 word representation. We found that bilinguals reading in their L2 showed word processing facilitation by embodied knowledge only for some types of words, presumably because they capitalise on some but not all sources of experiential information to ground L2 semantics. Specifically, bilinguals processed positive words faster than neutral words, suggesting that they capitalise on emotionally positive experiences. Moreover, bilinguals with high L2 proficiency processed negative and neutral words faster when they were concrete compared to abstract, suggesting they also capitalise on sensorimotor experiences. However, bilinguals processed negative words faster than neutral words only for the latest first-pass measure, suggesting that they do not as readily capitalise on emotionally negative experiences.
Previous work on L1 embodiment indicates that a concreteness advantage, where observed, is diagnostic of emotional neutrality because it does not occur for emotionally charged words (Sheikh & Titone, 2013). Thus, the concreteness advantage for negative words, coupled with the absence of an early emotional advantage for negative words, suggests that only negative words were emotionally disembodied in our bilingual participants. The finding that negative words alone are emotionally disembodied is consistent with research showing that bilinguals prefer to use their L2 less often than their L1 in emotional contexts (Dewaele, 2004; Pavlenko, 2005). An L1 preference for emotional contexts would reduce co-occurrence between words and emotional experiences, leaving words emotionally disembodied in the L2. We see evidence of this in our data given that increased L2 proficiency among our bilingual participants correlated with increased L2 use in formal contexts like the workplace but not informal social exchanges (detailed in supplementary material). However, the same is not true of positive words, possibly because a positivity bias ensures that L2 use co-occurs with emotionally positive experiences. We also found that the emotional advantage for positive words was larger for high-frequency items compared with low-frequency items for first fixation duration only. This unexpected result suggests that bilinguals more easily retrieve positive emotional semantics for high-frequency words than low-frequency words at the earliest stage of processing, though this needs to be confirmed in future work.
The selective disembodiment for negative words observed here is consistent with Conrad et al. (2011), who found that bilinguals who were less proficient in their L2 than L1 showed emotional effects only for positive and not negative words. Crucially, we extend their findings by showing that this valence asymmetry occurs during natural reading, at the earliest stages of comprehension. Bilinguals are able to eventually compute the negative valence of disembodied words, which emerged as a relatively late emotional advantage for negative words. The late advantage is consistent with Harris et al. (2003) who observed attenuated skin conductance responses for taboo words and childhood reprimands even though bilinguals were ultimately able to identify their negative valence on a rating task. Interestingly, bilinguals continue to process negative and neutral words similarly in terms of how negative and neutral words are influenced by frequency, concreteness and proficiency, even after the emergence of the late emotional advantage. Thus, the late emotional advantage for negative words and the emotional advantage for positive words do not differ solely in terms of time course.
The embodiment approach also suggests a potential explanation for why studies vary in whether reduced or eliminated emotional effects are observed during L2 language processing. When the L2 is at least equal in proficiency to the L1, there is no L1 preference that locks the L2 out from emotional contexts, as that preference is partly driven by greater L1 proficiency (Dewaele, 2004). Thus, bilinguals can ground words in emotional experiences (Eilola et al., 2007; Sutton et al., 2007). In contrast, when bilinguals are less proficient in their L2, the L2 gets locked out of those contexts and bilinguals end up with emotionally disembodied words (Harris et al., 2003; Segalowitz et al., 2008). Of course, bilinguals could acquire words in emotional contexts if forced into emotional contexts irrespective of proficiency; the present results do not preclude that possibility, and indeed, suggest a future test of the embodiment hypothesis.
The findings also clarify the role of L2 proficiency. Specifically, L2 proficiency predicted concreteness advantages but not emotional advantages presumably because sensorimotor experiences are not context-specific the way emotional experiences are (Matsumoto et al., 2005). Moreover, concreteness advantages were limited to low-frequency words at high levels of L2 proficiency, similar to L1 readers (Sheikh & Titone, 2013). Thus, the bilingual experiences that underlie word processing facilitation by frequency and concreteness seem less dependent on context than the experiences that underlie facilitation by emotion. Interestingly, facilitation by concreteness at high levels of L2 proficiency did not occur for first fixation duration or single fixation duration. Thus, the representational changes correlated with L2 proficiency do not appear to be activated on the very first fixation on target words.
The present findings also add to the small number of studies that have examined emotional target words embedded in sentences using eye-movement measures of reading, though these previous studies all examined L1 processing. Some work on L1 processing using isolated words found that valence had a monotonic effect on word recognition (i.e., negative words were slower than neutral words, which were slower than positive words; Kuperman, Estes, Brysbaert, & Warriner, 2014). However, other studies found that people process negative and positive words faster than neutral words (e.g., Vinson, Ponari, & Vigliocco, 2014), which converges with recent eye-movement results for low-frequency words (Scott, O'Donnell, & Sereno, 2012; Sheikh & Titone, 2013). There is also an earlier eye-movement study by Hyönä and Häikiö (2005), but emotionally charged words in that study were only presented parafoveally and were never directly fixated by readers, as they were in our study. This methodological difference makes it difficult to directly compare the results across the studies. In contrast, Scott et al. (2012) used a natural sentence reading paradigm, and the present findings converge with their results for high-frequency words. Scott et al. found that for low-frequency words, people processed negative and positive words (which did not differ) faster than neutral words. For high-frequency words, positive words were faster than neutral words, but negative words did not differ from neutral words. This is precisely the pattern that we observed here for bilinguals. One possible explanation that Scott et al. (2012) posited for their findings with high-frequency items involves the selective attenuation of emotional charge for negative words. Specifically, they proposed that negative emotionality may be reduced by high frequency of exposure, which they compared to desensitisation in psychotherapy. Thus, although the specifics differ, our interpretation converges with their suggestion in terms of negative words not being as emotional as positive words.
To conclude, our findings demonstrate that bilinguals have emotionally disembodied negative words during L2 reading, and that these words are instead grounded in sensorimotor experiences like neutral words. Our study also shows that L2 proficiency predicts concreteness advantages but not emotional advantages during natural reading. Thus, sensorimotor experiences are more readily available than emotionally negative experiences for grounding L2 words. Similarly, emotionally positive experiences are more readily available for grounding L2 words than emotionally negative experiences.