Higher judgements of learning for emotional words: processing fluency or memory beliefs?

ABSTRACT Previous research has shown that emotionally-valenced words are given higher judgements of learning (JOLs) than are neutral words. The current study examined potential explanations for this emotional salience effect on JOLs. Experiment 1 replicated the basic emotionality/JOL effect. In Experiments 2A and 2B, we used pre-study JOLs and assessed memory beliefs qualitatively, finding that, on average, participants believed that positive and negative words were more memorable than neutral words. Experiment 3 utilised a lexical decision task, resulting in lower reaction times (RTs) for positive words than for neutral words, but equivalent RTs for negative and neutral words, suggesting that processing fluency may partially account for higher JOLs for positive words, but not for negative words. Finally, we conducted a series of moderation analyses in Experiment 4 which assessed the relative contributions of fluency and beliefs to JOLs by measuring both factors in the same participants, showing that RTs made no significant contribution to JOLs for either positive or negative words. Our findings suggest that although positive words may be more fluently processed than neutral words, memory beliefs are the primary factor underlying higher JOLs for both positive and negative words.

People make decisions about their memory all the time. Will I remember that password, or should I write it down? Should I set a reminder for my appointment? Should I restudy the information from the last lecture I attended? In each case, people must judge how likely they are to remember information in the future. However, memory does not occur in an emotional vacuum. Information ranges in how positive or negative it is. Consequently, it is important to examine how the emotional valence of a piece of information influences people's beliefs about their likelihood of remembering it. People do judge emotional information as more likely to be remembered, but why? Do people hold specific beliefs about memory and emotion (e.g. "Surely, I won't forget to tell Janet the exciting news!"), or does emotion influence other cues thought to impact memory judgements (e.g. perceptual or conceptual fluency)? In a series of studies, this paper examines the influence of emotion, memory beliefs, and fluency (in this case, conceptual fluency) on memory judgements.
Although there are a variety of judgements that people make concerning their memory, our focus in the current study is on Judgements of Learning (JOLs). These judgements have been studied extensively for nearly forty years (see Dunlosky & Metcalfe, 2009, for a review) and have been shown to rely on a number of factors. According to Koriat's (1997) cue-utilization account, metacognitive judgements such as JOLs are inferential in nature: people use a variety of cues to infer the likelihood that an item will be remembered later. Koriat proposed three general types of cues that underlie JOLs: intrinsic, extrinsic, and mnemonic. Intrinsic cues refer to material characteristics that provide information regarding the inherent ease or difficulty of learning that material, such as the degree to which a pair of associates are related or the inherent imagery value of the item. In contrast, extrinsic cues refer to either the conditions under which an item is studied, such as how many times an item is repeated or how much time is afforded for encoding, or the operations under which an item is encoded, such as whether the item was deeply or shallowly processed. Finally, mnemonic cues include internal, subjective indicators that involve the phenomenological experience of processing an item. Such experiences include how easily information comes to mind, how accessible pertinent information is, how easily the item is processed, and so forth. Koriat's (1997) framework has provided a basis for examining the factors that affect how individuals make JOLs across a variety of studied materials, including paired associates (e.g. Dunlosky & Nelson, 1994; Koriat & Bjork, 2005, 2006; Kornell & Bjork, 2009; Mazzoni & Nelson, 1995), words in larger versus smaller font sizes (e.g. McDonough & Gallo, 2012; Mueller et al., 2014; Rhodes & Castel, 2008), words spoken in a louder versus softer voice (Frank & Kuhlmann, 2017; Rhodes & Castel, 2009), and abstract versus concrete words (e.g. Undorf et al., 2018).
In terms of emotional stimuli, researchers have examined JOLs for a variety of stimulus types. For example, Zimmerman and Kelley (2010) and Tauber and Dunlosky (2012) found that younger adults were sensitive to the emotional content of both positive and negative words as exhibited by their higher JOLs for these items compared to neutral words, a phenomenon that Tauber et al. (2017) have termed an emotional salience effect on JOLs. This heightened sensitivity for emotional items was accompanied by enhanced memorability for these items. Judgements of learning have also been solicited for emotional facial expressions. For example, Nomi et al. (2013) found higher JOLs for faces studied with both angry and happy emotional expressions compared to neutral faces, although JOLs for angry and happy faces did not differ. Similarly, Witherby and Tauber (2018) examined JOLs for angry, afraid, sad, and neutral faces, finding that JOLs were higher for the emotional faces compared to neutral ones, but no JOL differences were found across the three emotional categories.
Recently, researchers have focused increasingly on determining why judgements of learning are influenced by the emotional content of stimuli. For example, Hourihan et al. (2017) explored whether higher JOLs for emotional words stem from the heightened arousal that these items induce (a physiological account), or whether these items are judged to be more memorable because of their cognitive distinctiveness (a cognitive account). Across three experiments, the authors found no evidence that higher JOLs reflect increased physiological arousal, but rather reflect the way that emotional lists are constructed, which makes their distinctive content more salient. In a yet more recent study, Undorf et al. (2018) demonstrated that individuals can utilise multiple cues when making JOLs, including number of study presentations, font size, word concreteness, and emotionality of the items. Consistent with previous studies, the authors found that participants gave higher JOLs to emotional than to neutral items. Furthermore, the authors suggested that their findings were consistent with those of Hourihan et al. (2017) in that the effects of emotionality on JOLs are cognitive rather than physiological in nature. However, Undorf et al. (2018) did not investigate the types of cognitive mechanisms that underlie JOLs for emotional items, nor did they differentiate between emotional words of different valence (positive versus negative).
Tauber et al. (2017) produced findings that are perhaps more relevant to the question of why JOLs are higher for emotional stimuli. Examining monitoring of emotional information in both younger and older adults, the authors crossed positive versus neutral valence with high versus low arousal and examined the relative effects of each factor on JOLs made for emotional pictures. Results showed that both age groups demonstrated an emotional salience effect on JOLs, although this effect was driven by stimulus valence rather than by arousal. Tauber et al. suggested that JOLs for emotional stimuli made by both younger and older adults are based on their beliefs concerning how emotion affects memory, rather than their experiences when studying the stimuli. The authors also urged that future studies should examine the contributions of processing experiences and beliefs about memory for emotional stimuli by directly measuring these factors in participants. Witherby et al. (2021) recently proposed a model that includes what they termed the Experience factor and the Theory/Explicit Analysis factor, which underlie JOLs for emotional stimuli. In this framework, the experience path from the stimulus to the JOL represents such factors as the fluency of processing an emotional item or the arousal triggered by the emotional content of an item. In contrast, the theory/analytic path represents one's explicit beliefs concerning the memorability of emotional stimuli. Importantly, the authors proposed that the relative contributions of these two factors can be empirically measured.
Therefore, our goal in the current study was to further investigate the basis for the higher JOLs that people give for emotional words and whether the factors that underlie such JOLs may differ as a function of emotional valence. We first sought to replicate the emotional salience effect on JOLs (e.g. Hourihan et al., 2017; Tauber & Dunlosky, 2012; Zimmerman & Kelley, 2010). We then sought evidence that such JOLs may be driven in part by people's belief that emotional words are inherently more memorable than neutral words (i.e. the theory/analytic path in the Witherby et al., 2021, framework). In addition, if people base their JOLs for emotional words on an intrinsic cue such as a memory belief, we asked what the nature of those beliefs is, using a methodology developed by Mueller et al. (2014). We also examined whether people utilise a mnemonic cue such as how easily an emotional item is processed (i.e. the experience path in Witherby et al.). In examining the experience path (i.e. processing fluency), our interest in the current study was in the potential effects of conceptual fluency on JOLs for emotional stimuli, rather than perceptual fluency, which has been examined in studies of larger versus smaller font sizes (e.g. McDonough & Gallo, 2012; Mueller et al., 2014; Rhodes & Castel, 2008), as well as words spoken in a louder versus softer voice (Frank & Kuhlmann, 2017; Rhodes & Castel, 2009). There are reasons to believe that the conceptual or semantic nature of emotional stimuli may affect the ease with which those stimuli are processed, thereby potentially influencing how individuals predict how well those stimuli will later be remembered. For example, Kissler and Herbert (2013) recorded EEG event-related potentials (ERPs) while participants silently read a random sequence of positive, negative, and neutral words, along with pseudowords and letter strings.
Results showed that for emotional words, differentiation of real words versus pseudowords (lexicality effects) occurred earlier than it did for real neutral words versus pseudowords, as measured by early posterior negativity effects. These results show that at the level of cortical processing, identification of emotional words (i.e. lexical access) is faster than for neutral words (see also Schacht & Sommer, 2009). Based on these findings, we speculated that if lexical access of emotional words is faster at the cortical level (i.e. an unconscious discrimination), emotional words might also be more quickly identified in a conscious lexical decision task. Such a finding would provide evidence that JOLs for emotional words might be based in part on how fluently those words are processed.
Finally, we conducted a series of moderation analyses in which we examined the degree to which fluency and beliefs underlie JOLs by measuring both factors in the same participants. This approach allowed us to directly measure the contributions of these two factors, as suggested by Tauber et al. (2017) and Witherby et al. (2021).

Experiment 1
Our goal in Experiment 1 was to replicate previous findings that JOLs for emotional words are higher than for neutral words (e.g. Hourihan et al., 2017; Tauber & Dunlosky, 2012; Zimmerman & Kelley, 2010). In addition to predicting higher JOLs for emotional words than for neutral words, we predicted that these higher JOLs would be accompanied by higher recall for the emotional words as well (e.g. Kensinger, 2009; Levine & Edelstein, 2009; Tyng et al., 2017). Based on previous research (e.g. Hourihan et al., 2017; Tauber & Dunlosky, 2012), we expected to obtain effect size estimates (ηp²) between 0.03 and 0.12 for the effect of valence on JOLs in Experiments 1 and 3.

Participants & Materials
Participants in all experiments were undergraduate students enrolled in psychology courses who participated in exchange for extra credit or partial course credit. Informed consent was collected from all participants prior to experimental procedures, and all experiments were approved by the institutional review board at the principal investigator's institution.
In Experiment 1, data from 33 individuals were collected. Sample size was based on previous research in which a similar paradigm was used (e.g. Zimmerman & Kelley, 2010, for Experiments 1 and 3; Mueller et al., 2014, for Experiments 2A and 2B), and data collection ceased at the end of the academic semester during which a given experiment was conducted. We used G*Power (Faul et al., 2007) to conduct a sensitivity analysis, and results showed that we achieved 80% power (α = .05) to detect a minimum effect size (ηp²) of 0.12 in a one-way repeated measures ANOVA with 33 participants. The mean age of participants was 22.70 years (Standard Deviation [SD] = 6.72), and the majority identified as women (69.7%).
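G*Power expresses ANOVA effect sizes as Cohen's f, whereas the thresholds in this paper are reported as partial eta squared. As a minimal sketch, the standard conversion f = sqrt(ηp² / (1 − ηp²)) can be applied to the minimum detectable effect reported above. The function names are hypothetical, and the snippet does not reproduce G*Power's sensitivity computation itself.

```python
import math


def eta_p2_to_cohens_f(eta_p2: float) -> float:
    """Convert partial eta squared to Cohen's f (standard conversion)."""
    if not 0.0 <= eta_p2 < 1.0:
        raise ValueError("partial eta squared must lie in [0, 1)")
    return math.sqrt(eta_p2 / (1.0 - eta_p2))


def cohens_f_to_eta_p2(f: float) -> float:
    """Inverse conversion: Cohen's f back to partial eta squared."""
    if f < 0.0:
        raise ValueError("Cohen's f must be non-negative")
    return (f * f) / (1.0 + f * f)


# Minimum detectable effect reported for Experiment 1.
min_effect = 0.12
print(round(eta_p2_to_cohens_f(min_effect), 3))
```

The same conversion applies to the expected range of 0.03 to 0.12 stated for Experiments 1 and 3.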
Thirty-six of the words used by Pierce and Kensinger (2011) served as stimuli and were separated into three lists consisting of 12 words of positive (M valence = 7.57), neutral (M valence = 5.51), and negative valence (M valence = 2.11). As with the word lists used by Zimmerman and Kelley (Experiment 3, 2010), word arousal did not differ between the positive (M arousal = 6.01) and the negative (M arousal = 5.61) word lists. However, neutral list word arousal (M arousal = 4.52) differed significantly from word arousal of the emotional lists, F(2,33) = 10.139, p < .001, η² = .381. The lists did not differ by Kučera-Francis frequency, number of characters, familiarity, concreteness, or imageability (all F's < 1.25, all p's > .30). Word presentation order was randomised for each participant.

Procedure
All experimental sessions were completed with participants sitting at a small desk with a personal computer (PC) and display screen at eye-level, and demographic responses were collected at the end of each experimental session. As in the Mueller et al. (2014) studies, free recall responses were scored such that any responses in which the first three letters of the response word matched the studied word were considered correct.
For Experiment 1, participants were told that they would be presented with a series of words that should be studied for a later test. Participants were then told that, following the presentation of each word, they would be asked to rate how confident they were that they would later remember the word on the test, using a scale ranging from 0 (definitely WILL NOT remember) to 100 (definitely WILL remember). All words were presented in 18-point Courier New font, centred vertically and horizontally on the screen for 5 s each. Following the presentation of all 36 words, participants were given 4 min to recall as many words as they could, with instructions to be as accurate as possible. We did not include a distractor task prior to the recall phase in these experiments because the randomisation of word presentation order for each participant should serve to prevent any order effects that may otherwise occur.
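The prefix-matching rule used to score free recall can be sketched as follows. This is an illustration, not the authors' scoring script: the function name, the case-insensitive comparison, the handling of duplicate responses, and the example words are all assumptions.

```python
def score_recall(responses, studied_words, prefix_len=3):
    """Return the set of studied words credited as recalled.

    A response earns credit when its first `prefix_len` letters match
    the first `prefix_len` letters of a studied word.
    Assumptions: matching ignores case, each studied word is credited
    at most once, and responses shorter than the prefix earn no credit.
    """
    credited = set()
    for response in responses:
        prefix = response.strip().lower()[:prefix_len]
        if len(prefix) < prefix_len:
            continue  # too short to evaluate under this rule
        for word in studied_words:
            if word.lower().startswith(prefix):
                credited.add(word)
                break  # credit the first studied word that matches
    return credited


# Hypothetical study list and typed responses, for illustration only.
studied = ["angel", "ache", "table"]
typed = ["Angle", "tabel", "xyz", "angel"]
print(sorted(score_recall(typed, studied)))
```

A rule of this kind tolerates spelling errors late in a word, but it only works when no two studied words share the same prefix.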

Results
Traditional frequentist statistics are supplemented by Bayesian analyses, performed using PsyStat (Faulkenberry & Brennan, 2022), in order to provide more information regarding the relative evidence favouring the alternative and null hypotheses. The denotation BF₀₁ indicates a Bayes factor associated with a result in favour of the null hypothesis over the alternative hypothesis, and BF₁₀ indicates a Bayes factor for a result supporting the alternative hypothesis over the null. In both cases, the Bayes factor indicates how much more likely the observed data are under the favoured hypothesis than under the other hypothesis. For example, BF₀₁ = 3.00 would suggest that the observed data are three times more likely under the null hypothesis relative to the alternative hypothesis. The default priors in PsyStat were used in all cases where Bayes factors are reported. In addition, in all experiments we applied the Bonferroni correction to all post hoc pairwise comparisons to control the Type I error rate inflation associated with multiple comparisons.
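Because the two Bayes factors describe the same comparison from opposite directions, each is the reciprocal of the other (BF01 = 1 / BF10). The short sketch below makes that relationship and its verbal reading concrete. The function names are hypothetical, and nothing here depends on PsyStat.

```python
def invert_bayes_factor(bf: float) -> float:
    """Convert BF10 to BF01 (or BF01 to BF10) by taking the reciprocal."""
    if bf <= 0.0:
        raise ValueError("a Bayes factor must be positive")
    return 1.0 / bf


def describe_bf10(bf10: float) -> str:
    """Describe a BF10 value in terms of the hypothesis it favours."""
    if bf10 >= 1.0:
        return f"data are {bf10:.2f} times more likely under the alternative"
    return f"data are {invert_bayes_factor(bf10):.2f} times more likely under the null"


# The BF01 = 3.00 example from the text, expressed as BF10.
print(describe_bf10(invert_bayes_factor(3.00)))
```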

Experiment 2A
Having further replicated the finding that JOLs are higher for emotional than for neutral words, our goal in Experiment 2A was to provide further evidence that people have an a priori belief that emotional words are more memorable than neutral words (Undorf & Zimdahl, 2019; Witherby & Tauber, 2018). We did this using the Pre-Study JOL method developed by Castel (2008). Based on previous research (e.g. Mueller et al., 2014, Experiment 4; Undorf & Zimdahl, 2019, Experiment 1), we expected to obtain an effect size estimate (ηp²) between approximately 0.28 and 0.55 for the effect of valence on pre-study JOLs.

Participants & Materials
Thirty-six participants completed Experiment 2A, but data from three participants were removed due to invariance in JOLs. The remaining 33 individuals were predominantly women (66.7%), and the mean age of all participants was 20.94 years (SD = 3.06). The results of a sensitivity analysis conducted using G*Power (Faul et al., 2007) showed that we achieved 80% power (α = .05) to detect a minimum effect size (ηp²) of 0.12 in a one-way repeated measures ANOVA with 33 participants. The same lists and words from Experiment 1 were used in this experiment.

Procedure
Participants were told that they would be presented with a series of words that should be studied for a later test. They were then told that, prior to the presentation of each word, they would be asked to rate how confident they were that they would later remember the word on the test given the valence category (i.e. positive, neutral, or negative) of the word they were about to see. As in Experiment 1, participants were asked to make these judgements using a scale ranging from 0 (definitely WILL NOT remember) to 100 (definitely WILL remember). Following the judgement, participants were given a two-second countdown followed by the presentation of the word. All words, as well as the valence category to which they belonged, were presented in a random order for each participant and were once again shown in 18-point Courier New font, centred vertically and horizontally on the screen for 5 s each. Recall instructions matched those given in Experiment 1.

Discussion
The most important finding from Experiment 2A is that participants predicted that an emotional word would be remembered better than a neutral word, even before the word was revealed. This finding suggests that people have a memory belief that emotional words are inherently more memorable than words lacking emotional valence, regardless of what the word is. We further explored the nature of these memory beliefs in Experiment 2B.

Experiment 2B
Experiment 2B served as a qualitative analysis of participants' beliefs regarding word valence and JOLs and was based on the work reported by Mueller et al. (Experiment 3A & 3B, 2014). Based on the results of Mueller et al., we expected to obtain an effect size estimate (ηp²) of approximately 0.60 for the effect of valence on participants' predictions.

Method
Eighty-eight participants were recruited for this experiment. The mean age of participants was 21.89 years (SD = 4.46). The majority of participants were female (75.9%). Data from one participant were lost due to a program error, resulting in a total of 87 responses available for analyses. As in the methodology described by Mueller et al. (2014), participants in the present experiment were presented with the following instructions:
In a previous experiment that we conducted, students were presented with a list of 36 words one after the other. Critically, one third of the words (i.e. 12) were positive, one third (12) were negative, and one third (12) were neutral words. Each word was presented for 5 s. The students' task was to study these words so that they would remember as many words as possible on a memory test. This memory test took place immediately after studying all the words and students were asked to recall as many words as they could.
We would like you to think about this task and estimate how many of each word type a student would remember.
Participants were then presented with a question asking them to estimate the number of words out of 12 that the students in the previous experiment would have remembered (e.g. How many positive words do you think the students remembered, out of 12? [An example of a positive word is: angel]). The order of presentation of valence questions was counterbalanced across all participants. After providing estimates for all three valences, participants were reminded of their estimates and asked to provide free responses explaining why they had made these predictions.

Results and discussion
Participants' estimates of how many words other students remembered mirrored the JOL findings of Experiment 2A. That is, participants estimated that other students would remember more positively- and negatively-valenced words than neutral words, just as participants predicted that they themselves would remember emotional words better. Specifically, a one-way repeated measures ANOVA of estimates of remembered words revealed a significant effect of valence, F(2,172) = 48.944, p < .001, η² = .363, BF₁₀ = 6.05 × 10¹⁴. Post hoc pairwise comparisons revealed no difference in word estimates between positive and negative valence, but significant differences between estimates for positive (M = 7.45, SD = 2.10) and neutral (M = 5.5, SD = 2.30) valenced words, t(86) = 7.10, p < .001, Cohen's d = 0.85, BF₁₀ = 3.13 × 10⁸, as well as a difference between estimates of negative (M = 8.09, SD = 2.32) and neutral valence words, t(86) = 9.52, p < .001, Cohen's d = 1.13, BF₁₀ = 1.71 × 10¹². Thus, it appears that people have a belief that emotionality is a factor that makes items more memorable.
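As a quick consistency check, an eta-squared value can be recovered from a reported F ratio and its degrees of freedom using ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies this standard identity to the omnibus test reported above; it assumes the reported value is a partial eta squared, and the function name is hypothetical.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df must be positive")
    ss_ratio = f_value * df_effect  # equals SS_effect / MS_error
    return ss_ratio / (ss_ratio + df_error)


# Omnibus valence effect in Experiment 2B: F(2,172) = 48.944.
print(round(partial_eta_squared(48.944, 2, 172), 3))  # 0.363, matching the reported value

# Arousal difference across the Experiment 1 word lists: F(2,33) = 10.139.
print(round(partial_eta_squared(10.139, 2, 33), 3))  # 0.381, matching the reported value
```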
Two independent raters classified participant responses regarding the reason for the predicted number of remembered words. Due to missing data, a total of 84 responses were categorised into one of six categories. The first four categories were related to aspects of the words or participants' experience with the words: words of the chosen valence were judged to be (a) more distinctive, (b) more emotionally arousing, (c) related to situations encountered in daily life, or (d) better remembered due to behavioural associations/learning. The remaining two categories included (e) circular reasoning explanations or (f) no or illegible reason given. Initial rater agreement was 76%, with differences resolved via discussion. The percentage of responses in each category, along with an example from each, is shown in Table 1. The two most common responses involved the reasoning that valenced words were more arousing (29%) or related to daily life (27%). Overall, it appears that people's judgements about the memorability of emotional stimuli are driven by beliefs that they are emotionally arousing and/or familiar, and that these characteristics are conducive to memory. Therefore, such items should be more easily remembered later.

Experiment 3
Experiments 2A and 2B provide further evidence that people have an a priori belief that emotional words are more memorable than neutral words, lending tentative support to the Theory/Explicit Analysis factor proposed by Witherby et al. (2021) to account for the emotional salience effect on JOLs. Our next step was to search for evidence that would support Witherby et al.'s Experience factor. That is, are emotional words more fluently processed than neutral words, and if so, does this enhanced fluency underlie higher JOLs for these items? To test this processing fluency hypothesis, we examined whether higher JOLs for emotional words would be accompanied by faster reaction times (RTs) on a lexical decision task (e.g. Mueller et al., 2014).

Participants & Materials
A total of 43 students participated in Experiment 3. Of those participating, data from three participants were removed due to missing responses, three participants' data were removed due to no variability in provided JOLs, and one response was removed due to the participant remembering none of the studied words. The final number of available responses for analyses was therefore 36. We conducted a third sensitivity analysis using G*Power (Faul et al., 2007), which showed that we achieved 80% power (α = .05) to detect a minimum effect size (ηp²) of 0.08 in a repeated-measures ANOVA with 36 participants. The majority of participants were female (77.8%). The mean age of participants was 21.39 years (SD = 3.78). The same 36 words from Experiments 1 and 2A were used in Experiment 3, along with 36 non-words acquired from the English Lexicon Project (ELP; Balota et al., 2007). The 36 non-words in the present experiment were chosen by matching the length value provided by the ELP to the same value of a corresponding list word (e.g. ache, afed). Non-words were excluded from analyses. Presentation order for all 72 items was randomly determined for each participant.

Procedure
The instructions for Experiment 3 closely matched those of Mueller et al. (Experiment 1, 2014). Specifically, participants were told that, upon the appearance of an item on the screen, they should indicate whether the item was a word by pressing the "Z" key or a non-word by pressing the "M" key. Participants were instructed to then study the item for any remaining time, after which the item would disappear. All items were displayed in 18-point Courier New font, centred vertically and horizontally on the screen for 5 s, and each item presentation was preceded by a 2 s countdown and a 500 ms fixation. Following the presentation of each item, participants were asked to make a JOL using the same 0-100 scale used in the prior experiments described above. Following the presentation of all 72 items, participants were given 4 min to recall as many of the words, but not the non-words, as possible.

Discussion
Experiment 3 replicated the enhancement effect of emotionality on both JOLs and recall observed in Experiment 1. Of greater importance is the finding that reaction times on a lexical decision task were faster for positive words than for negative and neutral words. This faster identification of positive words suggests that one basis of the higher JOLs observed for those words may be the ease or fluency with which they are processed. However, the reaction time data for negative words suggest that greater fluency does not underlie the higher JOLs that these words are given.

Experiment 4
The experiments reported thus far produced three major findings. First, JOLs were consistently higher for both positive and negative words compared to neutral words, thereby replicating previous studies (Hourihan et al., 2017; Tauber & Dunlosky, 2012; Undorf et al., 2017; Zimmerman & Kelley, 2010). Second, participants believed that other participants, as well as themselves, would remember emotional words better than neutral words on a later test, largely due to the belief that such words are emotionally arousing and familiar. And third, positive, but not negative, words were processed more rapidly than neutral words. These results suggest that higher JOLs for negative words are based solely on people's beliefs that these words are inherently more memorable than neutral words, whereas greater processing fluency, in addition to beliefs, may influence the higher JOLs for positive words compared to neutral words.
However, the finding that mean reaction times are lower for positive compared to neutral words cannot be used as sufficient evidence that greater processing fluency underlies higher JOLs for positive words. Nor can the mere existence of beliefs be used as evidence that those beliefs necessarily influence item-level JOLs. What is needed is evidence that lower RTs predict higher JOLs at the item level and that individual differences in beliefs also relate to individual differences in item-level JOLs. In Experiment 4, therefore, we conducted a series of moderation analyses that allowed us to quantify the degree to which beliefs contribute to JOLs, as in Frank and Kuhlmann (2017). Unlike the approach of Frank and Kuhlmann, Experiment 4 also allowed us to include item-level processing fluency information in the model. Thus, we can assess the effects of both beliefs and fluency within a single study.
Note that this approach differs from a typical mediation analysis (as in Yang et al., 2021). We hold that if beliefs influence JOLs, there should be a linear relationship between belief magnitude and JOL magnitude, that is, beliefs should moderate the valence effect on JOLs. This approach has the added benefit of allowing us to properly fit a multi-level model to the data. The approach used by Yang et al. overestimates the degrees of freedom for participant-level effects and therefore underestimates the standard errors. Although the two procedures often yield similar results, we view the moderation method as more theoretically and statistically appropriate for the current study design.

Participants & Materials
A sample of between 100 and 150 participants was projected for this experiment based on previous research suggesting that larger sample sizes are needed to accurately estimate moderation effects (e.g. Hu et al., 2020). Data from 141 individuals were collected, and data collection ceased at the end of the academic semester during which this experiment was conducted. Of these 141 participants, 104 provided usable data. The mean age of these participants was 20.99 years (SD = 3.88), and the majority (80.4%) were women. Participants whose data could not be used included eight with no variance in JOLs, 18 with incomplete datasets (i.e. those who did not complete the experiment entirely), eight with no variance in lexical decision responses (always said "word" or "non-word"), and three who were missing responses for an entire cell (e.g. did not give any JOLs for positive words).
For Experiment 4, a new set of 45 words was used, again taken from the Affective Norms for English Words (ANEW; Bradley & Lang, 1999), along with 45 new non-words acquired from the English Lexicon Project (ELP; Balota et al., 2007). The 45 non-words in the present experiment were chosen in the same manner that was employed in Experiment 3. Of the 45 words, 15 were positive (M valence = 7.77), 15 were neutral (M valence = 4.95), and 15 were negative in valence (M valence = 2.32). Word arousal did not differ between the positive (M arousal = 5.47) and the negative (M arousal = 5.34) word lists. As before, neutral word list arousal (M arousal = 4.50) was lower than both positive list arousal, t(14) = 2.95, and negative list arousal, t(14) = 3.01 (both p's < .05). The three lists did not differ on log word frequency as measured by the SUBTLEX database (all p's > .30). In addition, the lists did not differ on number of letters, concreteness, or imageability (all p's > .12). As before, words and non-words were presented in a freshly randomised order for each participant, and non-words were excluded from analyses.

Procedure
The procedure was the same as that used in Experiment 3 with two exceptions. As in Experiment 3, participants made lexical decisions about words and non-words, followed by JOLs for each word. The procedure of the present experiment differed from that of Experiment 3 in that (1) participation occurred online via Gorilla Experiment Builder (www.gorilla.sc; Anwyl-Irvine et al., 2019) instead of in a laboratory, and (2) participants in the present experiment were asked to make predictions of how many items of each valence they would remember on a later test prior to the presentation of items. These predictions are often termed global differentiated predictions (GPREDs; Kornell et al., 2011) and are similar to the pre-study JOLs that were elicited in Experiment 2A. Importantly, these predictions are a measure of pre-existing beliefs about how different types of stimuli affect memory since they are made before those stimuli are actually presented (Frank & Kuhlmann, 2017). After the lexical decision task and JOLs, participants were asked to recall all of the words but not the non-words. As in Experiments 1 and 3, participants were given 4 min for the recall task. Free recall responses were scored such that any response in which the first four letters (the minimum number needed to uniquely identify correct responses in the new word list) matched those of a studied word was considered correct.
Additionally, we tested whether lexical decision times might correlate with JOLs by computing the correlation between JOLs and lexical decision times for each participant. Because the valence-JOL relationship is not linear (positive words received the highest JOLs, followed by negative and neutral words) and there was no difference in decision times for negative and neutral words, we computed these correlations separately for positive/neutral and negative/neutral words. For positive and neutral words, the average correlation was M Pearson r = -.06, t(102) = 3.28, p = .001, indicating that faster decision times were associated with higher JOLs, although that relationship was rather weak. For negative and neutral words, the average correlation was M Pearson r = -.05, t(102) = 2.53, p = .013, indicating that faster decision times were also associated with higher JOLs, but again the relationship was rather weak.
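The logic of this analysis (one Pearson correlation per participant, then a test of whether the mean correlation differs from zero) can be sketched as follows. This is an illustrative sketch using only the Python standard library, not the authors' analysis code; the function names are hypothetical, and the article does not state whether correlations were transformed before testing.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def one_sample_t(values, mu=0.0):
    """One-sample t statistic and degrees of freedom for H0: mean == mu."""
    n = len(values)
    t = (mean(values) - mu) / (stdev(values) / math.sqrt(n))
    return t, n - 1


def mean_correlation_test(per_participant):
    """per_participant: list of (decision_times, jols) pairs, one per person."""
    rs = [pearson_r(rts, jols) for rts, jols in per_participant]
    t, df = one_sample_t(rs)
    return mean(rs), t, df
```

A negative mean correlation here corresponds to faster decisions going with higher JOLs; the degrees of freedom equal the number of participants contributing a correlation minus one.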

Moderation analyses
To test whether beliefs moderate the effect of valence on JOLs, we conducted two mixed models, one comparing positive and neutral words and the other comparing negative and neutral words. Valence was effect coded so that the regression weights (see Table 2) indicate differences between positive and neutral words (positive - neutral). For beliefs, we computed difference scores by subtracting each participant's neutral global prediction from their positive global prediction, yielding a measure of how much better they believed positive words would be remembered relative to neutral words (Positive Global Prediction - Neutral Global Prediction). We then repeated this procedure for negative and neutral global predictions (Negative Global Prediction - Neutral Global Prediction). Lexical decision times were trimmed at 3 SDs and then mean centred. Trimming occurred at the trial level and involved using the aggregate means and standard deviations to remove outlier responses. We trimmed rather than log-transformed decision times (as in the ANOVA above) because a log transformation would model a log-linear relationship between decision times and JOLs that we do not expect. With these procedures, the model intercept estimates the average JOL (collapsing across valence) for words with average lexical decision times when participants do not hold a belief about valence (global prediction difference = 0).
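The preprocessing steps described above can be sketched in a few lines. This is a hedged illustration rather than the authors' pipeline: the function names are hypothetical, and centring on the mean of the trimmed data (rather than the untrimmed data) is an assumption, since the text only states that trimming preceded centring.

```python
from statistics import mean, stdev


def trim_and_centre(decision_times, n_sd=3.0):
    """Drop trials more than n_sd SDs from the aggregate mean, then centre.

    The aggregate mean and SD are computed over all trials passed in.
    Centring uses the mean of the retained trials (an assumption).
    """
    m, s = mean(decision_times), stdev(decision_times)
    kept = [rt for rt in decision_times if abs(rt - m) <= n_sd * s]
    kept_mean = mean(kept)
    return [rt - kept_mean for rt in kept]


def belief_difference(emotional_prediction, neutral_prediction):
    """Global prediction difference score, e.g. positive minus neutral."""
    return emotional_prediction - neutral_prediction


# Hypothetical decision times in ms: twenty ordinary trials and one outlier.
times = [480, 520] * 10 + [5000]
centred = trim_and_centre(times)

# Hypothetical global predictions: 9 positive vs. 6 neutral words recalled.
diff = belief_difference(9, 6)
```

After centring, a decision time of 0 corresponds to an average-speed trial, and a difference score of 0 corresponds to a participant who predicted equal recall for emotional and neutral words, which is what gives the model intercept its interpretation.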
We then entered the corresponding global prediction difference score (subject-level variable) into a model with lexical decision times and valence (item-level variables) predicting JOLs (also at the item level). Critically, we also included the global prediction difference X Valence interaction term. If beliefs contribute to JOLs, then those expecting a greater difference in performance for positive/negative words relative to neutral words should give correspondingly more disparate JOLs to each word type. That is, there should be a significant positive slope for the Global Prediction X Valence interaction. If fluency contributes independently to JOLs, then a negative main effect of fluency should be observed even after accounting for beliefs. Lastly, if fluency and beliefs fully mediate the effects of stimulus type on JOLs, then the effect of Valence should become non-significant.
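As a reading aid, the model described above can be written out as a single equation. This is a sketch under stated assumptions: the coefficient labels are ours, and the by-participant random intercept $u_j$ is an assumed random-effects structure, since the text does not specify one. Here $i$ indexes items and $j$ indexes participants, $\mathrm{RT}_{ij}$ is the trimmed, mean-centred lexical decision time, and $\mathrm{GPdiff}_j$ is the participant's global prediction difference score.

```latex
\mathrm{JOL}_{ij} = \beta_0
  + \beta_1\,\mathrm{Valence}_{ij}
  + \beta_2\,\mathrm{RT}_{ij}
  + \beta_3\,\mathrm{GPdiff}_{j}
  + \beta_4\,\bigl(\mathrm{GPdiff}_{j} \times \mathrm{Valence}_{ij}\bigr)
  + u_{j} + \varepsilon_{ij}
```

In these terms, a contribution of beliefs corresponds to $\beta_4 > 0$, an independent contribution of fluency corresponds to $\beta_2 < 0$, and full mediation by beliefs and fluency corresponds to a non-significant $\beta_1$.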
Positive vs. Neutral. As predicted, a positive global prediction difference X Valence interaction indicates that participants who believed that positive words would be better remembered than neutral words correspondingly showed a greater difference in their JOLs for each word type in the same direction, F(1, 2727) = 11.98, p < .001 (see Table 2 for regression weights). The main effect of valence remained significant, F(1, 2727) = 38.79, p < .001, indicating that although beliefs predicted the magnitude of the JOL effect, JOLs still tended to be higher for positive words even when global prediction difference scores were 0 (i.e. the participant did not have a preexisting belief that positive words would be better remembered relative to neutral words). Importantly, there was no effect of lexical decision times, F(1, 2727) = 0.61, p = .436, suggesting that fluency does not contribute significantly to JOLs after accounting for beliefs. Thus, the variance in JOLs that is tied to valence, but which cannot be explained by beliefs, is not, as we had predicted, explained by fluency. Or, more conservatively, if there is any effect of fluency, it was too small for us to detect with a within-subject comparison of 104 participants and 30 data points per participant.¹

Negative vs. Neutral. As predicted, a positive global prediction difference X Valence interaction indicates that participants who believed that negative words would be better remembered than neutral words correspondingly showed a greater difference in their JOLs for each word type in the same direction, F(1, 2660) = 17.72, p < .001 (see Table 3 for regression weights). As with positive and neutral words, the main effect of valence remained significant, F(1, 102) = 39.78, p < .001, indicating that although beliefs predicted the magnitude of the JOL effect, JOLs still tended to be higher for negative words even when global prediction difference scores were 0 (i.e. the participant did not have a preexisting belief that negative words would be better remembered relative to neutral words). As with positive and neutral words, there was no effect of lexical decision times, F(1, 2660) = 1.37, p = .242, indicating that fluency does not contribute significantly to JOLs after accounting for beliefs.²

General discussion
Our goal in this study was to further investigate the contributions of processing fluency and beliefs to judgements of learning for emotional stimuli. In a series of experiments, we first replicated previous findings that JOLs are higher for emotional than for neutral words (e.g. Hourihan et al., 2017; Tauber et al., 2017; Zimmerman & Kelley, 2010). Utilising a pre-study JOL procedure and a memory beliefs questionnaire, both modelled after Mueller et al. (2014), we then found that people have an a priori belief that emotional items are more memorable than neutral items, a finding that is consistent with prior studies (Undorf & Bröder, 2020; Witherby & Tauber, 2018). We then found that processing fluency, as measured by reaction times on a lexical decision task, was indeed greater for positive than for neutral words, although there were no differences between negative and neutral words. To evaluate the relative contributions of both beliefs and fluency to JOLs, we conducted several moderation analyses in which both factors were measured in the same participants. The moderation analyses on positive versus neutral words and negative versus neutral words both showed that after accounting for the effect of beliefs, differences in lexical decision times did not contribute significantly to JOLs, thus providing evidence that JOLs for emotional words are based primarily on memory beliefs, with no additional influence of processing fluency.
Our study follows a number of similar investigations that have examined the joint roles of processing fluency and memory beliefs in how people make JOLs for a variety of stimuli, including large versus small font sizes (e.g. Hu et al., 2015; Mueller et al., 2014; Su et al., 2018; Undorf et al., 2017), words spoken in loud versus soft volume (Frank & Kuhlmann, 2017), concrete and abstract words (Witherby & Tauber, 2017), and words that vary in lexical frequency (Mendes et al., 2021). However, relatively few studies have examined beliefs and JOLs in the same participants (Frank & Kuhlmann, 2017; Hu et al., 2015), and none have included both beliefs and item-level fluency in a single model predicting JOLs. Although we measured processing fluency and beliefs using separate groups of participants in Experiments 2A, 2B, and 3, we followed the procedure of Frank and Kuhlmann and Hu et al. in Experiment 4 to assess both factors in the same participants. In Experiment 4, we again found that positive words were more fluently processed than neutral words, but this enhanced fluency did not contribute significantly to the emotional salience effect on JOLs. This finding highlights the importance of using methods that allow for both memory beliefs and experience-based factors such as processing fluency to be measured at the participant level (see Yang et al., 2021 for a review of these methods).
The results of the current study are relevant to the framework proposed by Witherby et al. (2021) to address why people's monitoring of studied material (i.e. JOLs) is influenced by the material's emotional content. As discussed in the Introduction, Witherby et al. suggested several different theoretical mechanisms that may account for the effect of stimulus valence on JOLs, one of which is an experience-based factor such as arousal or fluency, and the other representing an analytic-based factor such as

Limitations and future directions
One limitation of the current study concerns our use of lexical decision reaction times as a measure of processing fluency. Although several studies have used this measure to assess fluency (Mueller et al., 2014; Witherby & Tauber, 2017), Yang et al. (2021) pointed out the limitations of measuring fluency using a single measure. As Witherby et al. (2021) suggested, one avenue for future research would be to elicit multiple measures of processing fluency, such as self-paced study time or number of trials to acquisition, in addition to lexical decision response times, to determine whether the emotional salience effect on JOLs is sensitive to these alternative fluency measures.

Funding
The author(s) reported there is no funding associated with the work featured in this article.

Notes
1. The main effect of global prediction difference was not significant, F < 1. Note that this means that those who predicted a larger difference in performance do not give higher JOLs in general, an effect of no interest to the study.
2. The main effect of global prediction difference was not significant, F < 1. Again, this means that those who predicted a larger difference in performance do not give higher JOLs in general, an effect of no interest to the study.

Data availability statement
The data that support the findings of this study are openly available in the Open Science Framework (OSF) at https://osf.io/93qev/?view_only=bf76d012404d4a8fbafe462c5a736d01.

Disclosure statement
No potential conflict of interest was reported by the author(s).
