Effortful retrieval practice effects in lexical access: a role for semantic competition

ABSTRACT Word retrieval difficulty (lexical access deficit) is prevalent in aphasia. Studies have shown that practice retrieving names from long-term memory (retrieval practice) improves future name retrieval for production in people with aphasia (PWA), particularly when retrieval is effortful. To explicate such effects, this study examined a potential role for semantic competition in the learning mechanism(s) underlying effortful retrieval practice effects in lexical access in 6 PWA. Items were trained in a blocked-cyclic naming task, in which repeating sets of pictures drawn from semantically-related versus unrelated categories underwent retrieval practice with feedback. Naming accuracy was lower for the related items at training, but next-day accuracy did not differ between the conditions. However, greater semantic-relatedness of an item to its set in the related condition was associated with lower accuracy at training but higher accuracy at test. Relevance to theories of lexical access and implications for naming treatment in aphasia are discussed.


Introduction
Naming impairment is a common problem and impediment to functional communication in people with aphasia (PWA). Naming impairment manifests as frequent word substitutions (e.g. semantic error as in giraffe for zebra), distortions in the form of the word (e.g. bobot for robot), or outright response failures (i.e. omission) when attempting to name familiar, everyday objects, people, places, etc. Lexical access deficit, or difficulty retrieving words and/or their forms during naming, is a major contributor to naming impairment in aphasia (e.g. Schwartz et al., 2006). Evidence is amassing that practice retrieving names (e.g. for depicted objects) from long-term memory (hereafter, naming practice) is more beneficial to later naming accuracy in PWA than practice that does not involve retrieving names from long-term memory (e.g. Friedman et al., 2017; Middleton et al., 2019; Middleton et al., 2015; Middleton et al., 2016; Schuchard & Middleton, 2018a, 2018b). Furthermore, Middleton et al. (2016) found that naming practice that was more effortful conferred more durable learning (defined in Section 1.2) in PWA.
This new and growing evidence base regarding the effects of naming practice and retrieval effort on lexical access has far outpaced theoretical explication of such effects. The current study takes a step towards addressing this theory gap by evaluating a potential role for semantically-driven lexical competition (hereafter, semantic competition) in the learning mechanism(s) underlying effortful retrieval effects in lexical access.

Semantic context effects in naming
An extensive literature on semantic context effects indicates that picture naming (e.g. zebra) makes subsequent naming of items from the same semantic category (e.g. giraffe) more effortful, manifesting as increased naming error rates and/or latencies. Semantic context effects in naming have been extensively studied in neurotypical speakers, and less so in PWA, using the semantic variant of the blocked-cyclic naming task (e.g. Belke, 2008, 2013; Belke et al., 2005; Belke & Stielow, 2013; Biegler et al., 2008; Damian et al., 2001; Damian & Als, 2005; Harvey & Schnur, 2015; McCarthy & Kartsounis, 2000; Patra et al., 2021; Schnur et al., 2006; Wilshire & McCarthy, 2002). In this task, participants name repeating sets of pictures drawn from either the same category (homogeneous condition) or multiple categories (mixed condition). Typically, a set is presented in a "block" comprising multiple (usually 6) successive "cycles", and the items in the set are presented in random order in each cycle. A semantic blocking effect manifests as slower naming latencies and/or more naming errors in the homogeneous compared to the mixed context (e.g. Belke et al., 2005; Damian et al., 2001; Harvey & Schnur, 2015; Schnur et al., 2006). In addition, several studies have observed cumulative semantic interference, i.e. a growing decrement in naming accuracy or increment in naming latency across cycles (Harvey & Schnur, 2015; Schnur et al., 2006; but see Belke, 2008; Belke & Stielow, 2013). Also, the semantic blocking effect does not diminish with additional time between trials (Biegler et al., 2008; Schnur et al., 2006) or intervening trials in the blocks (Damian & Als, 2005; Navarrete et al., 2012). Semantic context effects have also been extensively studied in a related paradigm termed the continuous naming task (Howard et al., 2006), in which multiple exemplars from each of several categories are presented serially and in intermixed fashion for naming in a large list.
In continuous naming, the effect of semantic context manifests as an incremental, cumulative increase in naming difficulty (e.g. in latencies or errors) with the presentation of each additional exemplar in a category (termed ordinal position effect; e.g. Belke, 2013; Harvey et al., 2019; Howard et al., 2006; Navarrete et al., 2010). The overall consensus in the literature is that semantic context effects (the semantic blocking effect in blocked-cyclic naming and the ordinal position effect in the continuous naming task) arise from a learning process that persistently decreases the accessibility of related items, at least within the timeframe of the task (Howard et al., 2006; Oppenheim et al., 2010).
Naming is a complex process that begins with visual recognition/categorization of the object, followed by mapping from the encoded meaning (i.e. semantics) to a word, retrieval and encoding of the word's phonology, and finally, articulation. The stages dedicated to mapping from semantics to a word, and from the word to phonology, are typically regarded as the two main stages of lexical access (e.g. Dell, 1990; Dell et al., 1997; Dell & O'Seaghdha, 1992; Dell & Reich, 1981; Fay & Cutler, 1977; Fromkin, 1971; Garrett, 1975, 1976; Levelt et al., 1999; Rapp & Goldrick, 2000; Schwartz, 2014; Stemberger, 1985; cf. Caramazza, 1997; Caramazza & Miozzo, 1997). Studies collectively have identified that semantic blocking effects in naming localise to the mapping from semantics to a word (e.g. Damian et al., 2001; Goldrick & Rapp, 2007; Kroll & Stewart, 1994; Vigliocco et al., 2002). For greatest experimental sensitivity in our study, we recruited people with aphasia whose naming impairment can be attributed, at least in part, to problems retrieving words from semantics, as well as in retrieving phonology, as opposed to solely arising from processes that are peripheral to lexical access (see Section 2.1). We revisit the issue relating to the characterisation of our participants and learning effects in the current study in the Discussion (Section 4).
Several studies have observed that the deleterious effect of semantic context on naming is enhanced with increasing semantic similarity with previously named items. In blocked-cyclic naming, Navarrete et al. (2012) reported an enhanced semantic blocking effect for sets composed of more semantically similar (e.g. cat and dog) versus less semantically similar (e.g. cat and zebra) category members. Likewise, Vigliocco et al. (2002) reported an enhanced semantic blocking effect when sets were composed of items drawn from more similar (e.g. clothing and body parts) versus less similar (e.g. clothing and vehicles) categories. Studies using the continuous naming task have likewise found that the degree of semantic similarity between items affects the magnitude of interference from prior naming (Harvey et al., 2019; Rose & Abdel Rahman, 2017; see Alario & del Prado Martín, 2010 for a discussion). For example, in a recent study involving PWA, Harvey et al. (2019) found that higher similarity between the first exemplar (ordinal position 1) and second exemplar (ordinal position 2) of a category presented in a session resulted in heightened semantic error rates at ordinal position 2.
To summarise, prior semantic context can enhance naming difficulty in a persistent manner (at least within the timeframe of the task); such effects localise to the mapping from semantics to words; and, semantic context effects are enhanced with greater semantic similarity of prior trials to the current naming trial. A possibility that has yet to be examined, however, is whether training naming amidst enhanced semantic competition, via blocked-cyclic naming, may ultimately promote more persistent gains from naming training, and/or enhanced accuracy measured at a later session. The following sections consider an empirical basis (Section 1.2), followed by the theoretical motivation (Section 1.3), for this possibility.

Retrieval-based learning effects in lexical access
The field of aphasiology has demonstrated growing interest in how basic research on fundamental learning mechanisms can help elucidate the treatment process and improve efficacy. Inspired by the neuroscientific principle of Hebbian learning (Hebb, 1949), pioneering studies by Fillingham and colleagues (Fillingham et al., 2005a, 2005b, 2006) examined an "errorless learning" naming treatment for aphasia whereby on each training trial, the object for naming was presented along with its name, and the name was repeated by the patient. This approach was designed to capitalise on the Hebbian notion that cell arrays that fire together, wire together by assuring the correct response (object's name) given the stimulus (depicted object) on every trial. Fillingham et al. (2005a, 2005b, 2006) compared errorless learning to "errorful" naming treatment, in which the participant was encouraged to attempt to retrieve the name for the object with cueing support (e.g. presentation of word onset), often leading to naming error. Whether the correct name was provided as feedback after errorful trials was variable across the studies. Single-case analyses in each study revealed that most PWA benefitted from both types of training approaches despite substantially higher error rates during errorful treatment (Fillingham et al., 2005a, 2005b, 2006; see also Conroy et al., 2009). A later study (McKissock & Ward, 2007) revealed that errorful learning provided the same benefit as errorless learning across a group of PWA, but only when correct-answer feedback was provided following the naming attempt.
Middleton et al. (2015) revisited errorless learning naming treatment for aphasia, but compared it to a retrieval-based naming treatment. In contrast to the typical errorful treatment from prior studies (e.g. Fillingham et al., 2005a, 2005b, 2006), the retrieval-based naming treatment was informed by best practices derived from the retrieval practice (a.k.a. test-enhanced learning) literature (for recent reviews, see Kornell & Vaughn, 2016; Rowland, 2014), including a focus on correct retrieval during treatment and consistent provision of feedback. The results from Middleton et al.'s study showed that though the rate of production of the name was highest in the errorless learning condition during training, both naming practice conditions outperformed the errorless learning condition on a next-day test of naming (hereafter, delayed test of naming), with the advantage persisting for the cued naming practice condition after one week. This constituted the first empirical demonstration that retrieval practice, a learning factor examined primarily in the context of knowledge acquisition, can persistently enhance the retrievability of existing lexical representations for production (i.e. lexical access). In Middleton et al., indications that retrieval practice impacted lexical access included that neuropsychological characterisation of the PWA was consistent with lexical access deficit underlying their naming impairment, and the study materials were pictures of familiar, common objects (e.g. scissors; caterpillar; pizza) with high name agreement.
Two features of retrieval practice are important to consider in research seeking to design interventions or training regimens that maximise the benefits from retrieval practice. First, the potency of retrieval practice training is driven mainly by correct retrievals; failed retrievals, even followed by correct-answer feedback, confer detectable but weak learning (Dunlosky & Rawson, 2012; Kornell et al., 2011; Middleton et al., 2015; Pashler et al., 2005; Wissman & Rawson, 2018). Second, information retrieved under more effortful conditions receives greater strengthening, i.e. learning is more durable (e.g. Karpicke & Bauernschmidt, 2011; Karpicke & Roediger III, 2007; Pashler et al., 2003; Pyc & Rawson, 2009). The signature pattern of more durable learning from enhanced retrieval effort is typically demonstrated with an interaction between training condition and time (training versus test) with opposing patterns of performance: a higher error rate at training but better performance at test for the effortful condition (Karpicke & Roediger III, 2007; Middleton et al., 2016; Pashler et al., 2003; for discussion, see Schmidt & Bjork, 1992). That is, although making training more difficult means fewer items, or fewer trials per item, benefit from the strengthening that successful retrieval practice confers, the information that is successfully retrieved under more effortful conditions receives greater strengthening. This greater strengthening can ultimately confer enhanced test performance in the more effortful condition.
To evaluate whether more effortful retrieval confers more durable learning in aphasia treatment, Middleton et al. (2016) compared naming practice versus errorless learning in a group of PWA with lexical access deficit but additionally examined how the spacing of trials impacted later performance. For present purposes, the most illustrative aspect of that study involved manipulating the number (i.e. lag) of other-item trials between repeated naming attempts for an item. Presenting items at different degrees of spacing (lag 5, 15, or 30) permitted examination of how increased retrieval effort with increased spacing in the naming condition affected training and test performance. First, Middleton et al. observed an interaction indicating that though naming practice success rate at training dropped precipitously as the spaced schedule lags increased (reflective of more difficult retrieval with increasing lag), performance on the delayed tests was similar across the spaced lags, consistent with greater strengthening from retrievals at higher lags. The strongest evidence for an effect of effortful retrieval on later performance was reported in an analysis that statistically controlled for differences in training performance across lags. In that analysis, increasing lag was associated with increasing delayed test performance. In other words, naming practice that is more effortful for people with aphasia can come at the cost of heightened errors during training, but successful retrieval trials confer greater strengthening under more effortful training conditions.

Learning from inhibition
To advance a mechanistic understanding of effortful retrieval effects in lexical access, we consider a well-researched phenomenon in the memory and learning literature, specifically retrieval-induced forgetting (RIF). RIF studies have shown that retrieving a target from long-term memory (FRUIT-O____, answer: orange) decreases subsequent retrievability of related items, i.e. competitors (FRUIT-B_____, answer: banana; for reviews, see Murayama et al., 2014; Storm & Levy, 2012; Verde, 2012). Controversy surrounds whether RIF manifests because competitors are inhibited when the target is retrieved, or because the target is strengthened from retrieval, which interferes with subsequent retrieval of its competitors (for debate, see Anderson, 2003; Raaijmakers & Jakab, 2013; Storm & Levy, 2012; Verde, 2012). However, features of RIF point to a role for inhibition. For example, counter to the interference account, not just any strengthening event creates RIF; rather, RIF is specific to retrieval practice (e.g. studying FRUIT-ORANGE does not decrease retrievability of FRUIT-B_____, answer: banana; Anderson et al., 2000; Bäuml, 2002). For our purposes, most important are observations that greater inhibition is conferred as a competitor is more (versus less) related to the category (Anderson et al., 1994; Storm et al., 2005, 2007), and inhibition from RIF can potentiate learning (Storm et al., 2008). Specifically, Storm et al. found items that are first inhibited via RIF and then strengthened (i.e. via restudy) are more retrievable later relative to items that do not undergo inhibition before strengthening. Furthermore, the superior retrievability of previously inhibited (versus non-inhibited) items was found to accumulate with each inhibition-study cycle, a phenomenon Storm et al. dubbed accelerated relearning. Now turning to the lexical access literature, as we reviewed in Section 1.1, semantic context effects in naming implicate learning.
A prominent, computationally explicit framework for understanding such learning is the dark-side model of incremental learning in lexical access (Oppenheim et al., 2010). In the dark-side model, following each naming trial, a learning algorithm strengthens the retrieval connections between semantics and the target word (the light side of retrieval), and weakens connections to competitors, i.e. words concurrently active via overlapping semantics with the target (the dark side of retrieval). Importantly, learning is error-based in that the degree of weight change is driven by how over- (i.e. competitor) or under- (i.e. target) activated each word node was relative to a desired ("correct") activation value. Coupling the notion of error-based learning with accelerated relearning, we consider the possibility that (a) the greater cyclic inhibition and strengthening of items in homogeneous (versus mixed) sets in blocked-cyclic naming should ultimately confer more durable learning in naming, and (b) the degree of relatedness of items in a homogeneous set should also relate to the durability of learning.
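The error-based weight change described above can be illustrated with a minimal sketch. It is not the dark-side model as published: it assumes a purely linear activation rule and a plain delta-rule update, and the feature names, starting weights, learning rate, and function name are all hypothetical values chosen only to show the direction of the weight changes.

```python
import numpy as np


def delta_rule_update(weights, semantic_input, desired, learning_rate=0.1):
    """One error-driven update of semantics-to-word weights.

    weights        : array (n_words, n_features), hypothetical starting values
    semantic_input : array (n_features,), active semantic features on this trial
    desired        : array (n_words,), target activation per word (1 = target, 0 = other)
    """
    activation = weights @ semantic_input      # linear activation (simplifying assumption)
    error = desired - activation               # > 0: under-activated, < 0: over-activated
    return weights + learning_rate * np.outer(error, semantic_input)


# Toy trial: naming "zebra"; features are [animal, striped, long_neck].
weights = np.array([[0.3, 0.3, 0.0],    # zebra   (target)
                    [0.5, 0.0, 0.5]])   # giraffe (competitor sharing "animal")
zebra_input = np.array([1.0, 1.0, 0.0])
desired = np.array([1.0, 0.0])

updated = delta_rule_update(weights, zebra_input, desired)
# The under-activated target gains weight on the shared "animal" feature,
# while the over-activated competitor loses weight on it.
print(updated)
```

In this toy trial the target's connections are strengthened and the competitor's connection from the shared feature is weakened, which corresponds to the light and dark sides of retrieval in the verbal description.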

Present study
As this is a training study, we first identified training items for each PWA that elicited naming error from a large picture corpus of common, everyday objects. Different sets of items were trained in a homogeneous versus a mixed context in blocked-cyclic naming in each of seven rounds. Each round comprised a training session and a next-day delayed test of naming of the items trained in that round. Correct-answer feedback (target name was auditorily presented) was provided after each naming attempt during training.
In the present design, if greater semantic competition enhances retrieval effort, the homogeneous condition should be associated with enhanced naming error rates during training compared to the mixed condition (i.e. semantic blocking effect). Furthermore, if the enhanced effort from training in a semantic context confers more durable learning, we expect to see the signature interaction of training condition (homogeneous versus mixed) and time (training versus delayed test of naming) on accuracy with enhanced delayed test accuracy for the homogeneous condition compared to the mixed. This pattern, which was observed in a prior study of effortful retrieval effects in lexical access (Middleton et al., 2016), would constitute strong evidence for a role for semantic competition in conferring more durable learning. However, depending on how our effortful retrieval manipulation is situated with regards to the trade-off between greater strengthening versus greater rates of failed retrieval during training (for discussion see Bjork, 1994;Pashler et al., 2003), we may observe similar levels of accuracy at the next-day test in the two conditions.
Next, because increasing semantic similarity between items in a set increases naming difficulty (Navarrete et al., 2012;Vigliocco et al., 2002), at training, we would expect poorer accuracy for items in homogeneous sets as the semantic similarity of an item to its set-mates increases. However, according to the effortful retrieval hypothesis, this greater difficulty should confer more durable learning, resulting in an interaction between an item's similarity to its set members and time (training versus delayed test), with enhanced test accuracy with increasing similarity of an item to its set-mates. Lastly, we report the standard indices of semantic blocking in accuracy and latencies, specifically the effect of context and possible accumulation of semantic interference across cycles at training, to contribute to the relatively small literature on semantic context effects in PWA (Biegler et al., 2008;Harvey et al., 2019;Harvey & Schnur, 2015;McCarthy & Kartsounis, 2000;Schnur et al., 2006;Scott & Wilshire, 2010).

Method
In comparison to studies of neurotypical adults, studies involving individuals with neurological damage (e.g. people with stroke aphasia) require a strategy for achieving experimental sensitivity in the face of greater between-participant and within-participant variability. For example, PWA of even the same aphasia subtype (e.g. Broca's aphasia) can show great variability in their residual cognitive and linguistic skills, which can interact in unpredictable ways with experimental manipulations. In addition, within an individual, increased variability in performance from one trial to another within a task is a hallmark feature of neurological damage (MacDonald et al., 2006). In the present study, we addressed these challenges by (1) including participants with a relatively homogeneous profile in terms of their cognitive-linguistic deficits, and (2) designing the study to confer a large number of observations per condition per participant. We have adopted similar strategies in our prior work to provide stable results within and across participants (Middleton et al., 2020; Middleton et al., 2015; Middleton et al., 2016). For example, Middleton et al. (2019; Middleton et al., 2016) showed statistically robust learning effects in a participant sample of four PWA with approximately 50 observations per condition per participant. With these studies as benchmarks, we set our recruitment goal for the current experiment at six PWA, with a more ambitious target of 84 observations per condition per participant.

Participants
Six participants were recruited from the Moss Rehabilitation Research Institute Participant Registry. All participants gave informed consent under a protocol approved by the Institutional Review Board of Einstein Healthcare Network, and were reimbursed $15 per hour of participation.
The inclusion criteria for the study were age between 21 and 80 years, English as a native or primary language, and evidence of the linguistic and cognitive capacity to understand the consent form and give informed consent. Participants were included without respect to gender, race, or ethnic background. Table 1 provides demographic and neuropsychological characteristics of the participant sample, which comprised 3 males and 3 females. Mean age was 51.7 years (SD = 13.5), all participants were pre-morbidly right-handed with one exception (participant 3), and mean education level was 14.7 years (SD = 2.6). All participants were diagnosed with post-stroke aphasia in the chronic phase as determined by the Western Aphasia Battery (WAB) Aphasia Quotient (AQ) (Kertesz, 2007).
The study participants were selected from a large (>100) pool of previously characterised and potentially available people with chronic post-stroke aphasia. These participants were prioritised for recruitment because they were able to commit to the months-long protocol, and their neuropsychological profile was consistent with detectable naming impairment attributable, at least in part, to lexical access deficit. The six participants presented with mild to moderate naming impairment on the Philadelphia Naming Test (Roach et al., 1996). The sample showed no worse than mild impairment on tests of nonverbal semantic comprehension (Pyramids & Palm Trees test; Howard & Patterson, 1992) and word comprehension (spoken word-to-picture verification task; Roach et al., 1996), suggesting that deficits in semantics or lexical-semantics were not a major contributor to their naming impairment. The sample also demonstrated good or very good word repetition, suggestive of a minor contribution of post-lexical encoding or articulation problems to their naming impairment (see Table 1). No participant exhibited worse than moderate apraxia of speech. Appendix A provides a breakdown, per participant, of naming error types on the large set of items administered in the item selection task (described in Section 2.2.1). Some incidence of phonological error in naming was present across the sample, but naming errors consistent with an impairment in word retrieval (i.e. semantic errors, descriptions, and no response errors; Chen et al., 2019; Schwartz et al., 2009) were most prominent.

Materials and procedure
To enhance experimental sensitivity, a large picture corpus was used to select training items for each participant that elicited naming error prior to the main study. The corpus comprises 660 unique common objects (hereafter, 660-item set) collected from published picture corpora (Brodeur et al., 2010; Szekely et al., 2004) and various internet sources. Items in the corpus are characterised by several variables that can affect naming including visual complexity, name agreement, log word frequency, number of phonemes, and number of syllables. Visual complexity and name agreement values were collected from published corpora when available; otherwise these values were obtained in normative studies with a minimum of 40 responses per item. Mean name agreement for the 660-item set is 93% (SD = 6%; range = 80-100%). Log frequency values for the picture names were collected from SubtlexUS (Brysbaert & New, 2009). Picture names that did not appear in SubtlexUS were assigned a log frequency value of zero. Audio recordings of the picture names were created by a female native English speaker.
The picture corpus was divided into 19 categories of related items informed by category production norms (Van Overschelde et al., 2004) and experimenter intuition. The goal was to divide the 660-item corpus into a large number of categories, each comprised of a large number of items, to increase the chances of obtaining a sufficient number of errorful items for a sufficient number of categories to populate the design for each participant (see Section 2.2.1). The categories were organised around items forming natural kinds or taxonomies (e.g. fruits and vegetables, body parts), or thematically and/or functionally related groups (e.g. accessories, toys and games, office supplies). The range of exemplars across categories was 18-43 items. There were 78 items that did not belong to any category (i.e. uncategorized items), some of which were used as fillers (see Section 2.2.2). Table 2 lists the 19 related categories with sample category members.

Item selection task
In the item selection task, the 660-item set was administered in its entirety for naming twice, one administration per week on different weeks preceding the main experiment. On each naming trial, the participant was shown the picture and instructed to name the picture to the best of their ability. They were provided 20 s to do so, after which the software automatically advanced; or, if the participant indicated they were finished attempting to name the picture, the experimenter advanced the trial prior to the end of 20 s. This procedure developed in our prior work (e.g. Middleton et al., 2016) was instituted to eliminate experimenter feedback of any kind regarding the potential correctness of the naming response.
To identify items for training, we selected the 14 categories with the highest proportion of items that were errorful across both administrations of the item selection task for a participant. Within each of the 14 selected categories, the 12 most consistently errorful items were selected for training. This resulted in a number of items selected for training that were accurately named once or twice during item selection. For a participant's selected categories, the 12 selected items per category were randomly assigned to the homogeneous and mixed conditions while controlling for item selection naming accuracy, log frequency, visual complexity, number of phonemes, name agreement, and number of syllables (see Table 3). Mixed sets comprised 6 exemplars from different categories. When necessary, the sets were manually altered so that no items within a single set shared a phonological onset. For all homogeneous items selected for training for each participant, an item's semantic similarity to each of its set mates was estimated in a pairwise fashion using latent semantic analysis (LSA; Landauer et al., 1998), and an item's mean semantic similarity across its set mates was derived (hereafter, item-to-set semantic similarity). Table 4 provides an example of LSA-based item-to-set semantic similarity estimates for a hypothetical homogeneous set.

Table 1 note. … (Kertesz, 2007); PNT = Philadelphia Naming Test (Roach et al., 1996) performance, where ACC = accuracy in percentages; Nonverbal Comp = an associative picture-picture matching task of nonverbal comprehension, in percentages (Howard & Patterson, 1992); Word Comp = a spoken word-picture verification task of word comprehension, in percentages (Roach et al., 1996); Word Rep = a test of immediate word repetition, in percentages (Philadelphia Repetition Test; Dell et al., 1997). a Average performance for neurotypical control sample. b Scores below cutoff indicate clinically significant impairment.
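The item-to-set semantic similarity measure can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: it assumes each item name is represented by a vector and that pairwise similarity is the cosine between vectors, as is conventional for LSA. The three-dimensional vectors and the function names are hypothetical; real LSA vectors have several hundred dimensions.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def item_to_set_similarity(vectors):
    """Mean pairwise similarity of each item to its set mates.

    vectors : dict mapping item name -> 1-D numpy array (hypothetical LSA vectors)
    returns : dict mapping item name -> mean cosine similarity to the other items
    """
    scores = {}
    for name, vec in vectors.items():
        mates = [cosine(vec, other) for other_name, other in vectors.items()
                 if other_name != name]
        scores[name] = sum(mates) / len(mates)
    return scores


# Toy homogeneous set with made-up vectors (not real LSA values).
toy_set = {
    "apple":  np.array([1.0, 0.0, 0.0]),
    "pear":   np.array([1.0, 0.0, 0.0]),
    "carrot": np.array([0.0, 1.0, 0.0]),
}
similarity = item_to_set_similarity(toy_set)
print(similarity)
```

Each item receives a single score, so the measure can enter an item-level analysis as a continuous predictor.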

Training and delayed test of naming sessions
In the main experiment, participants underwent seven "rounds", with each round comprising a training session and a next-day delayed test of naming of items trained in the prior session. For each participant, each round occurred in a different week. The training session in each round was devoted to training two homogeneous sets, which were unrelated to each other, and two mixed sets. Individual sets were trained in one round only for a participant. In a training session, all items across the two mixed sets were from different categories, and those categories were unrelated to the two homogeneous categories also trained in that session. Order of the conditions within a session was counterbalanced across the seven training sessions and across participants. Within a session, each set underwent five sequential cycles of naming in which items in a set were presented in pseudo-random order with the constraint that the same item was not presented contiguously across two cycles. At the onset of each training trial, the depicted object was displayed, and the participant was provided 8 s to attempt to name the object. This was immediately followed by feedback, where the target name was auditorily presented and the participant repeated the name, after which the next trial was initiated.
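One way to generate a trial order that satisfies the constraint described above is sketched below. The study's experimental software is not described in this section, so the function name, the reshuffling strategy, and the example item names are assumptions made for illustration only.

```python
import random


def make_block(items, n_cycles=5, rng=None):
    """Return a trial list of n_cycles cycles over one set.

    Each cycle is a random permutation of the set, reshuffled if its first
    item equals the last item of the preceding cycle, so that no item is
    presented on two contiguous trials across a cycle boundary.
    """
    rng = rng or random.Random()
    trials = []
    for _ in range(n_cycles):
        order = list(items)
        rng.shuffle(order)
        while trials and order[0] == trials[-1]:
            rng.shuffle(order)
        trials.extend(order)
    return trials


# Hypothetical six-item set, five cycles = 30 training trials.
example_set = ["item1", "item2", "item3", "item4", "item5", "item6"]
block = make_block(example_set, n_cycles=5, rng=random.Random(0))
print(block)
```

Because every cycle is a full permutation of the set, repeats can only occur at cycle boundaries, so checking the first item of each new cycle is sufficient.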
In each delayed test of naming, the 24 critical items from the preceding training session were tested but they were distributed among 25 filler items in a pseudo-random order with the constraint of a minimum of 6 trials for other items between any two category members. Filler items were selected from the remaining items in the 660-item corpus that were not selected for training for a participant. Different fillers were used in each of the rounds for a participant. The addition of these filler items was intended to mitigate the potential for testing itself to instantiate a semantic context effect such as that observed in continuous naming (Howard et al., 2006).
Delayed test of naming trial structure followed the procedure used during item selection (see Section 2.2.1). To permit off-line measurement of naming response latencies on correct trials, simultaneous with picture presentation on each trial during training and test, the experimental software played a beep to mark the start of the trial. All sessions were recorded and transcribed into IPA for analysis by a trained expert. Including the item selection phase, mean time of participation was M = 15.2 (SD = 2.3) weeks and M = 17.3 (SD = .74) total sessions per participant.

Analyses
All participants completed seven rounds except participants 4 and 6. Participant 4 missed the round 6 delayed test and participant 6 missed the round 2 delayed test, both due to inclement weather. As a consequence, the data for the corresponding training sessions for these two participants were dropped from the analyses. The procedure produced 4800 training trials (i.e. 7 rounds x 4 sets x 6 items x 5 cycles x 4 participants + 6 rounds x 4 sets x 6 items x 5 cycles x 2 participants) and 960 delayed test trials (i.e. 7 rounds x 4 sets x 6 items x 4 participants + 6 rounds x 4 sets x 6 items x 2 participants) after excluding trials for filler items. With the exception of participants 4 and 6, the design produced 84 observations per condition per participant. Naming accuracy and naming onset latency (correct trials only) were calculated based on the participant's first complete, non-fragmented naming attempt per trial. To determine naming accuracy, phonological overlap (Lecours & Lhermitte, 1969; see formula below) between the naming attempt and target name was first calculated. Phonological overlap provides a continuous measure of phonological similarity to a target that is standardised across different word lengths. Shared phonemes were identified independent of position, and credit was assigned only once if a response had two instances of a single target phoneme (e.g. /kakt/ for cat is not considered correct). Semantic errors and descriptions (including all non-noun responses) received an overlap score of zero so as to avoid rewarding coincidental phonological similarity to a target. A response was coded as correct if phonological overlap was equal to or greater than 0.75; responses with phonological overlap less than 0.75 were considered incorrect. For accuracy, including the item selection phase, training and test phase, the protocol produced a total of 14,680 hand-coded responses across all six participants.
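The scoring rule can be sketched in a few lines. The sketch assumes responses are already segmented into lists of phoneme symbols; the transcriptions and the `zero_credit` flag are illustrative and not taken from the study's coding materials.

```python
from collections import Counter

def phonological_overlap(target, response, zero_credit=False):
    """2 * shared phonemes / (phonemes in target + phonemes in response).

    Phonemes are matched independent of position. Counter intersection
    gives credit only once when the response repeats a phoneme that the
    target contains once. zero_credit marks semantic errors and
    descriptions, which score zero regardless of form overlap.
    """
    total = len(target) + len(response)
    if zero_credit or total == 0:
        return 0.0
    shared = sum((Counter(target) & Counter(response)).values())
    return 2 * shared / total

def is_correct(target, response, zero_credit=False, threshold=0.75):
    return phonological_overlap(target, response, zero_credit) >= threshold

# Illustrative transcriptions for the target "zebra" (assumed, not from the study).
zebra = ["z", "i", "b", "r", "ə"]
near = ["z", "i", "b", "ə"]   # one phoneme omitted: 2 * 4 / 9
far = ["d", "i", "b", "ə"]    # omission plus substitution: 2 * 3 / 9
```

With these inputs, the first response clears the 0.75 criterion and the second does not.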
$$\text{Phonological overlap} = \frac{2 \times \text{number of shared phonemes in target and response}}{\text{total number of phonemes in target and response}}$$

To measure onset latency on correct naming trials, trained research staff used Praat software (Boersma & Weenink, 2016) to view the formants and glottal pulses of the responses. Onset latency was calculated from the trial-onset beep to the first glottal pulse that extended through at least two formants for voiced segments, and to the first visible increase in energy due to sound for unvoiced segments. For latency, including the training and test phase, the protocol produced a total of 4,715 hand-coded responses across all six participants. In preparation for the latency analyses, we removed outliers using the median absolute deviation (MAD) method (for the upper range: +6SD from median, for the lower range: −3SD from median) and log-transformed the latencies (Leys et al., 2013; Wiley & Rapp, 2019).
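The latency preparation step can be sketched as follows. The sketch takes MAD to be the median absolute deviation scaled by 1.4826, as recommended by Leys et al. (2013), and trims before log-transforming. The scaling constant, the pooling of latencies into a single list, and the example values are assumptions rather than details reported here.

```python
import math
from statistics import median

def trim_and_log(latencies, upper=6.0, lower=3.0, scale=1.4826):
    """Drop latencies outside [median - lower*MAD, median + upper*MAD], then take logs."""
    med = median(latencies)
    # Scaled median absolute deviation (assumed constant for normal consistency).
    mad = scale * median([abs(x - med) for x in latencies])
    kept = [x for x in latencies if med - lower * mad <= x <= med + upper * mad]
    return [math.log(x) for x in kept]

# Hypothetical onset latencies in milliseconds; the last value is an extreme outlier.
latencies_ms = [850, 900, 950, 1000, 1050, 1100, 1150, 7000]
log_latencies = trim_and_log(latencies_ms)
```

The asymmetric bounds reflect the positive skew typical of naming latencies: slow responses are tolerated further from the median than fast ones.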
Naming accuracy was modelled with mixed logistic regression using the glmer function in R version 3.6.0 (R Core Team, 2019) with alpha = .05 for tests of significance. To evaluate whether greater effort at training (homogeneous versus mixed) leads to more durable learning, we assessed a potential interaction of a two-level factor of condition (homogeneous versus mixed) and a two-level factor of time (training versus test) on naming accuracy (correct/incorrect response). Sum contrasts were applied to the condition (+1 for mixed and −1 for homogeneous) and time (+1 for training and −1 for test) factors. A significant interaction was followed by simple-effects models to inspect potential effects of the condition factor at each timepoint. To evaluate whether greater effort due to higher item-to-set semantic similarity (defined in Section 2.2.1) within the homogeneous set confers more durable learning, we assessed a potential interaction of the time factor and item-to-set semantic similarity entered as a numerical fixed effect, with a significant interaction followed by simple-effects models to inspect potential effects of the item-to-set semantic similarity variable at each timepoint. Though not of a priori interest, the same sequence of analyses was applied to naming onset latencies for correct trials using mixed linear regression. For completeness, the model results applied to naming onset latencies are reported in Appendix E1-E2. Finally, an analysis of forgetting to assess retention of accuracy performance from training to test in the homogeneous and mixed conditions was conducted (details in Section 3.1).
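To make the sum-contrast coding concrete, the sketch below computes the fixed-effect coefficients of a saturated 2 × 2 logit model directly from cell accuracies. It is a deliberate simplification: it ignores random effects and covariates, so it is not the mixed model fit with glmer. The two mixed-condition accuracies are hypothetical placeholders.

```python
import math

# Sum contrasts as stated in the text.
CONDITION = {"mixed": +1, "homogeneous": -1}
TIME = {"training": +1, "test": -1}

def design_row(condition, time):
    """Intercept, condition, time, and condition-by-time columns for one cell."""
    c, t = CONDITION[condition], TIME[time]
    return (1, c, t, c * t)

def logit(p):
    return math.log(p / (1 - p))

def saturated_coefficients(cell_accuracy):
    """Coefficients of a saturated 2x2 logit model fit to the four cell means."""
    coefs = [0.0, 0.0, 0.0, 0.0]
    for (condition, time), acc in cell_accuracy.items():
        for k, x in enumerate(design_row(condition, time)):
            coefs[k] += x * logit(acc) / 4  # columns are orthogonal with +/-1 entries
    return coefs

cells = {
    ("homogeneous", "training"): 0.823,  # reported in Section 3.1
    ("homogeneous", "test"): 0.844,      # reported in Section 3.1
    ("mixed", "training"): 0.85,         # hypothetical
    ("mixed", "test"): 0.83,             # hypothetical
}
b0, b_cond, b_time, b_inter = saturated_coefficients(cells)
effect_at_training = b_cond + b_inter  # condition effect when time = +1
effect_at_test = b_cond - b_inter      # condition effect when time = -1
```

Under this coding, a positive condition coefficient means higher log-odds of a correct response in the mixed condition, and the condition effects at the two timepoints differ by twice the interaction coefficient. Separately fitted simple-effects mixed models need not reproduce this relation exactly.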
Item-specific variables (i.e. covariates) that can affect naming but are not of theoretical interest (log frequency, syllable length, number of phonemes, visual complexity, and name agreement; see Section 2.2.1) were entered as fixed effects in all models but were dropped if not significant. Random intercepts for participants and items were included in all models to capture the correlation among observations that can arise from multiple participants giving responses to overlapping sets of items. By-participant random slopes for the experimental factors were also included if they improved model fit by a chi-square test of deviance in model log likelihood (alpha = .05). Naming accuracy model results are reported in Tables 5 and 6. Models examining classic indices of semantic context effects in blocked-cyclic naming (i.e. semantic blocking effect; cumulative semantic interference) are described in Section 3.3. Lastly, for readers interested in more classic indices of treatment effects (i.e. pre- to post-treatment change), models reporting change in naming accuracy from item selection to the delayed test across the group (mixed logistic) and per participant (simple logistic) in the homogeneous and mixed conditions are reported in Appendix B. All participants showed significant improvement in both conditions.

Interaction of time and condition
The results revealed a significant interaction of the time factor and the condition factor (estimate = 0.10, SE = 0.05, Z = 2.12, p = .03; Table 5). Figure 1 presents mean naming accuracy across the participants in the mixed and homogeneous conditions at the training and test timepoints. The simple-effects model applied to training performance revealed a significant decrement in naming accuracy in the homogeneous condition compared to the mixed condition (estimate = 0.11, SE = 0.04, Z = 2.82, p = .005; Table 5). This finding is in line with the existing semantic blocking literature: naming items in a homogeneous versus mixed context is associated with heightened naming error. However, the simple-effects model applied to test showed no decrement in naming accuracy in the homogeneous condition compared to mixed (estimate = −0.09, SE = 0.09, Z = −1.03, p = .30; Table 5). In fact, numerically, naming accuracy at test was higher for the homogeneous condition compared to the mixed condition.
One way to examine differential strengthening of retrieved information via effort manipulations is to examine "forgetting" (e.g. Roediger III & Karpicke, 2006). In the present study, this involved examining the rate of change in naming accuracy going from training to test for each condition separately. The results revealed a marginal decrement in naming accuracy going from training to test for the mixed condition (estimate = 0.12, SE = 0.07, Z = 1.85, p = .06; see Appendix C for full model) but not for the homogeneous condition (estimate = −0.09, SE = 0.07, Z = −1.26, p = .20; see Appendix C for full model). In fact, the homogeneous condition showed a numerical improvement (i.e. a gain of 2.5%) in naming accuracy going from training (naming accuracy = 0.823) to test (naming accuracy = 0.844). We provide interpretation of the forgetting findings and the time by condition interaction in the Discussion.

Interaction of time and item-to-set semantic similarity
The results revealed a significant interaction between item-to-set semantic similarity and time for naming accuracy (estimate = −2.10, SE = 0.76, Z = −2.76, p = .005; Table 6, Figure 2). The simple-effects model applied to training performance revealed a significant decrement in naming accuracy as item-to-set semantic similarity increased (estimate = −1.28, SE = 0.57, Z = −2.21, p = .02; Table 6), an effect similar to that observed in other semantic blocking studies (Navarrete et al., 2012; Vigliocco et al., 2002). However, in a finding that has not heretofore been examined or reported, naming accuracy at the delayed test increased as an item's semantic similarity to its set increased (estimate = 3.11, SE = 1.53, Z = 2.03, p = .04; Table 6). This suggests that a homogeneous item with greater semantic similarity to its set, despite having less opportunity to be retrieved successfully during training, receives greater strengthening from the enhanced effort that is required when it is retrieved amongst greater versus lesser semantic competition.

3.3. Classic indices of semantic blocking effects
As described in Section 3.1, we observed the standard semantic blocking effect in the form of a significant decrement in naming accuracy for the homogeneous condition compared to the mixed condition during training. To examine whether the difference between conditions grew across cycles, we modelled a cycle-by-condition interaction, using linear contrasts for cycle (see Schad et al., 2020) and sum coding for the condition factor. The cycle-by-condition interaction was significant (estimate = 0.49, SE = 0.13, Z = 3.93, p < .001; see Appendix D for model output, and Appendix F for naming accuracy means as a function of condition and cycle). However, it is problematic to interpret this interaction as evidence for cumulative semantic interference because the homogeneous and mixed sets did not differ in accuracy at Cycle 1 (estimate = −0.02, SE = 0.07, Z = −0.35, p = .72; Appendix D). This likely reflects the within-block semantic priming that can offset naming difficulty at Cycle 1 for homogeneous sets compared to mixed sets (for discussion, see Belke & Stielow, 2013). When Cycle 1 was dropped from the analysis, the cycle-by-condition interaction was no longer significant (estimate = −0.04, SE = 0.12, Z = −0.30, p = .76; Appendix D). In other words, the semantic blocking effect in accuracy did not grow across cycles 2-5. Following the same analysis trajectory for latencies, including examination of simple effects only in the presence of a significant interaction, there was no condition by time interaction (p = .27; Appendix E1) and no cycle-by-condition interaction (p = .88; see Appendix E3 for model output, and Appendix G for naming latency means as a function of condition and cycle).

Discussion
The goal of the present study was to examine a potential role for semantic competition in the theoretical explication of effortful retrieval practice effects in lexical access. To do this, the current study probed the durability of learning from training of errorful naming items for people with aphasia amidst more versus less semantic competition using the blocked-cyclic naming paradigm as a training intervention.
With regard to naming accuracy, we observed a significant interaction of time and condition, with lower accuracy during training in the homogeneous condition compared to mixed but no difference in accuracy for the two conditions at test. Also, an analysis of forgetting revealed a trend for better retention of performance from training to test in the homogeneous versus the mixed condition. However, the marginal nature of the forgetting effect and the lack of a difference between the conditions at test fail to provide strong evidence for the effortful retrieval hypothesis, and do not align with reports of greater test performance in the more effortful condition in other studies of effortful retrieval learning effects (e.g. Karpicke & Roediger III, 2007; Middleton et al., 2016; Pashler et al., 2003). We next consider two explanations of these results.
One possibility is that though presenting items for naming training in a homogeneous condition induces greater retrieval effort and naming error, this enhanced effort is unrelated to learning. That is, similar performance at test in the homogeneous and mixed conditions may have resulted from the fact that during the training, participants engaged in multiple trials of retrieval practice followed by correct-answer feedback, which strengthened items to a comparable degree in the two conditions. Likewise, all participants benefitted strongly from both the homogeneous and mixed training contexts (see Appendix B), suggesting retrieval practice with feedback confers potent benefits regardless of the semantic context at training.
A second possibility is that successful retrievals that are more effortful at training due to semantic blocking confer more durable learning, but that our effortful retrieval manipulation was suboptimal as regards the tradeoff between greater training error rate and greater benefit from enhanced training effort. As discussed in the memory literature, unsuccessful retrievals during retrieval practice confer weak learning compared to successful retrievals (Dunlosky & Rawson, 2012;Kornell et al., 2011;Middleton et al., 2015;Pashler et al., 2005;Wissman & Rawson, 2018). Thus, more effortful training conditions can surpass a point of "desirable difficulty" if retrieval failures during training are too frequent, which can partially or completely eliminate the advantage to later performance from increasing the effort during training (Bjork, 1994;Pashler et al., 2003). In the current study, the additional retrieval effort required by enhanced semantic competition may have surpassed the point of desirable difficulty, leading to similar test performance for the homogeneous and mixed conditions. Future studies may revisit these issues by parametrically varying the effort required for retrieval via a manipulation of different degrees of semantic relatedness and examining more and longer retention intervals to increase experimental power for measuring forgetting in the different conditions. Another strategy could involve controlling for the number of correct retrievals during training between the homogeneous and mixed conditions by dropping items from further training when they reach a preassigned criterion of performance (e.g. Schuchard et al., 2020).
In the present study, findings from the semantic similarity analysis provided the strongest evidence of greater strengthening of items trained amidst enhanced semantic competition. First, we observed a cross-over interaction between item-to-set similarity and time. Specifically, increasing similarity of a homogeneous item to its set mates (item-to-set similarity) was associated with decreasing naming accuracy during training, reflective of enhanced retrieval difficulty. On the other hand, we observed that increasing item-to-set semantic similarity was associated with increasing naming accuracy at test. This indicates that greater retrieval effort due to greater interference from more highly related set mates at training conferred greater strengthening of items.
The results from the semantic similarity analysis are compatible with theories in the learning and memory literature that postulate a role for retrieval effort in the potency of learning from retrieval practice (e.g. Karpicke & Bauernschmidt, 2011; Karpicke & Roediger III, 2007; Pashler et al., 2003; Pyc & Rawson, 2009). In the case of lexical access, this study provides original evidence that increasing the effort required for retrieval of a target word by manipulating preceding semantic context affects a target word's retrievability at a future session. To more fully characterise the underlying learning mechanism, headway may be made by relating the present results to current theories of effortful retrieval effects. For example, according to the inhibitory account of retrieval-induced forgetting, inhibition of related items when retrieving a target decreases the future accessibility of those competitors in a persistent fashion (Anderson et al., 2000; Storm et al., 2007). However, when a competitor becomes a target, its lower accessibility from prior inhibition potentiates the benefit it receives from a strengthening event (Storm et al., 2008). Though no models of lexical access yet exist that account for the present results, those that include mechanisms for retrieval-based weakening and strengthening (e.g. Oppenheim et al., 2010) may provide a better foundation for understanding the present results than those that only propose strengthening of targets following retrieval (e.g. Howard et al., 2006). Explicit, computational investigations are required to examine whether the fundamental assumptions of such models are ultimately compatible with the present results.
In addition to the training effects examined in the present study, we probed classic indices of semantic context effects in blocked-cyclic naming including a semantic blocking effect at training as well as cumulative semantic interference across cycles during training. The semantic blocking effect was apparent in the observation of decreased naming accuracy in the homogeneous condition compared to the mixed condition at training. However, we did not find evidence for cumulative semantic interference, which is not entirely unexpected. In an extensive review, Belke and Stielow (2013) found evidence of cumulative semantic interference only for participants with moderate to severe aphasia where neurological damage involved left frontal cortical sites, specifically left-inferior frontal gyrus. In the present study, PWA were not selected based on lesion profile; rather they were selected because of their cognitive-linguistic profile consistent with lexical access deficit as a contributor to their naming impairment and willingness to commit to the months-long protocol. In addition, the present study differed in important ways from the standard blocked-cyclic naming paradigm in that the items selected for training were largely errorful, and feedback was given on each trial. It is unclear which aspect of our design may have precluded observing cumulative semantic interference.
The present work bears on theories of lexical access by demonstrating an effortful retrieval effect in people with aphasia whose naming deficit is consistent with lexical access deficit. The effect of effortful retrieval in this current study is likely to localise, at least in part, to the first stage of lexical access in our participants. First, the majority of naming errors produced during item-selection testing were semantic substitutions, omissions, and descriptions (Appendix A), and such errors localise to neuroanatomical areas implicated in semantically-driven word retrieval (Chen et al., 2019; Schwartz et al., 2009, 2011). Though semantic naming errors in particular have also been attributed to dysregulated or degraded semantic representations (Gainotti et al., 1981; Hillis et al., 1990; Jefferies & Lambon Ralph, 2006), the participants in our sample had notably mild nonverbal semantic and word comprehension deficits (Section 2.1). Second, effortful retrieval in the present study was induced by a semantic context manipulation. Through careful experimentation, studies have localised semantic context effects in blocked-cyclic naming to the first (semantics-to-word) stage of lexical access (Damian et al., 2001; Kroll & Stewart, 1994; Vigliocco et al., 2002). However, we consider the possibility that enhanced effort from semantic competition may have impacted phonological retrieval in our participants. A rationale could be that, because of cascading activation, greater semantic competition provokes enhanced activation of competitor phonemes, translating into greater learning when the correct phonemes are ultimately retrieved. Such a possibility could be evaluated in a future study examining semantic-competition induced effortful retrieval effects in individuals with aphasia with relatively pure stage-1 versus stage-2 lexical access deficits.
This is one of the many potentially exciting future directions for research seeking to manipulate semantic competition to enhance the efficacy of treatments for aphasia.

Note 1. Due to experimenter error, one participant received training on three homogeneous sets and one mixed set in Round 1, and three mixed sets and one homogeneous set in Round 2.