Speech categorization in context: Joint effects of nonspeech and speech precursors

The extent to which context influences speech categorization can inform theories of pre-lexical speech perception. Across three conditions, listeners categorized speech targets preceded by speech context syllables. These syllables were presented as the sole context or paired with nonspeech tone contexts previously shown to affect speech categorization. Listeners' context-dependent categorization across these conditions provides evidence that speech and nonspeech context stimuli jointly influence speech processing. Specifically, when the spectral characteristics of speech and nonspeech context stimuli are mismatched such that they are expected to produce opposing effects on speech categorization, the influence of nonspeech contexts may undermine, or even reverse, the expected effect of adjacent speech context. Likewise, when spectrally matched, the cross-class contexts may collaborate to increase effects of context. Similar effects are observed even when natural speech syllables, matched in source to the speech categorization targets, serve as the speech contexts. Results are well-predicted by spectral characteristics of the context stimuli. © 2006 Acoustical Society of America. DOI: 10.1121/1.2195119


I. INTRODUCTION
Context plays a critical role in speech categorization. Acoustically identical speech stimuli may be perceived as members of different phonetic categories as a function of the surrounding acoustic context. Mann (1980), for example, has shown that listeners' categorization of a series of speech stimuli ranging perceptually from /ga/ to /da/ is shifted toward more "ga" responses when these target syllables are preceded by /al/. The same stimuli are more often categorized as "da" when /ar/ precedes them. Such context-dependent phonetic categorization is a consistent finding in speech perception (e.g., Lindblom and Studdert-Kennedy, 1967; Mann and Repp, 1981; see Repp, 1982, for review). Consideration of how to account for context-dependent speech perception highlights larger theoretical issues of how best to characterize the basic representational currency and processing characteristics of speech perception. Relevant to this interest, an avian species (Japanese quail, Coturnix coturnix japonica) has been shown to exhibit context-dependent responses to speech (Lotto et al., 1997). Birds operantly trained to peck a lighted key in response to a /ga/ stimulus peck more robustly in later tests when test syllables are preceded by /al/. Correspondingly, birds trained to peck to /da/ peck most vigorously to test stimuli when they are preceded by /ar/. Thus, birds exhibit shifts in pecking behavior contingent on preceding context, analogous to context-dependent human speech categorization. The birds had no previous experience with speech, so their behavior cannot be explained on the basis of learned covariation of acoustic attributes across contexts or on the basis of existing phonetic categories. It is also unlikely that quail have access to specialized speech processes or knowledge of the human vocal tract.
The parallels between quail and human behavior suggest a possible role for general auditory processing, not specific to speech or dependent upon extensive experience with the speech signal, in context-dependent speech perception.
In accord with the hypothesis that general, rather than speech-specific, processes play a role in context-dependent speech perception, there is evidence that nonspeech acoustic contexts affect speech categorization by human listeners. Following the findings of Mann (1980), Lotto and Kluender (1998) synthesized two sine-wave tones, one with a higher frequency corresponding to the third formant (F3) offset frequency of /al/ and the other with a lower frequency corresponding to the /ar/ F3 offset frequency. When these nonspeech stimuli preceded a /ga/ to /da/ target stimulus series like that studied by Mann (1980), speech categorization was influenced by the precursor tones. Listeners more often categorized the syllables as "ga" when they were preceded by the higher-frequency sine-wave tone modeling /al/. The same stimuli were more often categorized as "da" when the tone modeling /ar/ preceded them. Thus, nonspeech stimuli mimicking very limited spectral characteristics of speech contexts also influence speech categorization.
Nonspeech-elicited context effects on speech categorization appear to be a general phenomenon. Holt (1999; Holt and Lotto, 2002) reports that sine-wave tones or single formants situated at the second formant (F2) frequency of /i/ versus /u/ shift categorization of syllables ranging perceptually from /ba/ to /da/ in the same manner as the vowels they model. Likewise, flanking nonspeech frequency-modulated glides that follow the F2 trajectories of /bVb/ and /dVd/ syllables influence categorization of the intermediate vowel. A number of other studies demonstrate interactions of nonspeech context and speech perception (Fowler et al., 2000; Kluender et al., 2003; Watkins and Makin, 1994, 1996a), and the effects appear to be reciprocal. Stephens and Holt (2003) report that preceding /al/ and /ar/ syllables modulate perception of following nonspeech stimuli. Follow-up studies have demonstrated that listeners are unable to relate the sine-wave tone precursors typical of these studies to the phonetic categories the tones model (Lotto, 2004); context-dependent speech categorization is elicited even with nonspeech precursors that are truly perceived as nonspeech events.
There is evidence that even temporally nonadjacent nonspeech precursors can influence speech categorization. Holt (2005) created "acoustic histories" composed of 21 sine-wave tones sampling a distribution defined in the acoustic frequency dimension. The acoustic histories terminated in a neutral-frequency tone that was shown to have no effect on speech categorization. In this way, the context immediately adjacent to the speech target in time was constant across conditions. The mean frequency of the acoustic histories differentiated conditions, with distribution means approximating the tone frequencies of Lotto and Kluender (1998). Despite their temporal nonadjacency with speech targets, the nonspeech acoustic histories had a significant effect on categorization of members of a following /ga/ to /da/ speech series. In line with previous findings, the higher-frequency acoustic histories resulted in more "ga" responses, whereas the lower-frequency acoustic histories led to more "da" responses. These effects were observed even when as much as 1.3 s of silence or 13 repetitions of the neutral tone separated the acoustic histories and the speech targets in time.
In each of the cases for which effects of nonspeech contexts on speech categorization have been observed, the nonspeech contexts model limited spectral characteristics of the speech contexts. As simple pure tones or glides, they do not possess structured information about articulatory gestures. Moreover, even the somewhat richer acoustic history tone contexts of Holt (2005) are far removed from the stimuli that may be perceived as speech in sine-wave speech studies (e.g., Remez et al., 1994). The commonality shared between the tones composing the acoustic histories and sine-wave speech is limited to the fact that both make use of sinusoids. The tonal sine-wave speech stimuli are composed of three or four concurrent time-varying sinusoids, each mimicking the center frequency and amplitude of a natural vocal resonance measured from a real utterance. Thus, the sine-wave replicas that may give rise to speech percepts possess an overall acoustic structure that much more closely mirrors the speech spectrum they model. By contrast, the single sine waves of, for example, Lotto and Kluender (1998) or the sequences of sine waves of Holt (2005) are far more removed from the precise time-varying characteristics of speech. The tones composing the acoustic histories of Holt (2005) are single sinusoids of equal amplitude, separated in time (not continuous), and randomized on a trial-by-trial basis. The nonspeech contexts provide neither acoustic structure consistent with articulation nor acoustic information sufficient to support phonetic labeling (see Lotto, 2004). What they do share with the speech contexts they model is a very limited resemblance to the spectral information that differentiates, for example, the /al/ from /ar/ contexts that have been shown to influence speech categorization (Mann, 1980).
The directionality of the context dependence is likewise predictable from this spectral information. Across the observations of context-dependent speech categorization for speech and nonspeech contexts, the pattern of context-dependent categorization is spectrally contrastive (Holt, 2005; Lotto et al., 1997; Lotto and Kluender, 1998): precursors with acoustic energy in higher-frequency regions (whether speech or nonspeech, e.g., /al/ or nonspeech sounds modeling the spectrum of /al/) shift categorization toward the speech category characterized by lower-frequency acoustic energy (i.e., /ga/), whereas lower-frequency precursors (/ar/ or nonspeech sounds modeling /ar/) shift categorization toward the higher-frequency alternative (i.e., /da/). The auditory perceptual system appears to operate in a manner that emphasizes spectral change in the acoustic signal. Contrastive mechanisms are a fundamental characteristic of perceptual processing across modalities. General mechanisms of auditory processing that produce spectral contrast may give rise to the results observed for speech and nonspeech contexts in human listeners with varying levels and types of language expertise (Mann, 1986; Fowler et al., 1990) and in quail subjects (Lotto et al., 1997). Neural adaptation and inhibition are simple examples of neural mechanisms that exaggerate contrast in the auditory system (Smith, 1979; Sutter et al., 1999), but others exist at higher levels of auditory processing (see, e.g., Delgutte, 1996; Ulanovsky et al., 2003) that produce contrast without a loss in sensitivity (Holt and Lotto, 2002). The observation of nonspeech context effects on speech categorization when context and target are presented to opposite ears (Holt and Lotto, 2002; Lotto et al., 2003) and findings demonstrating effects of nonadjacent nonspeech context on speech categorization (Holt, 2005) indicate that the mechanisms are not solely sensory.
Moreover, there is evidence that mechanisms producing spectral contrast may operate over multiple time scales (Holt, 2005; Ulanovsky et al., 2003). By this general perceptual account, speech- and nonspeech-elicited context effects emerge from common processes that are part of general auditory processing. These mechanisms are broadly described as spectrally contrastive in that they emphasize spectral change in the acoustic signal, independent of its classification as speech or nonspeech or whether the signal carries information about speech articulation. So far, observed effects have been limited to the influence of speech or nonspeech contexts on speech categorization (or, conversely, the effects of speech contexts on nonspeech perception; Stephens and Holt, 2003). However, an account that relies upon spectral contrast makes strong directional predictions about context-dependent speech categorization in circumstances in which both speech and nonspeech contexts are present. Specifically, this account predicts that when both speech and nonspeech are present as context, their effects on speech categorization will be dictated by their spectral characteristics, such that they may either cooperate or conflict in their direction of influence on speech categorization as a function of how they are paired. If the speech and nonspeech contexts are matched in the distribution of spectral energy that they possess, such that they are expected to shift speech categorization in the same direction, then nonspeech may collaborate with speech to produce greater effects of context than observed for speech contexts alone. Conversely, when nonspeech and speech contexts possess spectra that push speech categorization in opposing directions, nonspeech contexts should be expected to lessen the influence of speech contexts on speech categorization.
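To make these directional predictions concrete, the sketch below codes each precursor as carrying relatively high-frequency (+1) or low-frequency (-1) energy and sums the contrastive pushes toward "ga." The additive form, the ±1 coding, and the weights `w_speech` and `w_history` are assumptions made for illustration; this is not a model proposed or fit in the present study.

```python
# Toy additive spectral-contrast model (hypothetical illustration only).
# +1 = relatively high-frequency precursor (/al/, High Mean history)
# -1 = relatively low-frequency precursor (/ar/, Low Mean history)
SPECTRAL_SIGN = {"al": +1, "ar": -1, "High": +1, "Low": -1}


def predicted_ga_shift(speech, history=None, w_speech=1.0, w_history=1.0):
    """Signed shift toward 'ga': under contrast, high-frequency precursors favor /ga/."""
    shift = w_speech * SPECTRAL_SIGN[speech]
    if history is not None:
        shift += w_history * SPECTRAL_SIGN[history]
    return shift


def context_effect(history_with_al, history_with_ar, **weights):
    """Predicted /al/-minus-/ar/ difference; positive means the usual direction."""
    return (predicted_ga_shift("al", history_with_al, **weights)
            - predicted_ga_shift("ar", history_with_ar, **weights))


speech_only = context_effect(None, None)
cooperating = context_effect("High", "Low")   # High+/al/ vs Low+/ar/
conflicting = context_effect("Low", "High")   # Low+/al/ vs High+/ar/
print(speech_only, cooperating, conflicting)  # 2.0 4.0 0.0

# Weighting the nonspeech histories more heavily than the speech precursor
# reverses the usual /al/-/ar/ effect in the conflicting pairing.
print(context_effect("Low", "High", w_history=1.5))  # -1.0
```

With equal weights the conflicting pairing simply cancels; a reversal of the speech-only effect would require the nonspeech context to carry more weight than the adjacent speech syllable.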
As a means of empirically examining the hypotheses arising from this account, the present experiments examine speech categorization when both speech and nonspeech signals serve as acoustic context, specifically investigating the degree to which they may jointly influence speech categorization.

II. EXPERIMENT 1
The aim of this study thus is to assess the relative influence of speech and jointly presented nonspeech contexts on speech categorization. Experiment 1 examines speech categorization of a /ga/ to /da/ syllable series across three contexts: (1) preceding /al/ and /ar/ syllables; (2) the same speech syllables paired with spectrally matched nonspeech acoustic histories (as described by Holt, 2005) that shift speech categorization in the same direction (e.g., High Mean acoustic histories paired with /al/); (3) the same speech syllables paired with spectrally mismatched nonspeech acoustic histories that shift speech categorization in opposing directions (e.g., Low Mean acoustic histories paired with /al/). Whereas the speech contexts remain consistent across conditions, the nonspeech contexts vary. Thus, if speech and nonspeech contexts fail to jointly influence speech categorization, there will be no significant differences in speech categorization across conditions and, as in previous studies, speech targets preceded by /al/ will be more often categorized as "ga" than the same targets preceded by /ar/. If, however, the two sources of acoustic context mutually influence speech categorization, as predicted by a general perceptual/cognitive account of context effects in speech perception, then the observed context effects will vary across conditions and the relative influence of each context source on speech categorization can be assessed.

A. Method

Participants
Ten adult monolingual English listeners recruited from the Carnegie Mellon University community participated in return for a small payment or course credit. All participants reported normal hearing.

Stimuli
Stimulus design is schematized in Fig. 1. For each stimulus an acoustic history composed of 21 sine-wave tones preceded a speech syllable context stimulus, a 50-ms silent interval, and a speech target drawn from a stimulus series varying perceptually from /ga/ to /da/.
a. Speech. Speech target stimuli were identical to those described previously (Holt, 2005; Wade and Holt, 2005). Natural tokens of /ga/ and /da/ spoken in isolation were digitally recorded from an adult male monolingual English speaker (CSL, Kay Elemetrics; 20-kHz sample rate, 16-bit resolution). From a number of natural productions, one /ga/ and one /da/ token were selected that were nearly identical in spectral and temporal properties except for the onset frequencies of F2 and F3. LPC analysis was performed on each of the tokens, and a nine-step sequence of filters was created (Analysis-Synthesis Laboratory, Kay Elemetrics) such that the onset frequencies of F2 and F3 varied approximately linearly between /g/ and /d/ endpoints. These filters were excited by the LPC residual of the original /ga/ production to create an acoustic series spanning the natural /ga/ and /da/ endpoints in approximately equal steps. Each stimulus was 589 ms in duration. The series was judged by the experimenter to comprise a gradual shift between natural-sounding /ga/ and /da/ tokens, and this impression was confirmed by regular shifts in phonetic categorization across the series by participants in the Holt (2005) and Wade and Holt (2005) studies. These speech series members served as categorization targets for each experimental condition. Spectrograms of odd-numbered series stimuli are shown in Fig. 2.
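The stepping scheme described above can be sketched as a linear interpolation of a formant onset frequency across nine steps. This captures only the target parameter schedule, not the LPC filter construction or residual excitation, and the endpoint values below are hypothetical placeholders because the text does not report the actual onset frequencies.

```python
def onset_series(f_ga_hz, f_da_hz, n_steps=9):
    """Formant onset frequencies stepping linearly from the /ga/ to the /da/ endpoint."""
    step = (f_da_hz - f_ga_hz) / (n_steps - 1)
    return [f_ga_hz + k * step for k in range(n_steps)]


# Hypothetical F3 onset endpoints (Hz); not the values used for the actual series.
f3_onsets = onset_series(2100.0, 2900.0)
print(f3_onsets)  # [2100.0, 2200.0, 2300.0, 2400.0, 2500.0, 2600.0, 2700.0, 2800.0, 2900.0]
```

The same function would be applied separately to F2; in the real series the spacing was only approximately linear.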
In addition, there were two speech context stimuli. These 250-ms syllables corresponded perceptually to /al/ and /ar/ and were composed of a 100-ms steady-state vowel followed by a 150-ms linear formant transition. Stimuli were synthesized using the cascade branch of the Klatt (1980) synthesizer. These stimuli were identical to those shown in earlier reports to produce spectrally contrastive context effects on perception of speech (Lotto and Kluender, 1998) and nonspeech (Stephens and Holt, 2003). Lotto and Kluender (1998) provide full details of stimulus synthesis.
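As a rough sketch of the 100-ms steady state plus 150-ms linear transition, the function below builds a frame-by-frame F3 track. The 5-ms frame rate and the 2500-Hz steady-state F3 are assumptions; the F3 offsets reuse the 2720-Hz (/al/) and 1824-Hz (/ar/) tone frequencies of Lotto and Kluender (1998), on the assumption that those tones match the F3 offsets of these syllables. Full synthesis parameters are in Lotto and Kluender (1998).

```python
def f3_track(steady_hz, offset_hz, steady_ms=100, transition_ms=150, frame_ms=5):
    """Frame-by-frame F3 (Hz): a steady state followed by a linear transition."""
    n_steady = steady_ms // frame_ms
    n_trans = transition_ms // frame_ms
    track = [float(steady_hz)] * n_steady
    for i in range(1, n_trans + 1):
        # Linear glide that reaches offset_hz on the final frame.
        track.append(steady_hz + (offset_hz - steady_hz) * i / n_trans)
    return track


# 2500 Hz is a placeholder steady-state value; the offsets are assumed (see above).
al_track = f3_track(2500, 2720)
ar_track = f3_track(2500, 1824)
print(len(al_track), al_track[-1], ar_track[-1])  # 50 2720.0 1824.0
```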
b. Nonspeech. Acoustic histories were created as described by Holt (2005). Each acoustic history was composed of 21 70-ms sine-wave tones (30-ms silent intervals) with unique frequencies. Distributions' mean frequencies (1800 and 2800 Hz) were chosen based on the findings of Lotto and Kluender (1998), who demonstrated that single 1824 versus 2720 Hz tones produce a spectrally contrastive context effect on speech categorization targets varying perceptually from /ga/ to /da/. "Low Mean" acoustic histories were composed of 1300-2300 Hz tones (M = 1800 Hz, 50-Hz steps). "High Mean" acoustic histories possessed tones sampling 2300-3300 Hz (M = 2800 Hz, 50-Hz steps).
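The stated ranges and 50-Hz steps fully determine the two tone sets; the short check below confirms that each range yields exactly 21 tones with the stated mean. Note that the two distributions share the 2300-Hz tone.

```python
# Tone frequencies (Hz) sampled by each acoustic-history distribution.
LOW_MEAN_HZ = [1300 + 50 * i for i in range(21)]   # 1300, 1350, ..., 2300
HIGH_MEAN_HZ = [2300 + 50 * i for i in range(21)]  # 2300, 2350, ..., 3300

low_mean = sum(LOW_MEAN_HZ) / len(LOW_MEAN_HZ)
high_mean = sum(HIGH_MEAN_HZ) / len(HIGH_MEAN_HZ)
print(len(LOW_MEAN_HZ), low_mean, high_mean)  # 21 1800.0 2800.0
```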
To minimize effects elicited by any particular tone ordering, acoustic histories were created by randomizing the order of the 21 tones on a trial-by-trial basis. Each trial was unique; acoustic histories within a condition were distinctive in surface acoustic characteristics, but were statistically consistent with other stimuli drawn from the distribution defining the nonspeech context. Thus, any influence of acoustic histories on speech categorization is indicative of listeners' sensitivity to the long-term spectral distribution of the acoustic history and not merely to the simple acoustic characteristics of any particular segment (for further discussion, see ...). Tones comprising the acoustic histories were synthesized with 16-bit resolution and sampled at 10 kHz using MATLAB (Mathworks, Inc.). Linear onset/offset amplitude ramps of 5 ms were applied to all tones. Target speech stimuli were digitally down-sampled from their recording rate of 20 kHz to 10 kHz, and both tones and speech tokens were digitally matched to the rms energy of the /da/ endpoint of the target speech series.
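The acoustic-history construction described above can be sketched in plain Python (the original tones were generated in MATLAB). The sketch assumes that the 30-ms silences fall only between tones and uses an arbitrary `target_rms` of 0.1 as a stand-in for the rms of the /da/ endpoint, which is not reported; both are illustrative assumptions.

```python
import math
import random

FS = 10_000                      # sample rate (Hz) stated in the text
TONE_MS, GAP_MS, RAMP_MS = 70, 30, 5


def tone(freq_hz, dur_ms=TONE_MS, ramp_ms=RAMP_MS, fs=FS):
    """Sine tone with linear onset/offset amplitude ramps."""
    n = int(fs * dur_ms / 1000)
    n_ramp = int(fs * ramp_ms / 1000)
    samples = []
    for i in range(n):
        gain = min(1.0, i / n_ramp, (n - 1 - i) / n_ramp)
        samples.append(gain * math.sin(2 * math.pi * freq_hz * i / fs))
    return samples


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def acoustic_history(freqs_hz, rng, target_rms=0.1, fs=FS):
    """Shuffle the tones, rms-match each one, and join them with silent gaps."""
    order = list(freqs_hz)
    rng.shuffle(order)                    # a fresh order on every trial
    gap = [0.0] * int(fs * GAP_MS / 1000)
    signal = []
    for k, f in enumerate(order):
        t = tone(f)
        scale = target_rms / rms(t)       # placeholder for matching the /da/ endpoint rms
        if k > 0:
            signal.extend(gap)
        signal.extend(scale * v for v in t)
    return signal, order


low_mean_hz = [1300 + 50 * i for i in range(21)]
history, order = acoustic_history(low_mean_hz, random.Random(0))
print(len(history) / FS)  # 2.07 (seconds: 21 tones + 20 gaps)
```

Because only the order is randomized, every Low Mean history has the same long-term spectrum while differing in its surface sequence, which is the property the design relies on.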
As discussed in Sec. I, a very broad interpretation of the kind of acoustic energy that may carry articulatory information may raise the concern that the High and Low mean acoustic histories could serve as information about articulatory events and perhaps lead listeners to identify the nonspeech acoustic histories phonetically. To allay this concern, 10 monolingual English participants who reported normal hearing were tested in a pilot stimulus test. These participants did not serve as listeners in any of the reported experiments and had not participated in experiments of this sort before. These listeners identified the High and Low mean acoustic histories as "al" or "ar" in the context of the following speech syllable pairs described above. If the limited spectral information that the acoustic histories model from the /al/ and /ar/ contexts serves as information about articulatory events, High mean acoustic histories should elicit more "al" responses and Low mean acoustic histories more "ar" responses. This was not the case. Listeners labeled the High mean acoustic histories as "al" no more often (M High = 51.1, SE = 0.52) than the Low mean acoustic histories (M Low = 51.0, SE = 1.19; t < 1 in a paired-samples t-test).
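The paired-samples t-test above compares each pilot listener's "al" labeling under the two distributions. A minimal sketch of the statistic follows; the listener values are hypothetical and are not the pilot data. Obtaining a p value would additionally require the t distribution CDF, which the Python standard library does not provide, so it is omitted.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Paired-samples t statistic and its degrees of freedom."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical per-listener percent "al" responses (not the pilot data).
high = [50, 52, 51, 49, 53]
low = [51, 50, 52, 50, 51]
t, df = paired_t(high, low)
print(round(t, 3), df)  # 0.272 4
```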
c. Stimulus construction. Two sets of stimuli were constructed from these elements. To create the hybrid nonspeech/speech contexts preceding the speech targets, each of the nine /ga/ to /da/ target stimuli was appended to the /al/ and /ar/ speech contexts with a 50-ms silent interval separating the syllables. Each of the resulting 18 disyllables was appended to two nonspeech contexts, one an acoustic history defined by the High Mean distribution and the other an acoustic history with a Low Mean. This pairing of disyllables with acoustic histories was repeated 10 times, with a different acoustic history for each repetition. This resulted in 360 unique stimuli, exhaustively pairing /al/ and /ar/ speech contexts with High and Low mean nonspeech contexts and the nine target speech series stimuli across 10 repetitions. A second set of stimuli with only speech contexts preceding the speech targets also was created; /al/ and /ar/ stimuli were appended to each of the speech target series members with a 50-ms interstimulus silent interval for a total of 18 stimuli. These stimuli were presented 10 times each during the experiment.
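The factorial bookkeeping in the preceding paragraph can be checked by enumerating the stimulus specifications: 2 speech contexts × 9 targets × 2 acoustic-history distributions × 10 repetitions gives 360 hybrid stimuli, and 2 × 9 gives 18 speech-only stimuli. The dictionary fields below are illustrative names, not the authors' file format.

```python
from itertools import product

TARGETS = range(1, 10)              # /ga/-/da/ series steps 1-9
SPEECH_CONTEXTS = ("al", "ar")
HISTORY_DISTRIBUTIONS = ("High", "Low")
REPETITIONS = range(10)             # each repetition gets a newly drawn acoustic history

hybrid_stimuli = [
    {"speech": s, "target": t, "history": h, "rep": r}
    for s, t, h, r in product(SPEECH_CONTEXTS, TARGETS, HISTORY_DISTRIBUTIONS, REPETITIONS)
]
speech_only_stimuli = [{"speech": s, "target": t} for s, t in product(SPEECH_CONTEXTS, TARGETS)]

print(len(hybrid_stimuli), len(speech_only_stimuli))  # 360 18
```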

Design and procedure
The pairing of speech and nonspeech contexts in stimulus creation yielded the two experimental conditions illustrated in Fig. 1. In the Conflicting condition, speech and nonspeech contexts were paired such that they would be expected to shift speech categorization in opposing directions; in the Cooperating condition, they were paired such that they would be expected to shift speech categorization in the same direction. Note that these pairings can also be described in terms of the spectral characteristics of the component context stimuli, because spectral characteristics well-predict the directionality of context effects on speech categorization (e.g., Lotto and Kluender, 1998). For example, High Mean acoustic histories were matched with /al/ (also possessing greater high-frequency acoustic energy) in the spectrally matched Cooperating condition and with /ar/ (with greater low-frequency energy) in the spectrally mismatched Conflicting condition.
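Because the condition label follows directly from the spectral description above, it can be expressed as a lookup: a pairing is Cooperating when both precursors favor the same spectral region and Conflicting otherwise. The mapping table is a restatement of the text, and the function name is illustrative.

```python
# Spectral region favored by each precursor, as described in the text.
SPECTRAL_REGION = {"High": "high", "Low": "low", "al": "high", "ar": "low"}


def condition(history, speech):
    """Cooperating when both precursors favor the same spectral region."""
    if SPECTRAL_REGION[history] == SPECTRAL_REGION[speech]:
        return "Cooperating"
    return "Conflicting"


for h in ("High", "Low"):
    for s in ("al", "ar"):
        print(f"{h}+/{s}/: {condition(h, s)}")
```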
Seated in individual sound-attenuated booths, listeners categorized the speech target of each stimulus by pressing electronic buttons labeled "ga" and "da." Listeners completed two blocks in a single session; the order of the blocks was counterbalanced. In one block, the hybrid nonspeech plus speech contexts preceded the speech targets. In this block, stimulus presentation was mixed across the Conflicting and Cooperating conditions. In the other (Speech Only) block, participants heard only /al/ or /ar/ preceding the speech targets. Thus, each listener responded to stimuli from all three conditions. Acoustic presentation was under the control of Tucker Davis Technologies System II hardware; stimuli were converted from digital to analog, low-pass filtered at 4.8 kHz, amplified, and presented diotically over linear headphones (Beyer DT-150) at approximately 70 dB SPL(A).
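A session with this structure might be assembled as sketched below. The random shuffling within each block and the odd/even participant rule for counterbalancing block order are assumptions; the text states only that the Conflicting and Cooperating trials were mixed and that block order was counterbalanced. The Speech Only block presents each of its 18 stimuli 10 times (180 trials).

```python
import random
from itertools import product

TARGETS = range(1, 10)


def build_session(participant_id, seed=None):
    """Return [(block_name, trials), ...] for one listener."""
    rng = random.Random(seed)
    hybrid = [(h, s, t) for h, s, t, _ in product(("High", "Low"), ("al", "ar"), TARGETS, range(10))]
    speech_only = [(None, s, t) for s, t, _ in product(("al", "ar"), TARGETS, range(10))]
    rng.shuffle(hybrid)          # Cooperating and Conflicting trials intermixed
    rng.shuffle(speech_only)
    blocks = [("Hybrid", hybrid), ("Speech Only", speech_only)]
    if participant_id % 2 == 1:  # hypothetical counterbalancing rule
        blocks.reverse()
    return blocks


session = build_session(participant_id=3, seed=42)
print([(name, len(trials)) for name, trials in session])  # [('Speech Only', 180), ('Hybrid', 360)]
```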

B. Results
Results were analyzed in terms of average percent "ga" responses across stimulus repetitions and are plotted in the top row of Fig. 3. The nonoverlapping categorization curves illustrated in each of the top panels of Fig. 3 are indicative of an influence of context for each condition (see also the marginal means plotted in Fig. 4). Critically, although the immediately preceding speech context was constant across conditions, the observed context effects were not identical. Repeated-measures analysis of variance results are described in the following. Probit boundary analysis (Finney, 1971) of participants' category boundaries across conditions reveals the same pattern of results. The results of these analyses are provided in Table I.

FIG. 3. Mean "ga" responses to speech series stimuli for Experiment 1 (top panel) and Experiment 2 (bottom panel). The "Speech Only" panels present categorization data for /al/ and /ar/ contexts. The other two panels illustrate categorization when the same stimuli are preceded by High and Low Mean acoustic histories and the /al/ or /ar/ precursors. In the "Cooperating" condition, speech and nonspeech precursors are expected to shift categorization in the same direction (High + /al/, Low + /ar/). In the "Conflicting" condition, acoustic histories and speech precursors exert opposite effects on speech categorization (Low + /al/, High + /ar/).
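The probit boundary analysis mentioned above locates the series step at which a cumulative-normal fit to the identification curve crosses 50% "ga." The sketch below is a simplified stand-in: it probit-transforms the proportions and fits a line by ordinary least squares rather than using Finney's iterative maximum-likelihood procedure, and it clips proportions of 0 or 1 with an arbitrary `eps`. The identification curve is synthetic, with a known boundary at step 5.3.

```python
from statistics import NormalDist


def probit_boundary(steps, p_ga, eps=0.01):
    """Series step where a probit line fit to P('ga') crosses 0.5.

    Simplification: proportions are probit-transformed and fit by ordinary
    least squares instead of Finney's iterative maximum-likelihood procedure.
    """
    z = [NormalDist().inv_cdf(min(max(p, eps), 1 - eps)) for p in p_ga]
    n = len(steps)
    mean_x, mean_z = sum(steps) / n, sum(z) / n
    sxx = sum((x - mean_x) ** 2 for x in steps)
    sxz = sum((x - mean_x) * (v - mean_z) for x, v in zip(steps, z))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    return -intercept / slope  # z = 0 corresponds to p = 0.5


# Synthetic identification curve with a known boundary at step 5.3.
steps = list(range(1, 10))
p_ga = [NormalDist().cdf(-0.5 * (s - 5.3)) for s in steps]
print(round(probit_boundary(steps, p_ga), 3))  # 5.3
```

A later boundary (a larger step value) corresponds to more "ga" responding, so a context that increases "ga" responses should move each listener's boundary toward the /da/ end of the series.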

Speech Only condition
The average percent "ga" responses across participants were submitted to a 2 × 9 (Context × Target Speech Stimulus) repeated-measures ANOVA. This analysis revealed a significant effect of Context, F(1,9) = 12.12, p = 0.007, ηp² = 0.574. Consistent with earlier findings (Lotto and Kluender, 1998; Mann, 1980), listeners categorized speech targets preceded by /al/ as "ga" significantly more often (M = 60.44, SE = 2.86; here and henceforth, means refer to percent "ga" responses averaged across target speech stimuli and participants) than the same targets preceded by /ar/ (M = 55.22, SE = 2.57). These data confirm that, on their own, the speech context precursors have a significant effect on categorization of neighboring speech targets. Probit boundary values are presented in Table I.
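The effect sizes accompanying each F test are consistent with partial eta squared, which can be recovered from the F ratio and its degrees of freedom as ηp² = F·df_effect / (F·df_effect + df_error). The check below uses values reported in this section and applies equally to the other analyses of Experiment 1.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared from an F ratio: (F * df_effect) / (F * df_effect + df_error)."""
    return f * df_effect / (f * df_effect + df_error)


reported = [
    ("Speech Only, Context", 12.12, 1, 9, 0.574),
    ("Cooperating, Context", 40.22, 1, 9, 0.817),
    ("Conflicting, Context", 25.97, 1, 9, 0.743),
]
for label, f, d1, d2, eta in reported:
    print(f"{label}: {partial_eta_squared(f, d1, d2):.3f} (reported {eta})")
```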

Cooperating condition
A 2 × 9 (Context × Target Speech Stimulus) repeated-measures ANOVA revealed that there was also a significant effect of Cooperating nonspeech/speech contexts on speech categorization, F(1,9) = 40.22, p < 0.0001, ηp² = 0.817. As would be expected from the influence that speech and nonspeech contexts elicit independently (Lotto and Kluender, 1998), the effect observed in the Cooperating condition was spectrally contrastive; categorization was shifted in the same direction as in the Speech Only condition. When listeners heard speech targets preceded by High Mean acoustic histories paired with /al/, they more often categorized the targets as "ga" (M = 62.22, SE = 2.05) than when the same targets were preceded by Low Mean acoustic histories paired with /ar/ (M = 49.11, SE = 2.34).
The primary aim of this study was to examine potential joint effects of speech and nonspeech acoustic contexts in influencing speech target categorization. A 2 × 2 × 9 (Condition × Context × Target Speech Stimulus) repeated-measures ANOVA comparing the categorization patterns of the Speech Only condition with those of the Cooperating condition indicates that when speech and nonspeech contexts are spectrally matched such that they are expected to influence speech categorization similarly, they collaborate to produce an even greater context effect on speech target categorization.

Conflicting condition
A 2 × 9 (Context × Target Speech Stimulus) repeated-measures ANOVA of responses to Conflicting condition stimuli revealed that when the spectra of speech and nonspeech contexts predicted opposing effects on speech categorization, there was also a significant effect of context, F(1,9) = 25.97, p = 0.001, ηp² = 0.743. Note, however, the direction of this effect. Listeners more often categorized target syllables as "ga" when they were preceded by the High Mean acoustic histories paired with /ar/ speech precursors (% "ga" responses: M High+/ar/ = 59.89, SE = 2.41 vs M Low+/al/ = 49.11, SE = 2.34). In this example, the /ar/ speech context independently predicts more "da" responses (Mann, 1980), whereas the High Mean nonspeech acoustic histories independently predict more "ga" responses (Holt, 2005). Listeners more often responded "ga," following the expected influence of the nonspeech context rather than that of the speech context that immediately preceded the speech targets. These results indicate that when the spectra of nonspeech and speech contexts are put in conflict, the influence of temporally nonadjacent nonspeech context may be robust enough even to undermine the expected influence of temporally adjacent speech contexts. Of note, a 2 × 2 × 9 (Condition × Context × Target Speech Stimulus) repeated-measures ANOVA comparing the Conflicting condition to the Speech Only condition revealed no main effect of Context, F(1,9) = 2.98, p = 0.119, ηp² = 0.249, but a significant Condition × Context interaction, F(8,72) = 83.17, p < 0.0001, ηp² = 0.902. This indicates that the context effect produced by the speech contexts plus conflicting nonspeech contexts was statistically equivalent in magnitude, although opposite in direction, to that produced by the speech contexts alone.

TABLE I. Category boundaries were estimated for each participant's response to each condition of the experiment. The mean probit boundary across participants is presented in terms of the stimulus step across the nine-step /ga/ to /da/ categorization target series. The results parallel those of the ANOVA analyses across the speech stimulus series reported in the text. (Columns: Experiment, Condition, Precursor.)

Comparison of Cooperating vs Conflicting conditions
The relative contributions of speech and nonspeech contexts can be assessed with a 2 × 2 × 9 (Condition × Context × Target Speech Stimulus) repeated-measures ANOVA comparing the effects of nonspeech/speech hybrid contexts across Cooperating and Conflicting conditions. This analysis reveals an overall main effect of Context (context was coded in terms of the nonspeech segment of the precursor), F(1,9) = 37.207, p < 0.0001, ηp² = 0.805, such that listeners more often labeled speech targets as "ga" when nonspeech precursors were drawn from the High Mean acoustic history distribution (M = 61.06, SE = 2.01) than from the Low Mean distribution (M = 51.50, SE = 2.09). The contribution of the speech contexts to target syllable categorization is reflected in this analysis by the significant Condition by Acoustic History interaction, F(1,9) = 9.69, p = 0.01, ηp² = 0.518. With /al/ precursors, targets were somewhat more likely to be categorized as "ga" (M = 58.056, SE = 1.9), whereas with /ar/ precursors the same stimuli were less likely to be categorized as "ga" (M = 54.50, SE = 2.13). Thus, across conditions there is evidence of the joint influence of speech and nonspeech contexts. Moreover, the directionality of the observed effects is well-predicted by the spectral characteristics of the speech and nonspeech contexts.

C. Discussion
The percept created by the Experiment 1 hybrid nonspeech/speech stimuli is one of rapidly presented tones preceding a disyllabic speech utterance. One could easily describe these nonspeech precursors as extraneous to the task of speech categorization and, indeed, listeners were not required to make any explicit judgments about them during the perceptual task. The task in this experiment was speech perception. Yet, even in these circumstances, nonspeech contexts contributed to speech categorization. Speech does not appear to have a privileged status in producing context effects on speech categorization, even when afforded the benefit of temporal adjacency with the target of categorization.
Although general perceptual/cognitive accounts of speech perception are most consistent with these effects and can account for the directionality of the observed context effects, it is nonetheless surprising even from this theoretical perspective that the effect of nonspeech contexts is so robust.
The results run counter to modular accounts that would suggest that there are special-purpose mechanisms for processing speech that are informationally encapsulated and therefore impenetrable to influence by nonlinguistic information (Liberman et al.; Liberman and Mattingly, 1985). The sine-wave tones that comprised the nonspeech contexts are among the simplest of acoustic signals. To consider them information for speech perception by a speech-specific module would require a module so broadly tuned as to be indistinguishable from more interactive processing schemes. The results of Experiment 1 also are difficult to reconcile with a direct realist perspective on speech perception. The direct realist interpretation of the categorization patterns observed in the Speech Only condition is that the speech contexts provide information relevant to parsing the dynamics of articulation (Fowler, 1986; Fowler and Smith, 1986; Fowler et al., 2000). It is unclear from a direct realist perspective why, in the presence of clear speech contexts providing information about articulatory gestures, listeners would be influenced by nonspeech context sounds at all, let alone be more influenced by the nonspeech contexts than by the speech contexts in the Conflicting condition. It does not appear that context must carry structured information about articulation to have an impact on speech processing.

III. EXPERIMENT 2
The stimuli created for Experiment 1 were constructed as a compromise among stimuli used in previous experiments investigating speech and nonspeech context effects. The /ga/ to /da/ speech target series of Holt (2005) was chosen for its naturalness, in an effort to provide the most conservative estimate of context dependence (synthesized or otherwise degraded speech signals are typically thought to be more susceptible to contextual influence). The synthetic /al/ and /ar/ contexts were taken from the stimulus materials of Lotto and Kluender (1998) because they produce a robust influence on speech categorization along a /ga/ to /da/ series (see also Stephens and Holt, 2003). Nonetheless, the synthetic /al/ and /ar/ speech contexts of Experiment 1 differ from the more natural speech targets, and these differences could lead the two sets of speech materials to be perceived as originating from different sources. If this were the case, the perceived independence of the sources should reduce or eliminate the articulatory gestural information relevant to compensating, via gestural parsing, for intraspeaker effects of coarticulation (a within-speaker phenomenon). Although previous research has provided evidence of cross-speaker phonetic context effects (Lotto and Kluender, 1998), it may nonetheless be argued that Experiment 1 does not provide the most conservative test of nonspeech/speech context effects because of the possible perceived difference in speech source across syllables.
Therefore, Experiment 2 was conducted in the same manner as Experiment 1, but with natural /al/ and /ar/ productions recorded from the same speaker who produced the end point stimuli of the /ga/ to /da/ speech target series. The experiment thus serves both as a replication of the findings of Experiment 1 and as an opportunity to investigate whether the influence of nonspeech context on speech categorization is robust enough to persist even when speech contexts and targets originate from the same source.

A. Method

Participants
Ten adult monolingual English listeners, none of whom participated in Experiment 1, received a small payment or course credit for volunteering. All participants were recruited from the Carnegie Mellon University community and reported normal hearing.

Stimuli
Stimulus design was identical to that of Experiment 1, except that the speech context stimuli were digitally recorded (20-kHz sample rate, 16-bit resolution) natural utterances of /al/ and /ar/ spoken in isolation by the same speaker who recorded the natural speech end points of the target stimulus series. The 350-ms syllables were down-sampled to 10 kHz and matched in rms energy to the /da/ end point of the target stimulus series. These syllables served as the speech contexts in the stimulus construction protocol described for Experiment 1.
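The rms equalization step can be illustrated with a short sketch. The article does not specify the software used, so the following is only an illustration under stated assumptions: the helper names and the four-sample toy waveforms are hypothetical stand-ins for a recorded /al/ or /ar/ syllable and the /da/ end point, and the down-sampling step is omitted because a proper implementation requires anti-aliasing filtering.

```python
import math


def rms(signal):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def match_rms(signal, reference):
    """Return a copy of `signal` scaled so its rms equals that of `reference`.

    Hypothetical helper for illustration only; not the procedure code of the study.
    """
    gain = rms(reference) / rms(signal)
    return [s * gain for s in signal]


# Hypothetical toy waveforms (not the actual stimuli).
context_syllable = [0.1, -0.2, 0.3, -0.4]
da_endpoint = [0.5, -0.5, 0.5, -0.5]

matched = match_rms(context_syllable, da_endpoint)
print(round(rms(matched), 6))  # 0.5
```

Scaling by a single gain factor preserves the waveform shape, and hence the spectrum, of the context syllable while equating its overall level with that of the target end point.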

Design and Procedure
The design, procedure, and apparatus were identical to those of Experiment 1.

B. Results
The results of Experiment 2 are shown in the bottom row of Fig. 3. Marginal means are plotted in Fig. 4. Probit boundary values are presented in Table I.

Speech Only condition
Consistent with the findings of Experiment 1, there was a significant influence of preceding /al/ and /ar/ on speech target categorization. A 2 × 9 (Context × Target Speech Stimulus) repeated measures ANOVA confirmed that listeners categorized speech targets preceded by /al/ as "ga" significantly more often (M = 60.89, SE = 2.28) than the same targets following /ar/ (M = 51.00, SE = 3.4), F(1,9) = 18.426, p = 0.002, ηp² = 0.672. Thus, natural /al/ and /ar/ recordings matched to the target source produced a significant context effect on categorization of the speech targets.
One potential concern about the use of synthesized speech contexts in Experiment 1 was that a perceived change in talker may have reduced the observed effects of speech context. However, a cross-experiment 2 × 2 × 9 (Experiment × Context × Target Speech Stimulus) mixed model ANOVA, with Experiment as a between-subjects factor, did not reveal a significant difference between the context effects produced by the synthesized /al/ and /ar/ stimuli of Experiment 1 and the natural /al/ and /ar/ stimuli of Experiment 2, F(1,18) = 2.88, p = 0.11, ηp² = 0.138.
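The effect sizes reported here are partial eta squared values (ηp²). For an effect with df_effect numerator and df_error denominator degrees of freedom, partial eta squared can be recovered from the F ratio as ηp² = (F × df_effect)/(F × df_effect + df_error). The sketch below uses a hypothetical function name and serves as a consistency check on the reported value; it is not the analysis software used in the study.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom.

    eta_p^2 = (F * df_effect) / (F * df_effect + df_error)
    """
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Speech Only context effect of Experiment 2: F(1,9) = 18.426.
print(round(partial_eta_squared(18.426, 1, 9), 3))  # 0.672
```

The same identity reproduces, for example, the value of 0.894 reported for the Cooperating condition from F(1,9) = 76.21.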

Cooperating condition
The primary question of interest is whether nonspeech contexts influence speech categorization even in the presence of adjacent speech signals originating from the same source. A 2 × 9 (Context × Target Speech Stimulus) repeated measures ANOVA supports what is illustrated in the bottom row of Fig. 3. There was a significant spectrally contrastive effect of the cooperating, spectrally matched speech and nonspeech contexts, F(1,9) = 76.21, p < 0.0001, ηp² = 0.894, such that listeners more often categorized speech targets as "ga" when High Mean nonspeech precursors and /al/ preceded them (M = 64.00, SE = 2.19) than when Low Mean nonspeech precursors and /ar/ preceded them (M = 50.56, SE = 2.54).
An additional 2 × 2 × 9 (Condition × Context × Target Speech Stimulus) repeated measures ANOVA compared the context effects across the Speech Only and Cooperating conditions of Experiment 2. Although the mean difference between contexts was greater in the Cooperating condition (M(High+/al/) − M(Low+/ar/) = 13.44%) than in the Speech Only condition (M(/al/) − M(/ar/) = 9.89%), this difference was not statistically reliable, F(1,9) = 2.52, p = 0.147, ηp² = 0.219. This differs from Experiment 1, in which speech and nonspeech contexts collaborated in the Cooperating condition to produce a greater effect of context on speech categorization than did the speech contexts alone.
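The context effects compared in this analysis are differences between context-specific mean percent "ga" responses. The sketch below, with a hypothetical helper name, reproduces the reported 13.44% and 9.89% values from the means given for each condition and shows that the Cooperating advantage amounts to about 3.55 percentage points. It is descriptive only: whether that advantage is reliable can be assessed only with the subject-level ANOVA, which found that it was not.

```python
def context_effect(mean_ga_context_a, mean_ga_context_b):
    """Shift in mean percent "ga" responses between two contexts (percentage points)."""
    return round(mean_ga_context_a - mean_ga_context_b, 2)


# Experiment 2 means reported in the text.
speech_only = context_effect(60.89, 51.00)   # /al/ vs /ar/
cooperating = context_effect(64.00, 50.56)   # High+/al/ vs Low+/ar/

print(speech_only, cooperating, round(cooperating - speech_only, 2))  # 9.89 13.44 3.55
```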

Conflicting condition
An analogous analysis across the Speech Only and Conflicting conditions revealed that the categorization patterns observed in the Conflicting condition were significantly different from those found in the Speech Only condition, F(1,9) = 18.63, p = 0.002, ηp² = 0.674. A 2 × 9 (Context × Target Speech Stimulus) repeated measures ANOVA showed that, in contrast to the robust effect of speech contexts in the Speech Only condition, there was no effect of the hybrid nonspeech/speech contexts in the Conflicting condition, F(1,9) = 3.29, p = 0.103, ηp² = 0.267 (M(Low+/al/) = 57.00, SE = 2.52 vs M(High+/ar/) = 59.11, SE = 2.52). The presence of spectrally mismatched nonspeech contexts effectively neutralized the influence of the natural speech precursors.

Comparing Cooperating and Conflicting conditions
A 2 × 2 × 9 (Condition × Context × Target Speech Stimulus) repeated measures ANOVA comparing the patterns of categorization across the hybrid nonspeech/speech context conditions revealed a main effect of Context (entered into the analysis in terms of the nonspeech characteristics of the context): listeners more often labeled the speech targets as "ga" when the nonspeech context was drawn from a distribution with a High Mean frequency (M = 60.50, SE = 2.26) than when it was drawn from a distribution with a Low Mean frequency (M = 54.83, SE = 2.47), F(1,9) = 25.61, p = 0.001 […]. Likewise, listeners more often labeled the targets as "ga" when the precursor syllable was /al/ (M = 61.56, SE = 2.25) than when it was /ar/ (M = 53.78, SE = 2.42). Thus, both speech and nonspeech contexts contributed to the categorization responses observed in the hybrid context conditions of Experiment 2.

C. Discussion
The overall pattern of results of Experiment 2 confirms that speech and nonspeech contexts jointly influence speech categorization, even when the natural speech contexts are matched in source to the categorization targets. Of note, however, the influence of the nonspeech contexts in the presence of the natural speech contexts was less dramatic than when the same nonspeech contexts were paired with synthesized speech syllables in Experiment 1. Unlike in Experiment 1, the nonspeech precursors did not collaborate with the natural speech contexts of Experiment 2 to produce a context effect significantly greater than that elicited by the natural speech syllables alone. Moreover, although there was strong evidence of joint nonspeech/speech context effects in the Experiment 2 Conflicting condition, the influence of the nonspeech contexts was not strong enough to overpower the natural speech contexts and reverse the observed context effect, as it did in Experiment 1. These more modest interactions may be due to the somewhat stronger effect of context elicited by the natural speech syllables. This difference, evident in the shift in mean "ga" responses across speech contexts in the Speech Only conditions (the difference in mean percent "ga" responses for /al/ vs /ar/ contexts was 5.22% in Experiment 1 and 9.89% in Experiment 2), was not consistent enough to be statistically reliable across experiments. Nonetheless, the pattern of effect sizes suggests that the natural speech syllables may have exerted a greater overall influence on target speech categorization. In other words, relative to the strong influence of the nonspeech contexts, the natural speech contexts of Experiment 2 may have contributed more to the resulting target percept than did the synthesized syllables of Experiment 1.
To examine this possibility more closely, an additional statistical analysis was conducted to determine the relative contribution of speech contexts in the hybrid nonspeech/speech conditions across experiments as speech context type (synthesized, natural) varied. A 2 × 2 × 2 × 9 (Experiment × Condition × Context × Target Speech Stimulus) mixed model ANOVA with Experiment as a between-subjects factor compared the relative influence of speech contexts in the Cooperating versus Conflicting conditions across experiments. The three-way Experiment × Condition × Context interaction was significant, F(1,18) = 56.83, p < 0.0001, ηp² = 0.759: when nonspeech contexts were present, the relative influence of synthesized versus natural speech contexts differed. Computing the difference in mean "ga" responses in the hybrid nonspeech/speech conditions as a function of the speech context illustrates why. The categorization shift attributable to the synthesized speech contexts of Experiment 1 (M(/al/) − M(/ar/) = 58.06 − 54.50 = 3.56) is significantly smaller than that of the natural speech contexts of Experiment 2 (M(/al/) − M(/ar/) = 61.56 − 53.78 = 7.78). Many factors may have contributed to the relatively greater effect of context produced by the natural syllables, including, but not limited to, the richer acoustic characteristics of natural speech, the closer spectral correspondence of the natural syllables to the target speech syllables, perception of the two syllables as originating from the same talker, the amplitude relationships of the spectral energy from the two precursors, and auditory grouping by common acoustic characteristics. Whatever caused the natural syllables to be relatively stronger contributors to the effect on speech categorization, the results of the Conflicting condition nevertheless provide strong evidence of perceptual contributions from both nonspeech and speech contexts, even for natural speech contexts.
Moreover, the statistical analyses of the Experiment 2 Cooperating versus Conflicting conditions provide corroborating evidence that both the speech and nonspeech contexts contributed to the observed pattern of results.

IV. GENERAL DISCUSSION
A spectral contrast account of context-dependent speech perception makes strong directional predictions about context-dependent speech categorization in circumstances in which both speech and nonspeech contexts are present. Specifically, it is expected that the effect of joint speech/nonspeech context on speech categorization will be dictated by the spectral characteristics of each source of context, such that the speech and nonspeech contexts may either cooperate or conflict in their direction of influence on speech categorization as a function of how they are paired. The results of two experiments demonstrate that speech and nonspeech contexts do jointly influence speech categorization. When hybrid nonspeech/speech context stimuli were spectrally matched in Experiment 1, they collaborated to produce a bigger effect of context on speech categorization than did the same speech contexts on their own. A context effect on speech categorization was also observed in this condition in Experiment 2 (for which natural utterances provided speech context), but this effect was not significantly greater than that observed for the natural speech contexts alone.
When the hybrid nonspeech/speech contexts were spectrally mismatched such that they predicted opposing influences on speech categorization, the observed context effects differed from the context effect produced by the speech contexts alone. In Experiment 1, the context effect observed in the Conflicting condition was of equal magnitude to, but opposite in direction from, that observed for solitary speech contexts. The direction of the context effect was predicted not by the adjacent speech contexts but by the spectral characteristics of the temporally nonadjacent nonspeech contexts. A qualitatively similar, although less dramatic, effect was observed for the spectrally conflicting speech and nonspeech contexts of Experiment 2: the nonspeech contexts neutralized the effect of speech context such that no context-dependent shift in target speech categorization was observed. Overall, the effects observed in the hybrid context conditions of Experiment 2, with natural speech contexts matched in source to the target syllables, were more modest than those observed in Experiment 1. This may have been due to the somewhat larger effect of context exerted by the natural speech contexts. Most important to the aims of the study, however, both experiments provided evidence that linguistic and nonlinguistic sounds jointly contribute to context effects on speech categorization. The sum of the results is consistent with general auditory/cognitive approaches that emphasize the shared characteristics of the acoustic signal and the general processing of these elements, in this case spectral distributions of energy. The spectral characteristics of the context stimuli, whether speech or nonspeech, predicted their effects on categorization of the speech targets.
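The directional logic of the spectral contrast account in the hybrid conditions can be sketched as a toy additive model. This is an illustrative assumption, not a model fitted in this study: each context contributes a signed bias toward "ga" (positive for the higher-frequency High Mean tones and /al/, negative for the Low Mean tones and /ar/), scaled by a hypothetical weight for the nonspeech source and another for the speech source. The weights are arbitrary values chosen only to show the two qualitative outcomes described above.

```python
# Sign of each context's predicted push toward "ga" under spectral contrast
# (assumption for illustration): higher-frequency context energy -> more "ga".
GA_DIRECTION = {"High": +1, "Low": -1, "/al/": +1, "/ar/": -1}

# Context pairings per condition as (/al/ trial, /ar/ trial);
# None means no nonspeech precursor.
PAIRINGS = {
    "Speech Only": ((None, "/al/"), (None, "/ar/")),
    "Cooperating": (("High", "/al/"), ("Low", "/ar/")),
    "Conflicting": (("Low", "/al/"), ("High", "/ar/")),
}


def predicted_bias(nonspeech, speech, w_nonspeech, w_speech):
    """Hypothetical additive bias toward "ga" contributed by one context pairing."""
    return (w_nonspeech * GA_DIRECTION.get(nonspeech, 0)
            + w_speech * GA_DIRECTION[speech])


def speech_coded_effect(condition, w_nonspeech, w_speech):
    """Predicted /al/-minus-/ar/ difference in "ga" bias for one condition."""
    al_trial, ar_trial = PAIRINGS[condition]
    return (predicted_bias(*al_trial, w_nonspeech, w_speech)
            - predicted_bias(*ar_trial, w_nonspeech, w_speech))


# Illustrative (not fitted) weights: nonspeech twice as strong, then equal.
for label, w_n, w_s in [("nonspeech-dominant", 2.0, 1.0), ("balanced", 1.0, 1.0)]:
    effects = {cond: speech_coded_effect(cond, w_n, w_s) for cond in PAIRINGS}
    print(label, effects)
```

With the nonspeech weight twice the speech weight, the Conflicting effect (−2.0) equals the Speech Only effect (2.0) in magnitude but reverses its sign, the qualitative pattern of Experiment 1; with equal weights, the Conflicting effect is 0.0, as in Experiment 2. Because the toy model is purely additive, it also predicts a larger Cooperating effect with equal weights, a difference that was numerically present but not statistically reliable in Experiment 2, so it should be read as capturing direction rather than magnitude.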
Mechanistically, an important unresolved issue is whether general auditory representations common to speech and nonspeech govern the joint effects of speech and nonspeech contexts on speech categorization, or whether independent representations of the context stimuli influence speech categorization at a later decision stage. This is a thorny issue to resolve in any domain. Some theorists suggest that processes sharing common resources or hardware can be expected to interfere or otherwise interact with one another, whereas distinct processes should not. The present results meet this criterion for common resources or hardware, but further investigation will be required to put this question to a strict test. Nevertheless, whether nonspeech contexts operate on common representations or are integrated at a decision stage, the information brought to bear on speech categorization clearly does not depend on the signal carrying information about articulation per se. An account that takes into consideration the spectral distributions of acoustic energy in the context stimuli, as postulated by a general auditory/cognitive account under the term spectral contrast, makes the only clear predictions of what happens to speech categorization when speech and nonspeech are jointly present in the preceding input, and these predictions are supported by the results.
With respect to spectral contrast, one element of these experiments may seem puzzling. Because previous research has demonstrated that adjacent nonspeech context influences speech categorization (e.g., Lotto and Kluender, 1998), one may wonder why the nonspeech contexts of the present experiments exerted their influence on the nonadjacent speech targets rather than on the adjacent speech contexts. To understand why this should be so, it is useful to think of speech categorization as drawing on multiple sources of information.² Context is merely one source of information; the acoustic signal corresponding to the target of perception is another. If the acoustic signal greatly favors one speech category over another, context exerts very little effect. This is the case, for example, at the end points of the target speech categorization series, where the acoustic information is unambiguous with respect to category membership and, here and in other experiments, the effects of context are more limited. When acoustic signals are partially consistent with multiple speech categories, however, context has a role in categorization. In the present experiments, the speech target syllables were acoustically manipulated to create a series varying perceptually from /ga/ to /da/. By design, then, the intermediate stimuli along this series were acoustically ambiguous and partially consistent with both /ga/ and /da/, which afforded context an opportunity to exert an influence. By contrast, the acoustic structure of the speech context stimuli in the present experiments overwhelmingly favored either /al/ or /ar/; these syllables were perceptually unambiguous, so context could exert little influence on them. The results of the present experiments demonstrate that when the speech contexts are acoustically unambiguous, they contribute to the effects of context rather than reflect the influence of the nonspeech precursors.
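The argument that context matters most when the target acoustics are ambiguous can be made concrete with the Bayesian framing of footnote 2. The sketch below assumes, for illustration only, that acoustic and contextual evidence are independent and therefore add on a log-odds scale; the nine-step evidence values and the context weight of 0.5 log-odds are hypothetical, not estimates from these data.

```python
import math


def p_ga(acoustic_log_odds, context_log_odds):
    """Posterior P("ga") when independent evidence sources add on the log-odds scale."""
    return 1.0 / (1.0 + math.exp(-(acoustic_log_odds + context_log_odds)))


def context_shift(acoustic_log_odds, context_log_odds):
    """Difference in P("ga") between a context favoring "ga" and one favoring "da"."""
    return (p_ga(acoustic_log_odds, context_log_odds)
            - p_ga(acoustic_log_odds, -context_log_odds))


# Hypothetical 9-step series: +4 strongly favors "ga", 0 is ambiguous, -4 favors "da".
series = [4 - step for step in range(9)]
CONTEXT = 0.5  # hypothetical contextual nudge, in log-odds units

for evidence in series:
    shift = context_shift(evidence, CONTEXT)
    print(f"acoustic evidence {evidence:+d}: shift in P(ga) = {shift:.3f}")
```

Under these assumptions, the same contextual nudge shifts P("ga") by about 0.245 at the ambiguous midpoint but by less than 0.02 at the end points. By the same reasoning, acoustically unambiguous context syllables such as the /al/ and /ar/ precursors leave little room for a nonspeech precursor to change how they are categorized.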
Although it may seem surprising that the nonspeech context stimuli should influence perception of nonadjacent speech targets, recent research has demonstrated that the auditory system will accept context information as evidence by which to shift a categorization decision even when that information occurs more than a second earlier and even when multiple acoustic signals intervene (Holt, 2005). By these standards, the temporal separation between the nonspeech contexts and the speech targets in the present experiments is relatively modest.
In sum, the joint influence of speech and nonspeech acoustic contexts on speech categorization can most simply be accounted for by postulating common general perceptual origins. Previous research has highlighted parallels between phonetic context effects and those observed between purely nonspeech sounds (e.g., Diehl and Walsh, 1989), but these results have been challenged on the grounds that perception of nonspeech analogs to speech cannot be directly compared to speech perception, because speech has a clear, identifiable environmental source whereas nonspeech analogs to speech (pure tones, for example) do not (Fowler, 1990). One response to this challenge is the demonstration that nonspeech contexts influence perception of speech (e.g., Lotto and Kluender, 1998). This is a stronger test in that it identifies the information sufficient for influencing speech categorization: when nonspeech stimuli model limited acoustic characteristics of the speech stimuli that produce context effects on speech targets, these nonspeech sounds likewise elicit context effects on speech categorization. The present experiments introduce a new paradigm to test the joint effects of speech and nonspeech context stimuli on speech categorization. This paradigm is perhaps even stronger in that it allows investigation of the influence of nonspeech signals on speech categorization in the presence of speech context signals that also exert context effects. The present results demonstrate the utility of this tool in pursuing the theoretical question of how best to account for the basic representation and processing of speech.

1 Although dichotic presentation of single tones and speech targets has been shown to produce context effects (Lotto et al., 2003), investigation of the influence of multiple-tone acoustic history contexts on speech categorization under dichotic presentation conditions has not been reported to date. However, the long time course (>1 s) over which effects of tonal acoustic histories on speech categorization persist, and the observation that tonal acoustic histories influence speech categorization even when as many as 13 neutral tones intervene between the acoustic history and the speech target, argue that central (i.e., not purely sensory) auditory mechanisms play an important role (Holt, 2005).

2 This analysis is consistent with the behavior of a rational Bayesian decision maker, for whom the optimal policy is to combine information from different sources to assign posterior probabilities to possible interpretations of the input and to choose the alternative with the highest posterior probability. This approach is amenable to speech perception in that stochastic versions of the TRACE model of speech perception (McClelland, 1991) implement optimal Bayesian inference (Movellan and McClelland, 2001). Moreover, recent theoretical discussions have highlighted how Bayesian analysis may be fruitfully applied to issues in speech perception (Geisler and Diehl, 2002