Cue weighting in auditory categorization: Implications for first and second language acquisition a)

The ability to integrate and weight information across dimensions is central to perception and is particularly important for speech categorization. The present experiments investigate cue weighting by training participants to categorize sounds drawn from a two-dimensional acoustic space defined by the center frequency (CF) and modulation frequency (MF) of frequency-modulated sine waves. These dimensions were psychophysically matched to be equally discriminable and, in the first experiment, were equally informative for accurate categorization. Nevertheless, listeners' category responses reflected a bias for use of CF. This bias remained even when the informativeness of CF was decreased by shifting distributions to create more overlap in CF. A reversal of weighting (MF over CF) was obtained when distribution variance was increased for CF. These results demonstrate that even when equally informative and discriminable, acoustic cues are not necessarily equally weighted in categorization; listeners exhibit biases when integrating multiple acoustic dimensions. Moreover, changes in weighting strategies can be affected by changes in input distribution parameters. This methodology provides potential insights into acquisition of speech sound categories, particularly second language categories. One implication is that ineffective cue weighting strategies for phonetic categories may be alleviated by manipulating variance of uninformative dimensions in training stimuli. © 2006 Acoustical Society of America. [DOI: 10.1121/1.2188377]


I. INTRODUCTION
Outside of the acoustic researcher's laboratory, few sound categories are distinguished by a single acoustic dimension. In the natural world, auditory categories more typically are defined along multiple dimensions. Integration of information across acoustic dimensions, then, must be a central characteristic of auditory processing.
Speech categories provide an excellent illustration of both the complexity with which multiple acoustic dimensions define auditory categories and of the adeptness of the auditory system at integrating information across multiple dimensions. For speech categories, rarely is a single acoustic dimension necessary or sufficient to define category membership; this is the essence of the classic "lack of invariance" issue (Liberman, 1996; Liberman et al., 1967). For example, there are many acoustic dimensions that contribute to voicing, as in the difference between English /ba/ and /pa/; Lisker (1986) cataloged as many as 16 acoustic dimensions that may characterize the English voicing distinction.
Nevertheless, for speech and other auditory categories the existence of multiple acoustic dimensions does not imply their perceptual equivalence. Simply put, some acoustic dimensions play a greater role in determining the perceptual identity of a sound than do others. The fact that acoustic dimensions need not contribute equivalently to category identity has been referred to as cue weighting. Acoustic dimensions appear to be perceptually weighted in the sense that some are strongly correlated with categorization responses whereas others, although present, only weakly determine perceived category membership. As an example, both spectral and temporal acoustic cues differentiate English tense and lax vowels like /i/ versus /I/. Adult native American-English listeners, however, rely much more on the spectral dimension (formant frequency) than the temporal dimension (vowel duration) in categorizing /i/ and /I/ (e.g., Hillenbrand et al., 2000). Likewise, Francis and colleagues (2000) have demonstrated that although burst cues and formant transition cues co-vary for American English stop consonant categories, listeners rely on the formant cue significantly more than the burst cue in categorization responses (see also, Walley and Carrell, 1983). Even with explicit training with feedback to use the burst cue, listeners continue to rely on the formant transition (Francis et al., 2000).
What determines the relative weighting of different sources of acoustic information? An adaptive listener would weight dimensions based on experience over time with the acoustic environment. For example, we might be able to predict weighting functions for speech perception if we knew how acoustic dimensions co-varied with phonetic contrasts in a listener's experience. In principle, the regularity of information along different dimensions could be fully characterized if it were possible to make exhaustive acoustic measurements of the full range of speech productions for a language, an idiolect, an accent, a speaker, or group of speakers. In practice, this has been approached much more modestly by examining regularities in the realization of speech productions of small samples of speakers (e.g., Lisker and Abramson, 1964; Lotto et al., 2004). Nevertheless, even with this more limited approach, it is possible to estimate the distributions of phonetic categories across a subset of acoustic dimensions. Adaptive listeners will tune their weighting of the dimensions based on the characteristics of these distributions to maximize accuracy. However, there are a number of constraints related to sensory processing, cognitive processing, and previous experience that may prevent listeners from achieving ideal performance on any particular categorization task.
Examining this at a finer grain, cue weighting can be thought of as a function of at least four variables. The first two relate to the distributional characteristics of acoustic information and are language specific. First, acoustic dimensions vary in their informativeness for category identity; that is, there is a difference in the distinctiveness of the distributions for competing categories along each dimension. For example, voice onset time (VOT) is a robust acoustic signal for voicing across languages. Lisker and Abramson and others (Keating, 1984; Lisker and Abramson, 1964) have examined the distribution of VOT across speech categories in several languages. In American English, for example, VOT values produced by speakers are quite reliably differentiated across speech categories. This is to say that the distributions of the voicing categories do not overlap much along the VOT dimension. As a result, VOT is highly informative for the American English voicing distinction. Because of this informativeness, VOT is likely to be a heavily weighted cue in the perception of voicing categories by American English listeners. Note, as well, that implicit in this description is the hypothesis that listeners are sensitive, not just to the absolute value on an acoustic dimension for a particular stimulus presentation, but also to the distributional characteristics of that cue as it occurs across speech instances within a language.
A second parameter of auditory distributions that could affect perceptual cue weighting is distribution variance. Information is carried by variance in the signal and the auditory system seems to be especially sensitive to dimensions that are varying. Studies of nonspeech pattern analysis have established that components with greater relative variance receive greater perceptual weight (Lutfi, 1993; Lutfi and Doherty, 1994). From these findings, one may predict that dimensions on which speech distributions vary the most will receive higher perceptual weights. But in categorization tasks, such as phonetic categorization, the relationship between variance and cue weighting is likely to be more complex. The relationship between within-category and between-category variance is probably an important determinant of perceptual weight. Whereas large overall variance (disregarding category) is indicative of an informative acoustic dimension, large within-category variance can decrease the informativeness of a dimension by creating distribution overlap (in the same way that variance can decrease d′ in signal-detection theory). In addition, increased within-category variance across a dimension could mean that the dimension is not robustly related to the category or that there is a great deal of noise in the transmission of this dimension. In either case, the reliability of the dimension is questionable and, correspondingly, the dimension may be weighted less.
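The signal-detection analogy can be made concrete with a short worked relation. The sketch below assumes the textbook equal-variance Gaussian case, which is an assumption introduced here for illustration and not a model fitted in this study; the symbols μ_A, μ_B (category means along one dimension) and σ (shared within-category standard deviation) are likewise illustrative.

```latex
d' = \frac{\lvert \mu_A - \mu_B \rvert}{\sigma}
\qquad\Longrightarrow\qquad
\sigma \to 2\sigma \;\text{ gives }\; d' \to \tfrac{1}{2}\, d'
```

Under this assumption, holding the category means fixed while doubling the within-category standard deviation halves d′, which is the sense in which within-category variance reduces the informativeness of a dimension.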
It is important to note that each of these distributional aspects of cue weighting must be considered to be language specific, dialect specific, and perhaps even speaker specific. Although the same acoustic dimensions may exist across languages or dialects, the phonetic distributions are clearly not equivalent. For example, across the voiced-voiceless distinction in American English stops, aspiration is correlated with voicing. Voiceless consonants (with longer VOTs) are typically aspirated whereas voiced consonants are much less likely to be produced with aspiration (e.g., Stevens, 1998). Aspiration is thus likely to be a relatively strong cue to voicing in American English. Aspiration also exists for Hindi stop consonant categories, but it does not carry weight as a cue for voicing because it is strongly correlated with an orthogonal category dimension (aspirated versus unaspirated, Benguerel and Bhatia, 1980). As another example, consider the relationship of vowel length to syllable-final consonant voicing. Adult native listeners of languages that do not have syllable-final obstruents or that fail to make a vowel-length distinction as a function of the voicing of the final consonant do not weight vowel duration as a cue to the final consonant's category identity as much as listeners of languages that make vowel-length distinctions as a function of final-consonant voicing (Crowther and Mann, 1992, 1994; Flege and Wang, 1989). Listeners' weighting of acoustic dimensions, then, arises from experience with the regularities of acoustic realization in the native patterns of speech production, although the learning mechanisms remain, as yet, relatively poorly understood. At a somewhat finer level, perceptual weighting may have speaker-specific characteristics. Weighting of acoustic dimensions across individual speakers may vary somewhat and listeners' familiarity with this variation may contribute to voice recognition.
Two other variables that can influence cue weighting can be thought to be non-distributional and not language specific or speaker specific. First, the basic auditory representation of an acoustic dimension can influence its perceptual weight. Some acoustic dimensions are more robustly encoded by the auditory system than others. For example, there may be discontinuities in auditory processing along particular acoustic dimensions such that equal physical steps do not produce equivalent changes in percept (e.g., Kuhl and Miller, 1975; Stevens, 1989). VOT, again, serves as a fine example because its perception is thought to be influenced by a general auditory discontinuity in processing the onsets of simultaneous acoustic events (Kuhl and Miller, 1975; Pisoni, 1977). The acoustic dimension distinguishing VOT can be thought of as a relative onset asynchrony between two component frequencies, one higher and one lower in frequency. The lower-frequency component may either lead or lag the higher-frequency component, or it may occur simultaneously with it. Research from human adult, infant, and animal behavior as well as electrophysiology from these populations has suggested that the mammalian auditory system is poor at resolving onset asynchrony differences less than about 20 ms, whether they are speech or nonspeech stimuli (e.g., Jusczyk et al., 1980; Kuhl and Miller, 1975; Pisoni, 1977; Simos and Molfese, 1997; Simos et al., 1998; Sinex et al., 1991; Steinschneider et al., 1999). Sinex and colleagues, for example, relate this auditory discontinuity to changes in the variance of the neural representation at the auditory nerve; there appears to be a relatively more robust neural code in the representation of stimuli with onset asynchronies greater than about 20 ms (Sinex and McDonald, 1989; Sinex et al., 1991).
It has been suggested that languages may exploit this discontinuity in auditory processing by developing voicing categories that straddle this sensitivity and thus exaggerate the perceived difference between categories (e.g., Pisoni, 1977). Recent research has indicated that in addition to the perceptual benefit of exaggerated discriminability, such placement may also facilitate category learning (Holt et al., 2004). By these means, the relatively quantal representation for VOT (and consequent high discriminability of the category distinction) may contribute to a stronger cue weight. Of course, the opposite circumstances may also influence perceptual weighting; acoustic dimensions may be less heavily weighted as a consequence of a weak auditory representation, for example, if they are more susceptible to noise or are represented with more variability by the auditory system than other acoustic dimensions. Moreover, if the auditory system does not have sufficient resolving power or if there is too much noise in the representation of an acoustic dimension then the covariance between the dimension and the category label may be underestimated by the perceiver. In general, more discriminable acoustic dimensions will tend to be more informative for a listener and, correspondingly, should be expected to have a greater perceptual weight.
Cue weighting may also be influenced by task. The acoustic dimensions heavily weighted for phonetic categorization may be much less informative in making a lexical decision, identifying a talker, evaluating the emotional content of speech, or making other perceptual decisions. The same acoustic dimension present in the same signal may be heavily weighted for one perceptual task, but less heavily weighted for a different task. Likewise, changes in the task that affect the auditory representation (for example, the addition of low-frequency noise that selectively masks a particular acoustic dimension) or the informativeness of a cue (for example, manipulation of the distribution characteristics on an acoustic dimension in the laboratory) would also be expected to bring about shifts in listeners' perceptual cue weighting. This expectation of adaptive plasticity is particularly important for our understanding of how to alter cue weighting in aid of acquiring new categories, for example in the acquisition of second-language (L2) phonetic categories. The present experiments exploit these possibilities.
To summarize the above, not all available information is equivalent in perceptual processing and the differential contribution of acoustic dimensions to categorization arises as a result of experience with regularities in the input, the robustness or variability of the perceptual coding of an acoustic dimension, and its informativeness to category identity as a function of task. Cue weighting is a quantitative description of how auditory information (and information from other modalities as well) is integrated in perceptual categorization. An appreciation of perceptual cue weighting leads to the perspective that speech categorization is not just a matter of detecting available auditory cues along various acoustic dimensions, but also applying some weighting function that is, at least in part, dependent on experience with phonetic distributions.
This hypothesis has important implications for language acquisition. There has been considerable interest in the perceptual difficulties of L2 learners in acquiring second-language speech categories. One underlying cause of these difficulties may be that well-established native-language weighting functions are inappropriate for the L2. The language-specific nature of cue weighting may create perceptual difficulties in integrating acoustic information for nonnative speech categories. Understanding the mechanisms of cue weighting and investigating the means by which it may be possible to change these functions are therefore of considerable importance for understanding L2 category acquisition. As an example, take the classic case of English syllable-initial /l/ versus /r/ perception by native Japanese listeners (Miyawaki et al., 1975). For native American English listeners, the starting frequency of the third formant (F3) carries most of the perceptual weight, but Japanese listeners appear to heavily weight F2 starting frequency in categorizing English /l/ and /r/ (Iverson et al., 2003; Yamada and Tohkura, 1990). This weighting is nonoptimal as F2 is not varied contrastively for syllable-initial liquids by English speakers (Lotto et al., 2004). An understanding of the basic processes by which acoustic dimensions are perceptually weighted and determination of the means by which these weights may be altered therefore are likely to have straightforward applications to understanding this classic L2 learning problem, and others.
Cue weighting also appears to be important for a full understanding of first language (L1) acquisition. Although infants begin to become perceptually tuned to the characteristics of L1 already in the first year (Jusczyk, 1997; Kuhl et al., 1992; Werker and Tees, 1984), this process ultimately takes significant time to develop to adultlike perception (e.g., Morrongiello et al., 1984; Nittrouer et al., 1998; Nittrouer et al., 2000; Nittrouer, 2004; Parnell and Amerman, 1978). Specifically, children appear to apply different weightings to acoustic dimensions in perceiving L1 speech categories (Hazan and Barrett, 2000; Mayo and Turk, 2004; Nittrouer, 2004). For example, in languages in which adults weight vowel duration as a strong cue in categorization of following final stop consonants, children tend to favor the consonant-influenced vowel-formant transitions over vowel duration (Nittrouer, 2004).
The purpose of the present experiments is to address issues of cue weighting in an effort to begin to understand the means by which the auditory system weights acoustic dimensions and, also, to investigate methods by which listeners' cue weighting might be shifted. This latter aim is of particular importance in understanding the link between cue weighting and speech category acquisition in L1 and L2. It is presumed that cue weighting functions are determined in large part by the characteristics of distributions of experienced sounds. However, uncovering the role of experience in shaping speech categorization is quite difficult because it is impossible to know the precise history of speech experience that a listener brings to the laboratory. Therefore, as a starting point, we have taken the approach of investigating novel nonspeech auditory category learning. This allows us to maintain total control over listeners' histories of experience with the acoustic exemplars, thus ensuring that we have a precise characterization of the distributional characteristics. We can then describe the mapping between the informativeness of each dimension and the correlation of that dimension with categorization responses (i.e., its weighting). In the experiments that follow, we examine how changing the characteristics of these acoustic distributions or the manner of presentation can affect weighting functions. In each experiment, listeners are trained with feedback to categorize nonspeech sounds as coming from one of two predefined distributions. These microgenetic experiments were designed to examine auditory category formation with application to phonetic category acquisition. In fact, the characteristics of the distributions that were used were very similar to those used in previous studies examining speech sound categorization in human adults, infants, and non-human animals (Grieser and Kuhl, 1989; Kluender et al., 1998; Kuhl, 1991).

II. EXPERIMENT 1
In Experiment 1, listeners were trained to categorize unfamiliar sounds drawn from two distributions situated in a two-dimensional acoustic space. The two acoustic dimensions were equated for perceptual discriminability in pilot psychophysical experiments and possessed no known a priori relationship. Moreover, the input distributions were created so that the two dimensions were equally informative for the categorization task. The questions under investigation were whether listeners would integrate information from both dimensions in making category decisions and, if so, whether the resulting cue weighting functions were predictable from the (in this case equal) informativeness of the dimensions.

Participants
Fourteen volunteers recruited from Carnegie Mellon University, Pittsburgh, PA, participated for course credit or a small payment. All listeners reported normal hearing.

Stimuli
The experiment used methods from auditory category-learning experiments (e.g., Holt et al., 2004; Mirman et al., 2004) whereby distributions of sounds drawn from a two-dimensional acoustic space were sampled in stimulus presentation. Listeners labeled the sound and received feedback to learn to assign category labels during training. In a subsequent test, novel stimuli drawn from the same two-dimensional acoustic space were presented to assess the relative weighting of the two acoustic dimensions as a consequence of categorization training. This design allows response patterns to be directly compared to distribution characteristics (e.g., means, variances, overlap).
The nonspeech stimuli used in all four experiments were frequency-modulated tones. The two-dimensional acoustic space from which stimuli were sampled was defined by center, or carrier, frequency (CF) and modulation frequency (MF). Within this acoustic space, two distributions corresponding to the to-be-learned auditory categories were created; they are illustrated in Fig. 1(a) as open symbols. Forty-eight unique stimulus exemplars defined each of the categories; each category exemplar is represented by an open symbol in Fig. 1(a), with circles marking one category and diamonds the other. Each stimulus was created from a sine wave tone with a particular CF modulated with a depth of 100 Hz at the corresponding MF. For example, if the CF was 760 Hz and the MF was 203 Hz, the tone was modulated from 710 to 810 Hz at a rate of 203 Hz. Each stimulus was 300 ms long and was sampled at 10 kHz with 16-bit resolution. Stimuli were created using the Cool Edit (Syntrillium, Inc., Phoenix, AZ) software.
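The stimulus description above can be sketched in code. This is a minimal illustration, not the authors' Cool Edit procedure: it assumes a sinusoidal modulator, zero starting phase, and full-scale amplitude, none of which are specified in the text. The names `fm_tone` and `instantaneous_frequency` are hypothetical; the sampling rate, duration, modulation depth, and the 760 Hz / 203 Hz example come from the paragraph above.

```python
import math

SAMPLE_RATE_HZ = 10_000   # from the text: sampled at 10 kHz
DURATION_S = 0.300        # from the text: 300 ms
DEPTH_HZ = 100.0          # from the text: total frequency excursion (e.g., 710 to 810 Hz)


def instantaneous_frequency(t, cf_hz, mf_hz, depth_hz=DEPTH_HZ):
    """Frequency (Hz) at time t: cf +/- depth/2, varying at rate mf.

    Assumption: the modulator is a cosine; the source does not state its shape.
    """
    return cf_hz + (depth_hz / 2.0) * math.cos(2.0 * math.pi * mf_hz * t)


def fm_tone(cf_hz, mf_hz, depth_hz=DEPTH_HZ,
            duration_s=DURATION_S, sample_rate_hz=SAMPLE_RATE_HZ):
    """Return 16-bit integer samples of a frequency-modulated sine tone."""
    deviation_hz = depth_hz / 2.0
    beta = deviation_hz / mf_hz          # modulation index
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz
        # The phase is the time integral of 2*pi*instantaneous_frequency(t).
        phase = (2.0 * math.pi * cf_hz * t
                 + beta * math.sin(2.0 * math.pi * mf_hz * t))
        samples.append(round(32767 * math.sin(phase)))  # scale to 16-bit range
    return samples


# The worked example from the text: CF = 760 Hz, MF = 203 Hz.
tone = fm_tone(cf_hz=760.0, mf_hz=203.0)
```

The samples could be written to a file with Python's standard `wave` module if audible output is wanted; that step is omitted here to keep the sketch short.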
It was desirable that sampling of the distributions along the two acoustic dimensions produce perceptual changes that were roughly equivalent. To this end, estimates of the just-noticeable difference along the CF and MF dimensions were determined by pilot psychophysical studies in which one dimension (CF or MF) was varied while the other was held constant at an intermediate value. Based on informal listening, these tests began with minimum steps of 15 Hz along the CF dimension and 3 Hz along the MF dimension. Young, normal-hearing listeners (N = 20) recruited from the same population as those participating in the reported categorization studies responded to pairs of stimuli varying in either CF or MF in an AX discrimination test with stimulus pairs separated by 1, 2, 3, 4, or 5 steps. This series of pilot discrimination studies revealed that the CF and MF dimensions were approximately equally discriminable (70% accuracy) when step size was adjusted to be 30 Hz along the CF dimension and 18 Hz along the MF dimension. The results of these studies also confirmed that discrimination was flat across each of the dimensions, indicating that there were no auditory discontinuities in discriminability across the region of the two-dimensional stimulus space under observation.
For Experiment 1, the positioning of the input distributions in the acoustic space resulted in the CF and MF acoustic cues being equally informative for the categorization task. Distribution overlap was equal on both dimensions. One way to quantify "informativeness" is to calculate the relative increase in accuracy over chance performance that an ideal observer could attain by using the dimension. For the current study, we used a simple criterion-bound model in which the ideal observer would place a criterion at the optimal (in terms of accuracy) position along the dimension. All stimuli on one side of the criterion are designated as category "A" and all on the other side are designated category "B." Any stimuli landing on the criterion are considered "A" 50% of the time. Unlike more sophisticated ideal observer models, there is no noise in the encoding of the value on the dimension and the response is not probabilistic. With this decision model, one can calculate the informativeness of a dimension as

$$I = \frac{P_{\mathrm{optimal}} - P_{\mathrm{chance}}}{P_{\mathrm{perfect}} - P_{\mathrm{chance}}},$$

where $P_{\mathrm{optimal}}$ is the accuracy attained with the optimal criterion on that dimension, $P_{\mathrm{chance}}$ is chance accuracy, and $P_{\mathrm{perfect}}$ is perfect accuracy. This expression gives informativeness as the increase in accuracy over chance due to information in a dimension relative to the total amount of possible improvement. This function is equal to 1 when perfect performance can be achieved using that dimension and 0 when the dimension provides no information above chance. For Experiment 1, there are two possible responses and the prior probabilities of each response are equal; therefore, perfect performance is 100% and chance is 50%. By using either acoustic dimension alone, a perceiver could attain 95.8% correct with an optimal linear boundary [as can be seen in Fig. 1(a), the two categories are nearly perfectly linearly separable along either acoustic dimension with optimal boundaries at CF = 865 Hz and MF = 140 Hz]. Since the use of either acoustic dimension yields the same optimal percent correct, these acoustic dimensions are equally informative to the categorization task: $I_{CF} = I_{MF} = (95.8 - 50)/(100 - 50) = 0.916$.
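The criterion-bound ideal observer and the informativeness measure described above (accuracy gained over chance, relative to the maximum possible gain) can be sketched as follows. This is an illustrative reading of the verbal description, not the authors' code; the function names are hypothetical and the toy values at the bottom are invented for demonstration, not stimulus values from the study.

```python
def criterion_accuracy(values_a, values_b, criterion):
    """Accuracy when values below the criterion are called 'A' and values above 'B'.

    A stimulus landing exactly on the criterion counts as half correct,
    mirroring the rule that it is labeled 'A' 50% of the time.
    """
    correct = 0.0
    for v in values_a:
        correct += 1.0 if v < criterion else 0.5 if v == criterion else 0.0
    for v in values_b:
        correct += 1.0 if v > criterion else 0.5 if v == criterion else 0.0
    return correct / (len(values_a) + len(values_b))


def best_criterion_accuracy(values_a, values_b):
    """Highest accuracy over candidate criteria and both label polarities."""
    points = sorted(set(values_a) | set(values_b))
    midpoints = [(lo + hi) / 2.0 for lo, hi in zip(points, points[1:])]
    best = 0.0
    for criterion in points + midpoints:
        acc = criterion_accuracy(values_a, values_b, criterion)
        # 1 - acc is the accuracy with the 'A'/'B' sides of the criterion swapped.
        best = max(best, acc, 1.0 - acc)
    return best


def informativeness(optimal_accuracy, chance=0.5, perfect=1.0):
    """Gain over chance relative to the maximum possible gain (0 to 1)."""
    return (optimal_accuracy - chance) / (perfect - chance)


# Values reported in the text for Experiment 1 (95.8% correct on either dimension).
i_single_dimension = informativeness(0.958)

# Toy one-dimensional example (invented values, for illustration only).
toy_a = [1, 2, 3, 4]
toy_b = [4, 5, 6, 7]
toy_informativeness = informativeness(best_criterion_accuracy(toy_a, toy_b))
```

With two equiprobable responses, `chance` is 0.5 and `perfect` is 1.0, as in the text; both are left as parameters so the same function covers other response sets.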
However, integrating information across the two acoustic dimensions is more informative to the categorization response than either dimension individually; the categorization task can be performed perfectly if an ideal observer integrates across both acoustic dimensions (i.e., $I_{CF+MF} = 1.0$).
In addition to the 96 stimulus exemplars (open symbols) comprising the categorization training stimuli, 12 novel test stimuli (filled circles) were created to probe listeners' use of the CF and MF dimensions following training. These novel stimuli were reserved from categorization training to be used in generalization tests. They varied orthogonally in the acoustic space such that the value along one acoustic dimension was held constant at an intermediate value (CF = 865 Hz or MF = 140 Hz) while the values along the other acoustic dimension varied.¹ Acoustic presentation was controlled by TDT System II hardware (Tucker-Davis Technologies, Alachua, FL). Stimuli were converted from digital to analog, low-pass filtered at 4.8 kHz, amplified and presented diotically over linear headphones (Beyer DT-150) at approximately 70 dB SPL(A).

Procedure
a. Categorization training. Listeners were tested individually in sound-attenuating booths. On each trial, listeners heard a single stimulus, pressed one of two unlabeled buttons to record a categorization response and received feedback as to the correct response via illumination of a light above the correct response button.
Listeners were instructed that the task was to determine whether each sound belonged to the leftmost or the rightmost response button. The experimenter also informed listeners that, although they would need to guess at first, they should attempt to use the feedback given on each trial to guide later responses. Participants were encouraged to attempt to get as many responses correct as possible. Feedback assignment followed the symbols of Fig. 1; stimuli represented as open circles were assigned to one category (one response button) whereas those illustrated as open diamonds were assigned to the opposite category (and button). Left versus right button assignment to stimulus distributions was counterbalanced across participants.
Each block of training included a single presentation of each of the 96 training stimuli. There were ten blocks of training overall and the order of stimulus presentation was randomized within each block. After each block, listeners were able to take a brief break.
b. Test. Following training, listeners completed five more blocks of the categorization task. Within a block, stimulus presentation was randomized. These blocks were identical to the previous training blocks, except that in addition to the original 96 training stimuli, novel generalization stimuli were introduced. On these trials, listeners did not receive informative feedback; lights above all of the buttons lit following the response. This approach was chosen so that listeners' categorization of the generalization stimuli could be observed without encouraging participants to learn labels for the generalization stimuli across the course of testing. Listeners continued to receive feedback on trials for which familiar training stimuli were presented to ensure that they continued to assign category labels in a manner that respected the distribution characteristics and feedback assignments present in training. This model of feedback assignment and mixing of novel and familiar stimuli is identical to paradigms successfully used in studying nonhuman animal category learning (see, e.g., Kluender et al., 1998). Continuing to provide feedback for familiar stimuli, but not for novel stimuli, provides a safeguard against responses that drift away from learned categories as a consequence of the distribution differences that novel generalization stimuli introduce in the input. It also encourages participants to remain engaged in the task.
Participants completed both training and testing in a single experimental session lasting approximately 2 h.
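The training and test schedule described above can be sketched as a trial list. This is a hedged illustration rather than the authors' experiment-control software: the function name `build_schedule`, the stimulus identifiers, and the random seed are hypothetical, and the sketch assumes that each novel stimulus is presented once per test block, which the text does not state. The counts (96 training stimuli, 12 novel stimuli, ten training blocks, five test blocks) come from the text.

```python
import random

N_TRAINING_STIMULI = 96   # from the text
N_NOVEL_STIMULI = 12      # from the text (Experiment 1)
N_TRAINING_BLOCKS = 10    # from the text
N_TEST_BLOCKS = 5         # from the text


def build_schedule(seed=0):
    """Return (training_blocks, test_blocks).

    Each block is a list of trials; each trial is (stimulus_id, gets_feedback).
    """
    rng = random.Random(seed)
    familiar = [("train", i) for i in range(N_TRAINING_STIMULI)]
    novel = [("novel", i) for i in range(N_NOVEL_STIMULI)]

    training_blocks = []
    for _ in range(N_TRAINING_BLOCKS):
        # One presentation of every training stimulus, always with feedback.
        block = [(stim, True) for stim in familiar]
        rng.shuffle(block)  # order randomized within each block
        training_blocks.append(block)

    test_blocks = []
    for _ in range(N_TEST_BLOCKS):
        # Familiar stimuli keep their feedback; novel stimuli get none.
        # Assumption: each novel stimulus appears once per test block.
        block = ([(stim, True) for stim in familiar]
                 + [(stim, False) for stim in novel])
        rng.shuffle(block)
        test_blocks.append(block)

    return training_blocks, test_blocks


training_blocks, test_blocks = build_schedule()
```

Keeping the feedback flag on each trial, rather than deriving it from the block type, makes the mixed test blocks explicit: familiar and novel trials are interleaved, and only the former are informative.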

B. Results and conclusions
The percent of "A" categorization responses was calculated whereby category "A" was arbitrarily assigned to be the category in the upper left corner of the two-dimensional acoustic space, indicated by open circles in Fig. 1(a). Overall, listeners performed fairly well in learning to categorize these unfamiliar sounds. Across the ten categorization training blocks, listeners' mean accuracy (i.e., responses consistent with feedback assignments) was M = 87.25%, SE = 0.016. The orthogonal arrangement of the novel stimuli presented in the test blocks made it possible to assess categorization across one cue while the other was held constant. Figure 2 illustrates listeners' categorization responses and reaction times to the 12 generalization stimuli. The categorization functions averaged across listeners' responses to these novel stimuli hint that CF may have been more effective in cueing category identity because listeners were more adept at categorizing along the CF dimension than the MF dimension. An analysis of variance (ANOVA) of the percent "A" categorization responses as a function of the two orthogonal acoustic cues reveals that there was a statistically reliable Dimension × Stimulus Step interaction for both the categorization responses, F(5,65) = 8.45, p < 0.0001, $\eta_p^2$ = 0.394, and the reaction times, F(5,65) = 2.77, p = 0.025, $\eta_p^2$ = 0.175. Cue weights were computed for each subject as the correlation between dimension values and percentage "A" responses across generalization stimuli. The absolute values of the correlation coefficients were normalized to sum to one.² These relative weights confirm the impression from the categorization functions of the dominance of the CF cue. The average weights were 0.658 for the CF cue and 0.342 for MF. The CF weight was significantly different from 0.5, meaning that listeners did not weight the two cues equivalently as might be expected given the match on informativeness.³ Two major conclusions may be made from the present study.
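The cue-weight computation described above (a correlation per dimension, with absolute values normalized to sum to one) can be sketched as follows. This is one plausible reading of the verbal description and may differ in detail from the authors' analysis; the function names are hypothetical, and the response values are invented for illustration and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


def relative_cue_weights(r_cf, r_mf):
    """Normalize absolute correlations so that the two weights sum to one."""
    total = abs(r_cf) + abs(r_mf)
    return abs(r_cf) / total, abs(r_mf) / total


# Hypothetical single-listener data (NOT from the study): stimulus step along
# each dimension and the percentage of "A" responses at that step.
cf_steps = [1, 2, 3, 4, 5, 6]
pct_a_along_cf = [95, 90, 70, 30, 10, 5]
mf_steps = [1, 2, 3, 4, 5, 6]
pct_a_along_mf = [45, 40, 55, 50, 60, 55]

w_cf, w_mf = relative_cue_weights(
    pearson_r(cf_steps, pct_a_along_cf),
    pearson_r(mf_steps, pct_a_along_mf),
)
```

Taking absolute values before normalizing means the weights reflect how strongly each dimension predicts responses, regardless of whether higher values on that dimension favor category "A" or category "B".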
First, listeners were able to learn to categorize novel acoustic stimuli effectively and did integrate information across the two dimensions (weights were above 0 for both dimensions). As such, this paradigm provides a useful means for examining cue weighting for categorization in a two-dimensional acoustic space. Second, despite the fact that the discriminability of the exemplars was psychophysically matched across the acoustic dimensions and despite the equivalent informativeness of the acoustic dimensions to the categorization task, listeners weighted CF more heavily than MF.
Of the 14 subjects, only one weighted MF higher than CF. Two participants used CF almost exclusively, whereas the majority relied on both cues to some extent but weighted CF higher than MF. The reason for the dominance of CF is not immediately clear; we suggest some possibilities in Sec. VI. However, it provides us with an opportunity to examine how changes in the training distribution characteristics may encourage listeners to shift their weighting functions.
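The cue-weight computation described in this section (correlate each dimension with percent "A" responses, take absolute values, normalize to sum to one) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names, the step coding of the cross-shaped generalization set, and the response percentages are all hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

def relative_cue_weights(cf_steps, mf_steps, pct_a):
    """Normalize |r| for each dimension so that the two weights sum to one."""
    r_cf = abs(pearson_r(cf_steps, pct_a))
    r_mf = abs(pearson_r(mf_steps, pct_a))
    total = r_cf + r_mf
    return r_cf / total, r_mf / total

# Hypothetical data for one listener: 12 stimuli arranged as a cross.
# Six stimuli vary in CF with MF fixed, six vary in MF with CF fixed.
cf_steps = [-3, -2, -1, 1, 2, 3, 0, 0, 0, 0, 0, 0]
mf_steps = [0, 0, 0, 0, 0, 0, -3, -2, -1, 1, 2, 3]
pct_a = [95, 90, 80, 20, 10, 5, 35, 40, 45, 55, 60, 65]  # made-up responses

w_cf, w_mf = relative_cue_weights(cf_steps, mf_steps, pct_a)
print(f"CF weight = {w_cf:.3f}, MF weight = {w_mf:.3f}")
```

Because each listener's weights sum to one by construction, the group means for CF and MF must also sum to one, and a CF weight of 0.5 corresponds to equal use of the two cues.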

III. EXPERIMENT 2
The aim of Experiment 2 was to determine whether listeners would make greater use of the MF cue in categorization under circumstances in which CF was rendered less informative. To accomplish this, the input distributions employed for Experiment 1 were shifted toward one another along the CF dimension, which resulted in greater overlap of the distributions on this dimension.

Participants
Fifteen volunteers recruited from Carnegie Mellon University participated for course credit or a small payment. All listeners reported normal hearing.

Stimuli
The two-dimensional stimulus space constructed for Experiment 1 was used again in Experiment 2, but the input distributions defining the two categories were moved closer to one another along the CF dimension. The centroids of the training distributions in Experiment 1 had CF values of 760 and 970 Hz (differentiated by seven stimulus steps or 210 Hz), respectively. In Experiment 2, these values were 820 and 910 Hz (three stimulus steps, 90 Hz). As a result of this stimulus manipulation, the category input distributions overlapped more along the CF dimension than the MF dimension. In other words, the CF dimension became less informative. By the informativeness metric introduced in Experiment 1, I_CF = 0.5 (75% correct with optimal boundary). All other distribution characteristics were identical to those of Experiment 1. Therefore, the informativeness of the MF dimension was unchanged from Experiment 1 (I_MF = 0.916).
The new input distributions are illustrated as open symbols in Fig. 1(b). The generalization stimuli, which were identical to the stimuli from Experiment 1, are illustrated as filled circles in Fig. 1(b). To provide a finer sampling of categorization across the two cues, several additional stimuli were added to the novel stimulus set of Experiment 2; whereas there were 12 novel stimuli in Experiment 1, there were 17 in Experiment 2.
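The pairing of an informativeness of 0.5 with 75% correct under an optimal boundary is consistent with a metric that rescales the best single-criterion accuracy so that chance maps to 0 and perfect performance maps to 1. The sketch below works under that assumption only; the formula I = 2p - 1, the function names, and the sample step values are illustrative and are not taken from the paper's definition.

```python
def best_boundary_accuracy(values_a, values_b):
    """Highest proportion correct achievable with one criterion on one dimension."""
    n = len(values_a) + len(values_b)
    points = sorted(set(values_a) | set(values_b))
    # Candidate criteria: one below every value, then midpoints between neighbors.
    cuts = [points[0] - 1] + [(lo + hi) / 2 for lo, hi in zip(points, points[1:])]
    best = 0.0
    for cut in cuts:
        # Rule "A below the criterion"; the reversed rule scores 1 - p
        # because no stimulus value ever equals a candidate criterion.
        p = (sum(v < cut for v in values_a) + sum(v > cut for v in values_b)) / n
        best = max(best, p, 1 - p)
    return best

def informativeness(p_correct):
    """Assumed rescaling: 0 at chance (50%), 1 at perfect accuracy."""
    return 2 * p_correct - 1

# Hypothetical stimulus steps along one dimension for two categories.
overlapping_a, overlapping_b = [1, 2, 3, 4], [3, 4, 5, 6]
separated_a, separated_b = [1, 2, 3], [4, 5, 6]

i_overlap = informativeness(best_boundary_accuracy(overlapping_a, overlapping_b))
i_separate = informativeness(best_boundary_accuracy(separated_a, separated_b))
print(i_overlap, i_separate)
```

In this toy case the overlapping categories allow at most 75% correct from the single dimension, and fully separated categories allow 100%.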

Procedure
The apparatus and procedures were identical to those used in Experiment 1.

B. Results and conclusions
Overall, listeners learned the categories very well, exhibiting a mean of 90.42% correct, SE = 1.46, across the ten training blocks. Figure 3 illustrates listeners' categorization responses to the 17 generalization stimuli. Again, listeners' responses to the novel stimuli were more categorical along the CF dimension than the MF dimension. Supporting this conclusion, there was a significant Dimension × Stimulus Step interaction for both categorization responses, F(8,112) = 31.83, p < 0.0001, η_p² = 0.695, and reaction times, F(8,112) = 2.818, p = 0.007, η_p² = 0.168. The average cue weights for the two dimensions were 0.664 and 0.336 for CF and MF, respectively. The relative weighting for CF was statistically equivalent to that obtained for Experiment 1, t(27) = 0.10, p = 0.92. Despite the reduced informativeness of CF for the task, listeners continued to rely on it in categorization judgments. Only two of the 15 subjects gave greater weighting to the more informative cue (MF), and even in these cases the weights for CF and MF were approximately equal. Shifting the informativeness of the CF cue by moving the distribution means closer therefore was not effective in altering listeners' cue weighting.

IV. EXPERIMENT 3
The continued reliance on CF in Experiment 2 is surprising given that the observed pattern of cue weighting was a relatively inefficient weighting function for the training distributions. An obvious question is whether listeners' weighting functions are malleable at all. The goal of Experiment 3 was to present a categorization task that penalized a CF-dominant weighting function more than did the tasks of the previous two experiments. To accomplish this, we changed the variance of the training distributions in addition to changing the position of the centroids, as in Experiment 2.

Participants
Fourteen volunteers recruited from Carnegie Mellon University participated for course credit or a small payment. All listeners reported normal hearing.

Stimuli
The training distributions for Experiment 3 are shown in Fig. 1(c). The distribution centroids were identical to those of Experiment 2. In Experiments 1 and 2 the distributions spanned nine stimulus steps along each dimension. In Experiment 3, the variability along the CF dimension was increased (distributions spanned 15 stimulus steps) whereas the variability along the MF dimension was reduced (distributions spanned approximately five stimulus steps). The combined changes rendered MF a maximally informative cue for the categorization (I_MF = 1.0) and CF a marginally informative cue (I_CF = 0.442). The training distributions were composed of 86 unique stimuli. The generalization stimuli are illustrated as filled circles in Fig. 1(c) and are identical to those of Experiment 2.

Procedure
The procedure and apparatus were identical to previous experiments.

B. Results and conclusions
Across the ten training blocks, listeners exhibited a mean categorization accuracy of M = 78.1%, SE = 3.38. The changes in distribution variances were effective in changing the preferred weighting of the dimensions. The average weight for CF was 0.395 with a corresponding MF weight of 0.605. The change in CF weight from Experiment 2 was significant, t(27) = 4.41, p < 0.0005. Eleven of the 14 subjects weighted MF greater than CF.
The results of Experiment 3 make it clear that weighting functions can be changed by the characteristics of the training distributions during a short learning session. Of theoretical and practical interest is the determination of which distributional characteristics can effect change in weighting functions. Distribution variance appears to be a better predictor of relative weighting than are measures of central tendency or our measure of informativeness. Whereas the shift in distribution centroids in Experiment 2 resulted in no change in weighting functions despite a large change in informativeness of the CF cue, the manipulations of variability in Experiment 3 led to a reversal of relative weighting with a smaller change in informativeness of the cues.
From the classic perspective that variability is the essence of information in perception, the lower weighting of the cue with greater within-category variability appears to be counterintuitive. For example, in experiments on sample discrimination in which listeners are asked to listen and respond to tones in a target frequency region, irrelevant tones in distant non-target frequency regions have little effect if they are fixed in frequency. However, when these distracter tones are allowed to vary in frequency, listeners' performance on the target task suffers dramatically (Lutfi, 1992; Neff and Odgaard, 2004). This effect can be thought of as a capture of selective attention by the variability of the irrelevant tones or it can be conceptualized as a greater weighting of highly variable components in the task, as in Lutfi's (1993) CoRE model. In the current experiment, the listeners seemed to be sensitive to increased variability across the CF dimension, as it led to a significant change in weighting functions. However, it was the lower-variance MF dimension that received the greater perceptual weight.
The details of sample discrimination experiments and the current categorization experiments are too different to make direct comparisons that may illuminate the reason for these different outcomes. However, the idea that the auditory system would be particularly sensitive to highly variable dimensions, ceteris paribus, is intuitively appealing. One may predict that exposure to variance along a dimension, with no attendant feedback, will result in this dimension becoming more heavily weighted. In Experiment 4 we investigated whether an acoustic dimension becomes more heavily weighted perceptually following exposure to its range of variability.

V. EXPERIMENT 4
Prior to categorization training in Experiment 4, listeners were exposed (without feedback) to stimuli varying along the MF dimension while CF was held constant at an intermediate value. It is predicted that this exposure independent of feedback for category membership will lead to a greater weighting of the MF dimension.

Participants
Fourteen volunteers recruited from Carnegie Mellon University participated for course credit or a small payment. All listeners reported normal hearing.

Stimuli
The distributions defining the categories were identical to those of Experiment 1. In addition, a set of generalization stimuli was created to more finely assess cue weighting across the acoustic space. The grid of filled symbols illustrated in Fig. 1(d) shows the position of these stimuli in acoustic space. This grid of novel stimuli, along with the cross of novel stimuli shown in Fig. 1(a), was presented to listeners in both Experiments 1 and 4.
In addition, another set of stimuli was created especially for Experiment 4. These stimuli were created as described for the previous experiments. They possessed a constant, intermediate CF of 865 Hz and varied in MF from 5 to 275 Hz in 18-Hz steps. They are illustrated in Fig. 1(e) by the large open circle symbols.

Procedure
The apparatus and procedure were identical to those of Experiment 1. However, Experiment 4 participants completed a brief passive-listening segment before entering into categorization training. This segment lasted approximately 15 min. In that time, listeners heard the 16 stimuli illustrated in Fig. 1(e) as large open symbols 20 times each in random order. Listeners were instructed to simply listen to the sounds. Immediately after this passive exposure, listeners completed categorization training and generalization tests identical to those of Experiment 1. Thus, Experiments 1 and 4 differed only in the precategorization-training exposure.
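The passive-exposure schedule can be reconstructed from the values given in the text: CF fixed at 865 Hz, MF from 5 to 275 Hz in 18-Hz steps, and 20 presentations of each stimulus in random order. The sketch below is illustrative only. It assumes a full shuffle across all presentations (the text does not say whether randomization was blocked), and the function name and `seed` parameter are hypothetical.

```python
import random

CF_HZ = 865                           # constant, intermediate center frequency
MF_VALUES = list(range(5, 276, 18))   # 5, 23, 41, ..., 275 Hz
REPETITIONS = 20

def exposure_trials(seed=None):
    """Return a shuffled list of (CF, MF) pairs for the passive-listening segment."""
    rng = random.Random(seed)
    trials = [(CF_HZ, mf) for mf in MF_VALUES for _ in range(REPETITIONS)]
    rng.shuffle(trials)  # assumption: randomized across all presentations
    return trials

trials = exposure_trials(seed=1)
print(len(MF_VALUES), "stimuli,", len(trials), "presentations")
```

The step arithmetic yields exactly the 16 stimuli mentioned in the procedure, for 320 presentations in total, or a little under 3 s per presentation over a segment of approximately 15 min.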

B. Results and conclusions
A comparison of categorization accuracy during training across Experiments 1 and 4 reveals that the pretraining exposure to variability along the MF dimension had a significant influence on accuracy, t(13) = 2.19, p < 0.05. Across the ten training blocks, Experiment 4 listeners were significantly more accurate (M = 92.74%, SE = 0.02) than Experiment 1 listeners (M = 87.25%, SE = 0.02). Figure 5 illustrates listeners' categorization of the generalization cross stimuli [see Fig. 1(a)] by acoustic dimension.
Calculations of cue weights also demonstrated that the passive exposure had an effect on categorization. Recall that Experiments 1 and 4 listeners also heard the grid of stimuli illustrated in Fig. 1(d). A contour plot illustrating listeners' categorization responses to the grid of novel generalization stimuli is shown in Fig. 6. When relative cue weights are calculated from responses to this grid of stimuli, Experiment 1 listeners exhibited a greater reliance on the CF dimension (0.744) than they did on MF (0.256). As predicted, the relative cue weight for CF was significantly lower for Experiment 4 (0.600) than for Experiment 1 (0.744) [t(26) = 1.75, p < 0.05, one tailed]. That is, preexposure to variability along the MF dimension with no feedback led listeners to rely upon MF in categorization responses moderately more than listeners who did not experience the preexposure (0.400 versus 0.256). These results suggest that the listeners were sensitive to the range of the MF dimension present in preexposure and treated it as a potential source of information. Note that this was a very conservative estimate of the effect of exposure to cue variability on cue weighting because the dependent variable was the categorization response to generalization stimuli presented at the end of categorization training. Still, a very short period of preexposure had an influence on cue weighting. It is quite possible that the categorization responses at the beginning of categorization training were more heavily weighted toward MF but that the preference for CF rebounded during training. The design of the current experiment makes it difficult to test this explicitly. However, the current evidence is suggestive of a role for unsupervised exposure to cue variability in determining weighting functions.
FIG. 6. Experiment 1 (no exposure to MF dimension) and Experiment 4 (exposure to MF dimension) categorization responses to generalization grid stimuli, as shown in Fig. 1(d), plotted as percent "A" category responses.

VI. GENERAL DISCUSSION
Although multiple cues define most auditory categories, including speech categories, how the auditory system integrates multiple acoustic dimensions in perceptual categorization is not yet well understood. The present work investigated cue weighting in auditory categorization, approaching the issue from the perspective that the perceptual system does not merely perceive acoustic dimensions and apply them as evidence for a particular category, but rather weights these dimensions as a function of characteristics of distributions of experienced sounds.
In the Introduction, we listed four variables that may affect the weighting functions for a categorization task. We summarize here the results of the experiments as they relate to these factors. The first of these potential factors was the informativeness of the cue for categorization. If the listener is weighting cues optimally then relative weightings should be predictable from relative informativeness (at least in terms of the rank ordering of the cues). The data from our experiments do not reveal such a direct relationship. In Experiment 1, CF received a much greater weighting despite the fact that the two cues were equal on our informativeness metric. CF continued to receive a higher weighting even when it was decreased in its informativeness to the categorization task in Experiment 2 by shifting the centroids of the distributions closer together on the CF dimension. We refer to a cue receiving a higher weight than dictated by its informativeness for the current task as having greater salience (cue preference or bias are other possible terms). Christensen and Humes (1996) also examined the categorization of nonspeech sounds differing on several dimensions and found that one of the cues tended to be weighted more despite equal informativeness for the task. However, their dimensions were not equated psychophysically for auditory step size. As a result, the weighting functions could have been reliably reflecting the informativeness of the psychoacoustic cues as opposed to the physical acoustic cues. In our experiments, the cue weights were computed across equated psychoacoustic scales and thus reflect perceptual cue weighting.
What accounts for this salience? One possibility is that in the natural acoustic environment, CF is a more informative cue to auditory category identity than is MF. That is, in identifying a sound, the carrier frequency may provide more information or may be more reliably related to the distal source than is the rate at which the sound is modulated. Whereas we have no direct evidence that this is the case, it does have intuitive appeal. If this is the case, then listeners may come into the experiment with experience that is relevant to the task. A default higher weighting for CF could be a result of an innate predisposition (because the informativeness of CF has led to a fixed adaptation) or learned through experience. Similar patterns of differences in cue salience have been witnessed in studies of cross-modal integration. Battaglia, Jacobs, and Aslin (2003) report that visual cues are weighted more than auditory cues for spatial location even when the visual cues are not as reliably related to spatial location (see also Ernst and Banks, 2002, for an example of visual cue salience over haptic cues). Battaglia et al. were able to model this influence by proposing that perceivers had a priori expectations of the reliability of visual information for spatial location (defined as a prior probability distribution for visual cue variance). Thus, perceivers appear to be weighting suboptimally within the context of the information in an experiment, but they may perceive optimally within the larger context of overall perceptual experience. The interaction of short-term (or local) informativeness and long-term (or global) informativeness will be an important area for future investigation.
A second factor that may affect cue weighting is distributional variance. Because our definition of informativeness is based on a criterion-based decision, variance and informativeness are not equivalent. In Experiment 3, we took the distributions from Experiment 2 and increased the within-category distribution variance along CF and decreased it along MF. The result was a small change in relative informativeness but a large change in the relative within-distribution variance for the two dimensions. Whereas the change in informativeness from Experiment 1 to Experiment 2 resulted in no change in perceptual cue weighting, the manipulation of within-category distribution variances led to a substantial change in weighting, with MF now the dominant perceptual cue. Because of the design of the experiments, it is not possible to establish exactly what caused the change in perceptual weighting from Experiment 2 to Experiment 3. The change in within-distribution variances resulted in changes in informativeness, the ratios of within- to between-distribution variance, and correlations with the feedback for each cue. Further studies will be necessary to manipulate these factors independently. What is clear from these experiments is that changes in the central tendencies of the distributions (with concomitant changes in informativeness) do not appear to be effective in changing perceptual weighting functions, whereas changes in within-category variance can lead to significant perceptual change. Another open question is the influence of the presence of feedback on perceptual weighting. In Experiment 3, the CF dimension with greater within-category variance during training with feedback received the lower perceptual weighting. In Experiment 4, we presented the range of variability in MF with no feedback prior to the categorization training. This exposure resulted in a moderate increase in the relative weighting of the varied cue.⁴ The present experiments do not allow us to fully dissociate the possible roles of variability along an acoustic cue with versus without explicit feedback for categorization; further studies will be necessary to investigate these manipulations independently. However, the present data are interesting in that they indicate that manipulation of distribution characteristics of acoustic categories plays a role in listeners' perceptual cue weighting.
A third factor that can affect cue weighting is how each dimension is represented by the auditory system. We attempted to account for some of the effects of auditory encoding by scaling our dimensions to equivalent auditory step sizes based on pilot testing of the stimuli. Whereas it is true that CF and MF are not independent in frequency modulation detection across a large range of values, we saw no evidence of interaction in our constrained ranges; we avoided very low MF values and created stimuli within an MF range where these dimensions are relatively independent (Demany and Semal, 1989). We did not model the effects of internal noise on the representation of our cue values. We correlated responses with the step size values of our stimuli as though these values are represented perfectly by the auditory system. The General Recognition Theory (GRT; Ashby and Townsend, 1986), a model of categorization developed mainly from visual categorization experiments, represents stimuli as multivariate probability distributions in perceptual space as opposed to points. Likewise, in the COSS (conditioned-on-a-single-stimulus) analysis of perceptual weights developed for multicomponent psychoacoustic tasks (Berg, 1989), internal noise is an explicit parameter. It is likely that models of cue weighting in phonetic and other auditory categorization tasks will require a more detailed estimate of the auditory representation of the stimuli than we have used here.
The fourth potential factor affecting cue weighting is task. All of the experiments presented here were categorization tasks in which subjects were asked to maximize accuracy, so the present data do not speak to manipulating cue weighting through task adjustments. Nevertheless, task manipulations could be easily implemented in the present paradigm and would be particularly interesting to investigate in future research. In speech perception research, it is common to examine speech categories with categorization or discrimination paradigms. If one presumes that both paradigms access the same category representation, then they can provide converging evidence about auditory categories. However, if listeners modify their weighting functions to maximize performance given particular task constraints, then discrimination and categorization data may not always coincide (e.g., Mirman et al., 2004).

A. Implications for L2 phonetic acquisition
In the Introduction, we discussed the proposal that one of the obstacles to acquiring L2 phonetic categories could be a mismatch between weighting functions appropriate for L1 and L2. For example, Lotto et al. (2004) estimated the phonetic distributions for syllable-initial /r/ and /l/ from native-English productions and determined that the optimal weighting strategy for categorization was a very heavy perceptual weight on F3-onset frequency and a much lower weight on F2-onset frequency. In a follow-up, Lotto et al. demonstrated that the optimal weighting pattern for the Japanese distinction between /w/ and /r/ (a distinction often considered to interfere with acquisition of the English liquid contrast) was just the opposite, with a higher weighting of F2-onset frequency. Thus, one may predict that Japanese listeners would have difficulty with the English contrast (and vice versa) because of the mismatch in learned weighting functions. In fact, perception and production studies both indicate that native Japanese listeners weight F2-onset higher than F3-onset for the English contrast (Yamada and Tohkura, 1990; Iverson et al., 2003, 2005; Lotto et al., 2004). This inappropriate weighting of F2 resembles the salience of CF in the current studies. Given the similarity, our results detailing effective and ineffective means of shifting weighting functions may have relevance for training L2 learners. In particular, the lack of weighting shift between Experiments 1 and 2 suggests that changing the informativeness of cues by shifting the average cue values (e.g., enhancing the difference in F3-onset values or decreasing the differences in F2-onset for /l/-/r/ stimuli presented to Japanese listeners) may not result in a significant change in categorization.
On the other hand, the reversal of the perceptually dominant cue from Experiment 2 to Experiment 3 demonstrates that adding variance to an over-utilized cue (e.g., allowing F2 onset to vary independent of whether the sound is /l/ or /r/) may be an effective strategy to change listeners' weighting functions. In addition, the effect of preexposure to cue variance in Experiment 4 emphasizes the notion that the proper manipulation of variance may be essential to appropriate category acquisition.
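The variance manipulation suggested here can be illustrated with a toy stimulus generator. This is a sketch under stated assumptions rather than a training protocol: cue values are in arbitrary step units (not real formant frequencies), both cues are drawn from Gaussian distributions for convenience, and every name and parameter value below is hypothetical.

```python
import random
from statistics import pstdev

def make_training_set(n_per_category, informative_means,
                      informative_sd, overused_sd, seed=0):
    """Generate (label, informative_cue, overused_cue) training items.

    The over-utilized cue is drawn from the same wide distribution for every
    category, so it varies independently of category membership.
    """
    rng = random.Random(seed)
    items = []
    for label, mean in informative_means.items():
        for _ in range(n_per_category):
            informative = rng.gauss(mean, informative_sd)  # tight, category-specific
            overused = rng.gauss(0.0, overused_sd)          # wide, category-independent
            items.append((label, informative, overused))
    return items

# Hypothetical parameters in arbitrary step units.
items = make_training_set(200, {"r": -2.0, "l": 2.0},
                          informative_sd=0.5, overused_sd=3.0)

r_items = [item for item in items if item[0] == "r"]
spread_informative = pstdev(item[1] for item in r_items)
spread_overused = pstdev(item[2] for item in r_items)
print(f"within-category SD: informative = {spread_informative:.2f}, "
      f"over-utilized = {spread_overused:.2f}")
```

In this toy set the informative cue is stable within each category while the over-utilized cue carries no category information and varies widely, which parallels the role of the CF manipulation in Experiment 3.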
In some earlier attempts to train adults on non-native contrasts, the training sets did not contain much variance (e.g., Strange and Jenkins, 1978). As a result, the trainees were able to learn to discriminate the training stimuli but were not able to transfer this learning to new stimuli and contexts. In more recent attempts at L2 training, it has become clear that added variance in the training set (e.g., more speakers, more phonetic contexts) aids learning and generalization (Bradlow et al.; Jamieson and Morosan, 1989; Lively et al., 1993). It is quite possible that a major benefit of high-variability training is that less informative cues will vary more within a category across multiple exemplars while more informative cues will be relatively more stable. As in Experiment 3, these changes in relative variance may be most effective at changing weighting functions. The importance of variance for category learning may even be seen in L1 acquisition. The speech that adults direct to their children ("motherese" or infant-directed speech) is actually more acoustically variable than speech directed to other adults (Kuhl et al., 1997). This seems counterintuitive unless one appreciates that infants must use the variance in speech input to determine which features are phonetically relevant and which are not.

VII. SUMMARY
Of course, it is important to keep in mind that the categories of the present experiments were formed on the basis of an hour or so of experience whereas phonetic categories have the benefit of much greater experience, even among the youngest listeners. Nevertheless, we believe there is much to be gained for an understanding of phonetic cue weighting from laboratory studies that investigate cue weighting in general auditory (nonspeech) categorization. The purpose of the experiments presented here was to examine cue weighting in auditory categorization with well-defined training distributions for which experience could be entirely controlled. This allowed us to define the informativeness of each cue for the task and examine how changes in training distributions affect weighting functions. The long-term goals of this project are to establish a framework for testing models of cue weighting in phonetic categorization. It should be noted that there have been many models of perception, in general, and speech perception, in particular, that have included cue weighting explicitly or implicitly. For example, Massaro's (1987, 1998) fuzzy logic model of perception (FLMP), Nosofsky's (1992) generalized context model (GCM), and connectionist models (e.g., TRACE, McClelland and Elman, 1986; Damper and Harnad, 2000) inherently weight cues in coming to a category decision. In order to provide a strong test of these models as a basis of speech perception, one needs good estimates of the distributions of phonetic category exemplars across the various dimensions. In addition to auditory cues, any full specification of phonetic categories would include visual cues, which can also be weighted in a categorization decision (Massaro, 1998). These kinds of estimates are rare.
We believe that nonspeech category tasks such as the ones presented here will play an important role in the development of models of auditory cue weighting. Whereas they do not have the ecological advantages of speech perception tasks, nonspeech tasks afford a level of stimulus and distribution control that makes it possible to investigate the constraints of the categorization process in detail.