Word Learning: Homophony and the Distribution of Learning Exemplars

ABSTRACT How do children infer the meaning of a word? Current accounts of word learning assume that children expect a word to map onto exactly one concept whose members form a coherent category. If this assumption were strictly true, children should infer that a homophone, such as “bat”, refers to a single superordinate category that encompasses both animal-bats and baseball-bats. The current study explores the situations that lead children to postulate that a single word form maps onto several distinct meanings, rather than onto a single superordinate meaning. Three experiments showed that adults and 5-year-old French children use information about the sampling of learning exemplars (in particular, the fact that the exemplars can be grouped into two distinct clusters in conceptual space) to postulate homophony. This unexplored sensitivity, and the very possibility of homophony, are critically missing from current word learning accounts.


Introduction
To learn a word, language learners must draw a link in their mental lexicon between a phonological form and its meaning. While many words conform to a one-to-one mapping between form and meaning, this is not always the case: a homophone is a phonological form associated arbitrarily with several meanings, each of which corresponds to a concept. For instance, the word form "bat" applies both to the concept ANIMAL BAT and to the concept BASEBALL BAT. Hence, homophones present children with a non-standard word learning situation in which they must discover that linguistic signals and concepts are decoupled.
In order to examine what kind of challenge homophony brings to the word learning task, let us consider a typical word learning situation. Children do not observe associations between words and concepts; rather, they observe the co-occurrences of word forms and exemplars of their associated concepts. Thus, one major problem for the learner is to infer the meaning of the word from a set of exemplars that is consistent with an unbounded number of possible meanings (Quine, 1960). Existing theories of word learning have stressed the importance of prior knowledge in constraining the learning problem faced by the child (e.g., Bloom, 2001; Goodman, 1955; Markman, 1989). Such priors have been described at the level of concepts (which concepts the mind is ready to entertain) and at the level of word forms (which form-concept configurations are possible).
All current accounts of word learning (associative learning accounts, e.g., Regier, 2005; Yu & Smith, 2007; hypothesis elimination accounts, e.g., Pinker, 1989; Siskind, 1996; Bayesian accounts, e.g., Frank, Goodman, & Tenenbaum, 2009; Piantadosi, Tenenbaum, & Goodman, 2012; Xu & Tenenbaum, 2007) assume that learners rest on two main assumptions: first, the structure of the […]
Relevantly, it has been experimentally demonstrated that 3- to 9-year-old children have difficulty learning homophones (Casenhiser, 2005; Doherty, 2004; Mazzocco, 1997). In particular, they find it more difficult to learn a second meaning for a word they know (thus creating a homophone, e.g., learning that the familiar word form "door" also labels an unfamiliar object) than to learn a completely novel word (e.g., learning that "blick" labels an unfamiliar object). This suggests that children are slower to learn secondary meanings of homophones than to learn novel words, consistent with the idea that children prefer to preserve a one-to-one mapping between forms and meanings. Yet another possibility is that homophone learning is difficult in these conditions because the homophones are chosen such that one meaning is already available to children, and this meaning interferes with their ability to learn a novel meaning for the same word form. Indeed, even when both meanings of a pair of homophones are known to children before the experiment, they find it difficult to retrieve the less frequent meaning of the pair when the more frequent meaning is activated, but have less trouble doing so when the task provides greater contextual support for the less frequent interpretation (Beveridge & Marsh, 1991; Campbell & Bowe, 1977; Rabagliati, Pylkkänen, & Marcus, 2013). That is, in experiments where a second meaning is taught for a known word form, the competition between the known first meaning and the novel secondary meaning may mask children's ability to consider homophony as a possible option.
The present study addresses exactly this key problem by testing the simultaneous acquisition of two meanings for a single word form. When learning a homophone, such as "bat", learners will observe several exemplars of animal-bats and several exemplars of baseball-bats, all linked to the same word form "bat". In such a case, if learners hypothesize that this word form applies to a single, convex concept, as most word forms do, they will never discover homophony. Instead, they could consistently postulate that "bat" refers to some superordinate, coherent category encompassing both animal-bats and baseball-bats, much as a word like "thing" does. However, if "bat" were indeed linked to such a broad category, learners would likely have observed many things called "bat" that are neither animal-bats nor baseball-bats (a uniform distribution of exemplars drawn from the superordinate category of "things"), rather than only animal-bats and baseball-bats (a bimodal distribution of exemplars within the superordinate category).
We thus ask whether children capitalize on the sampling distribution of the learning exemplars to postulate homophony. To our knowledge, this is the first study that looks at the acquisition of multiple meanings for a new word by children. In Experiment 1, we combined two tests (inspired by Srinivasan & Snedeker, 2011; Xu & Tenenbaum, 2007) to replicate previous results with adult participants, circumscribing the situations in which homophony emerges. In Experiment 2, we adapted our experimental procedure for children, showing that they also refrained from associating a label with a broad set of entities encompassing all learning exemplars when these exemplars formed two distinct convex clusters. Experiment 3 provides a control with adults to rule out a superficial explanation for part of our effect. Altogether, our results suggest that by the age of 5, children use information about the sampling distribution of learning exemplars to discover whether a novel word form is associated with one or several meanings. Just like adults, children expect that words, but not word forms, refer to convex concepts, and they form lexical representations that follow this constraint, in essence showing early awareness that homophony is a possibility in natural languages.

Experiment 1
The experiment consisted of two testing phases: the extension test and the representation test. The extension test was similar to those of Xu and Tenenbaum (2007) and Dautriche and Chemla (submitted): participants were taught novel labels from an alien language (e.g., "blicket") for animal categories and were asked to extend these labels to test items. We manipulated whether the set of exemplars they observed formed a uniform or a bimodal distribution over the minimal superordinate category encompassing all the exemplars. If participants are sensitive to this sampling information, we predicted that they should be less likely to extend the label to all objects in the superordinate category when the exemplars form a bimodal distribution than when they form a uniform distribution.
During the representation test, we tested whether participants represented the meaning(s) of the word they had just been taught as two separate lexical entries (i.e., homophony) or as a single lexical entry. The procedure was similar to that of Srinivasan and Snedeker (2011): participants were told that a subset of the exemplars previously shown had been wrongly labeled and were in fact labeled by another word in that language (the corrected label, e.g., "these are not blickets, these are feps"). When the exemplars formed a bimodal distribution (as in the case of homophones, e.g., two animal-bats and two baseball-bats), the corrected label corresponded to one of the meanings of the initial word (e.g., "fep" labeled the two animal-bats). When the exemplars formed a uniform distribution (e.g., one animal-bat, one tree, one car, one baseball-bat), the corrected label applied to two of the exemplars (e.g., "fep" labeled one animal-bat and one car). Participants were then tested on their extension of the corrected label. If a bimodal distribution of exemplars is sufficient to trigger homophony, and assuming that participants can rely on the independence of the two meanings of a homophone, they should restrict the corrected label to the subcategory for which they have evidence (e.g., "fep" refers to animal-bats) and not extend it to the broader category (e.g., exclude baseball-bats). Conversely, if participants readily extend the corrected label to the unattested meaning (or at least as much as in the uniform condition), this would suggest that they interpreted the initial word as referring to a broad category, and not as a homophone.

Participants
Nineteen adults were recruited from Amazon Mechanical Turk (6 females; M = 37 years; all native speakers of English) and were compensated $0.40 for their participation. One additional participant was excluded because he did not provide any answers.
Procedure and display
Adults were tested online. They saw the pictures of two aliens, one blue and one red, both coming from the same planet. They were instructed that they would be exposed to words from the aliens' language and would have to select images that correspond to those words. In the instructions, they saw an example trial with the pictures and the label used for the training trial.
The trials followed the time course schematically represented in Figure 1. In the learning phase, four learning exemplars were displayed as a combination of a picture and a prompt underneath each of them (e.g., "This is a blicket"), allegedly pronounced by the blue alien who was pictured at the bottom of the screen.
The extension test started as soon as adults pressed a "Go to test" button placed below the exemplars. Participants were presented with 12 test pictures displayed one-by-one below the four learning exemplars and were asked whether each test item could be labeled by the novel word (e.g., "Is this also a blicket?"). Participants answered by clicking on a "yes" or "no" button on the screen. When the response was "yes", the picture frame turned green; when it was "no", the picture frame turned red. Once participants validated their response by pressing a "Done" button, the test continued to the next test picture.
In the last two test trials, the extension test was followed by a representation test. On the left side of the screen, participants saw two of the four learning exemplars, highlighted in a green frame, with the red alien appearing with the prompt "What are you saying? These are not blickets, these are feps!". Once participants pressed the "I got that" button, the blue alien reappeared on the right side of the screen, acknowledging his mistake and asking whether the corrected label could apply to three novel test pictures presented one-by-one (e.g., "Ooooh you are right! I made a mistake. Is this a fep too?"). Participants validated their answer by pressing a "Done" button before moving to the next test picture.
At the end of the experiment, there was a final questionnaire asking participants about their age, native language, and country.

Conditions
Each participant saw one training trial and four test trials: two uniform and two bimodal trials. For a simple, schematic explanation, we refer the reader to Figure 2, which represents the structure of the test trials and visually introduces the associated terminology.

Training trial
The training trial was the same for all participants: the four learning exemplars were pictures of four animals (a dog, a goat, a pig, and a cow) and the four test items were two animals (a cat, a horse) and two plants (a tree, a pumpkin). It was designed so that participants would readily understand the task, inviting them to extend the novel label to the two animals but not to the two plants.

Test trial
The key factor differentiating the two test conditions (uniform and bimodal) is the distribution of the learning exemplars (LE1, LE2, LE3, LE4) in conceptual space (here, a tree structure).
(1) In the uniform trials, the learning exemplars formed a uniform distribution sampled from a superordinate category, such that all learning exemplars were about the same distance from one another.
(2) In the bimodal trials, the learning exemplars formed a bimodal distribution sampled from two independent subcategories belonging to the superordinate category, such that they formed two clusters of exemplars: (LE1, LE2) and (LE3, LE4).
Figure 1. The mode of presentation differed between the two age groups: adults learned words through written prompts while children saw videos. The time course was the same for both groups. Participants first saw the four learning exemplars for the novel word presented by one of the aliens. In the extension test, participants then saw 12 test pictures presented one-by-one and were asked whether each image corresponded to the word just learned. In the last two trials, the extension test was followed by a representation test: participants first saw two of the learning exemplars being renamed with another label (the corrected label) by a second alien. They then saw another set of three test pictures presented one-by-one and were asked whether each of them could also be named with the corrected label.
During the extension test (Figure 2.1), the 12 test items were either: (1) out: out of the superordinate category formed by the four exemplars (four items); (2) in: in one of the two subcategories (two items between LE1 and LE2 and two between LE3 and LE4); or (3) in-superordinate: in the superordinate category but in neither subcategory (four items).
During the representation test (Figure 2.2), another label (the corrected label) applied to two of the learning exemplars: LE1 and LE4 in the uniform trials and LE1 and LE2 in the bimodal trials. The three test items were either: (1) out: out of the superordinate category formed by the four exemplars of the initial word; (2) in(1,2): in the subcategory formed by LE1 and LE2; or (3) in(3,4): in the subcategory formed by LE3 and LE4.
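The item types defined above can be computed operationally in a tree. The sketch below uses a tiny hypothetical taxonomy (not the NCBI hierarchy used in the study) stored as a child-to-parent map, and assumes the bimodal layout: the two subcategories are the lowest common ancestors of (LE1, LE2) and (LE3, LE4), and the superordinate category is the lowest common ancestor of all four exemplars. The helper names are illustrative only.

```python
# Hypothetical child -> parent map for a toy taxonomy (not the NCBI tree).
PARENT = {
    "vertebrate": "animal", "invertebrate": "animal",
    "mammal": "vertebrate", "reptile": "vertebrate", "insect": "invertebrate",
    "primate": "mammal", "rodent": "mammal",
    "snake": "reptile", "lizard": "reptile",
    "chimp": "primate", "gorilla": "primate", "macaque": "primate",
    "mouse": "rodent", "cobra": "snake", "viper": "snake", "python": "snake",
    "gecko": "lizard", "ant": "insect",
}


def ancestors(node):
    """The node itself followed by all of its ancestors, nearest first."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def lca(*nodes):
    """Lowest common ancestor of the given nodes."""
    shared = set.intersection(*(set(ancestors(n)) for n in nodes))
    return next(a for a in ancestors(nodes[0]) if a in shared)


def classify(item, le1, le2, le3, le4):
    """Label a test item as 'out', 'in' or 'in-superordinate' (bimodal layout)."""
    superordinate = lca(le1, le2, le3, le4)
    subcategories = (lca(le1, le2), lca(le3, le4))
    lineage = ancestors(item)
    if superordinate not in lineage:
        return "out"
    if any(sub in lineage for sub in subcategories):
        return "in"
    return "in-superordinate"


exemplars = ("chimp", "gorilla", "cobra", "viper")
for item in ("macaque", "python", "mouse", "gecko", "ant"):
    print(item, classify(item, *exemplars))
```

With two primates and two snakes as exemplars, the superordinate category is "vertebrate": the macaque and the python come out as in items, the mouse and the gecko as in-superordinate items, and the ant as an out item.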

Materials
Our stimuli relied on a set of to-be-learned labels and taxonomically organized objects.

Objects in conceptual space
Participants were tested on a set of 100 animals organized into a taxonomic hierarchy extracted from NCBI (http://www.ncbi.nlm.nih.gov) to obtain an objective measure of similarity between the different items, as in Dautriche and Chemla (submitted). For each item, we selected three color photographs showing the animal in its natural background.
Figure 2. […] (1) and the representation test (2). The first row of pictures corresponds to the configuration of the learning exemplars (LE1, LE2, LE3, LE4) in the bimodal condition and the second row to the configuration of the learning exemplars in the uniform condition. The third row corresponds to the test items.

Presentation and trial generation
The order of the trials, as well as the pairing between the labels and the sets of learning exemplars, was fully randomized and differed for each participant. We created two lists of trials, such that each uniform trial had a corresponding bimodal trial in the other list. That is, LE1, LE3, and the test items were shared between the uniform and the bimodal trial of a pair, while LE2 and LE4 varied to make the trial uniform or bimodal. All trials were generated automatically following the algorithmic constraints described in the supplemental material and were selected based on pilot data from adults (the full list of trials is available in the supplemental material). Participants were randomly assigned to one of the two lists of trials.
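The list construction described above can be sketched as follows. This is a hypothetical illustration, not the study's generation procedure (whose constraints are in the supplemental material): trials are represented as (condition, pair index) tuples, the split of pairs between the two lists is fixed, and per-participant randomization covers list assignment, trial order, and the label-to-trial pairing. The labels used in the example are the French initial labels listed for Experiment 2.

```python
import random


def make_lists(n_pairs):
    """Build two lists so that each uniform trial has its bimodal partner
    in the other list, with half of each list uniform and half bimodal."""
    list_a, list_b = [], []
    for k in range(n_pairs):
        uniform, bimodal = ("uniform", k), ("bimodal", k)
        if k < n_pairs // 2:
            list_a.append(uniform)
            list_b.append(bimodal)
        else:
            list_a.append(bimodal)
            list_b.append(uniform)
    return list_a, list_b


def make_session(lists, labels, rng):
    """Assign a participant to a random list, shuffle trial order, and pair
    each trial with a distinct, randomly chosen label."""
    trials = list(rng.choice(lists))
    rng.shuffle(trials)
    return list(zip(trials, rng.sample(labels, len(trials))))


rng = random.Random(0)
lists = make_lists(4)
print(make_session(lists, ["toupa", "fimo", "lagui", "yoshi"], rng))
```

Keeping the lists fixed and randomizing only within a session mirrors the described design, in which each participant sees two uniform and two bimodal test trials and never both members of a pair.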

Data analysis
Analyses were conducted using the lme4 package (Bates, Maechler, Bolker, & Walker, 2014) of R (R Core Team, 2013). In a mixed logit regression (Jaeger, 2008), we modeled the selection of a test item (coded as 0 or 1) separately for the extension test and the representation test. The extension test model included two categorical predictors and their interaction, Test Item (out, in, in-superordinate) and Sampling Condition (uniform vs. bimodal), as well as a random intercept and random slopes for Test Item, Sampling Condition, and their interaction, for both participants and trial pairs. The representation test model likewise included two categorical predictors and their interaction, Test Item (out, in(1,2), in(3,4)) and Sampling Condition (uniform vs. bimodal), with a random intercept and random slopes for Sampling Condition for participants. As can be seen in Figure 4, participants were at ceiling in selecting in(1,2) items in the bimodal condition. This affected the estimation of the logit model for the representation test, such that the estimates and standard errors calculated by the model were implausibly large. To remove these ceiling effects in a highly conservative way, we introduced random noise: we ran the same analysis on a modified dataset in which we randomly changed 10% of the responses given for in(1,2) in the bimodal condition (2 responses). Figure 3 reports the average proportion of selection of each test item by sampling condition (uniform vs. bimodal) during the extension test.
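The noise-injection step described in the data analysis can be sketched as below. This is an illustration under assumptions, not the authors' script: the responses of the affected cell (in(1,2), bimodal) are taken as a flat list of 0/1 selections, the number of flipped responses is rounded to the nearest integer, and the responses to flip are drawn without replacement. The mixed logit model would then be refitted on the modified data (in R with lme4, as described above).

```python
import random


def inject_noise(responses, proportion, rng):
    """Return a copy of 0/1 responses in which a randomly chosen
    `proportion` of them (rounded to a whole count) are flipped."""
    n_flip = round(proportion * len(responses))
    flipped = set(rng.sample(range(len(responses)), n_flip))
    return [1 - r if i in flipped else r for i, r in enumerate(responses)]


# Hypothetical cell at ceiling: 20 responses, all selections (1).
ceiling = [1] * 20
noisy = inject_noise(ceiling, 0.10, random.Random(42))
print(sum(noisy))  # 18: two responses flipped from 1 to 0
```

Flipping responses away from the ceiling can only weaken the selection advantage of in(1,2) items, which is why this manipulation is conservative for the effect of interest.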

Extension test
Participants were sensitive to the distribution of the learning exemplars: they selected more in-superordinate items in the uniform condition than in the bimodal condition (M uniform = 0.68, […]). We note that the distribution of the learning exemplars also affected the selection rate of out items: participants selected more out items in the uniform than in the bimodal condition (β = 1.79, z = 2.03, p < .05). Yet, this is expected given the size principle documented by Xu and Tenenbaum (2007): the boundaries of the superordinate category defined by the four learning exemplars in the uniform condition are less sharp (for an equal number of exemplars) than the boundaries of the subcategories in the bimodal condition. As a result, there is more uncertainty about the correct level of generalization in the tree-structured hierarchy, leading to a slightly higher selection rate of out items in the uniform condition. However, the sampling distribution of the exemplars modulated participants' responses beyond the size principle, since we observe two interaction effects: the difference between the selection rate of in-superordinate items and in items was greater in the bimodal than in the uniform condition (β = 5.70, z = 3.23, p < .01); similarly, the difference between the selection rate of in-superordinate items and out items was marginally smaller in the bimodal than in the uniform condition (β = − 1.58, z = − 1.67, p < .1). This suggests that participants were more inclined to extend the label to all objects in the superordinate category including all the exemplars when the exemplars formed a uniform distribution than when they formed a bimodal distribution. Figure 4 reports the average proportion of selection of each test item by trial condition (uniform vs. bimodal) during the representation test.
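The size principle invoked here can be illustrated with Bayesian generalization over nested hypotheses, as in Xu and Tenenbaum (2007). The sketch assumes strong sampling (each hypothesis h that contains all n exemplars receives likelihood (1/|h|)^n), equal priors by default, and a hypothetical set of three nested categories of sizes 4, 8, and 16. It shows how the probability of extending a label to an item just outside the smallest consistent category depends on the number of exemplars and on the relative sizes of the nested categories.

```python
def generalization_prob(item, exemplars, hypotheses, prior=None):
    """Posterior probability that `item` falls in the word's extension,
    averaging over all hypotheses that contain every exemplar."""
    n = len(exemplars)
    weights = {}
    for name, members in hypotheses.items():
        if set(exemplars) <= members:
            p = 1.0 if prior is None else prior[name]
            weights[name] = p * (1.0 / len(members)) ** n  # size principle
    total = sum(weights.values())
    return sum(w for name, w in weights.items() if item in hypotheses[name]) / total


letters = "abcdefghijklmnop"
hypotheses = {
    "narrow": frozenset(letters[:4]),   # a-d
    "broad": frozenset(letters[:8]),    # a-h
    "broadest": frozenset(letters),     # a-p
}
# "e" lies just outside the narrowest category containing the exemplars.
print(generalization_prob("e", "ab", hypotheses))    # 5/21, about 0.238
print(generalization_prob("e", "abcd", hypotheses))  # 17/273, about 0.062
```

Each added exemplar inside the narrow category multiplies its advantage over larger hypotheses, so the extension boundary sharpens; the same computation also implies that boundaries are sharper when the next-larger consistent category is much bigger than the smallest one.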

Representation test
Crucially, the distribution of the exemplars influenced participants' representation of the initial word: in the bimodal condition, participants were less likely than in the uniform condition to extend the corrected label to unattested items, in(3,4) (M uniform = 0.53, SE = 0.12; M bimodal = 0.16, SE = 0.08; β = − 5.05, z = − 3, p < .01). This is compatible with a homophonous representation of the initial word in the bimodal condition, in which the later correction affected only one of the two meanings (the one spanning in(1,2)), leaving the other meaning (in(3,4)) unchanged.
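Means and standard errors such as those reported here can be computed from trial-level 0/1 selections. The sketch below assumes, as one common choice that the text does not confirm, that selections are first averaged within each participant and that the standard error is taken over these participant means; the participant IDs and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_se(selections_by_participant):
    """Grand mean and standard error of per-participant selection rates."""
    rates = [mean(s) for s in selections_by_participant.values()]
    return mean(rates), stdev(rates) / sqrt(len(rates))


toy = {"p1": [1, 1], "p2": [0, 1], "p3": [0, 0]}
m, se = mean_and_se(toy)
print(round(m, 3), round(se, 3))  # 0.5 0.289
```

Averaging within participants first keeps each participant's contribution equal, regardless of how many trials of a given item type they saw.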

Discussion
The sampling distribution of the exemplars modulated adults' interpretation of a novel word: when the exemplars formed a uniform distribution, participants were more likely to extend its label to all objects falling in the superordinate category containing all the exemplars than when the exemplars formed a bimodal distribution (extension test).
Yet, one may worry that participants did not form a lexical representation but instead extended the label based on similarity to the learning exemplars: they may have selected more in-superordinate items in the uniform than in the bimodal condition simply because, on average, in-superordinate items were closer to the learning exemplars in the uniform than in the bimodal condition. However, this is not the case: recall that in our similarity space (Figure 2), the common ancestor of in-superordinate items with either pair of learning exemplars, i.e., (LE1, LE2) or (LE3, LE4), is the same in the uniform and the bimodal condition. Certainly, participants' responses were in part guided by the similarity of the test items to the learning exemplars: in the bimodal condition during the extension test, adults selected in-superordinate items at a higher rate than out items (β = − 2.30, z = − 1.99, p < .05). Yet it is sufficient for our purpose to note that similarity does not account for the entirety of our effect.
An interesting question is whether observing the exemplars in a bimodal distribution was sufficient for participants to form homophonous form-meaning representations. Indeed, the results of the extension test admit two interpretations.
(1) Participants formed word representations that respect concept convexity. In the bimodal condition, participants postulated homophony: they associated the novel word with two independent meanings, each corresponding to a convex concept (e.g., PRIMATE and SNAKE).
(2) Participants accepted that a word's meaning could be a set of disconnected concepts: in the bimodal condition, they associated the novel word with a single, disjoint concept (e.g., PRIMATE or SNAKE).
The results of the representation test favor the first possibility. When presented with a bimodal distribution of exemplars (e.g., two primates and two snakes) labeled by a single word "blicket", participants interpreted the corrected label based on its taught meaning alone (e.g., the two snakes but not the two primates) suggesting that they preferentially understood "blicket" as a word form associated with two homophonic words, rather than as a single word with a single discontinuous meaning.
All in all, we replicate previous results (Dautriche & Chemla, submitted) showing that the distribution of learning exemplars interacts with constraints on concept convexity to shape form-meaning representations. When the exemplars form a uniform distribution, participants are more likely to associate the word with a single convex meaning that encompasses all the learning exemplars (and every entity in between them). Yet, when the exemplars form a bimodal distribution, participants prefer to postulate homophony, such that the novel word is associated with two convex meanings, rather than with a single, broad convex meaning or a single discontinuous meaning. The critical question, then, is whether children also postulate homophony when there is evidence that the exemplars of a word are distributed in two convex clusters in conceptual space. Experiment 2 investigated this question by adapting the design of Experiment 1 for French preschoolers.
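The contrast between a homophone with two convex meanings and a single discontinuous meaning hinges on convexity. In a tree-structured space, one way to operationalize it (our assumption, following the idea that a convex meaning includes every entity in between its members) is to call a set of leaves convex when it contains every leaf under the lowest common ancestor of its members. The sketch uses a tiny hypothetical taxonomy, not the study's materials.

```python
# Hypothetical child -> parent map for a toy taxonomy.
PARENT = {
    "primate": "mammal", "rodent": "mammal",
    "snake": "reptile", "lizard": "reptile",
    "mammal": "vertebrate", "reptile": "vertebrate",
    "chimp": "primate", "gorilla": "primate",
    "mouse": "rodent",
    "cobra": "snake", "viper": "snake",
    "gecko": "lizard",
}


def ancestors(node):
    """The node itself followed by all of its ancestors, nearest first."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def leaves_under(category):
    """All leaves (nodes with no children) that descend from `category`."""
    internal = set(PARENT.values())
    return {n for n in PARENT if n not in internal and category in ancestors(n)}


def lca(nodes):
    """Lowest common ancestor of a collection of nodes."""
    nodes = list(nodes)
    shared = set.intersection(*(set(ancestors(n)) for n in nodes))
    return next(a for a in ancestors(nodes[0]) if a in shared)


def is_convex(meaning):
    """True if the meaning contains every leaf under its members' lowest common ancestor."""
    return set(meaning) == leaves_under(lca(meaning))


print(is_convex({"chimp", "gorilla"}))                    # True: PRIMATE
print(is_convex({"cobra", "viper"}))                      # True: SNAKE
print(is_convex({"chimp", "gorilla", "cobra", "viper"}))  # False: PRIMATE or SNAKE
```

Under this definition, the homophone analysis assigns the word form two meanings that each pass the test, whereas the single disjoint meaning fails it because it omits the mouse and the gecko that lie between primates and snakes.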

Experiment 2
Method
Participants
Twenty-one 5-year-old monolingual French-speaking children (5;1 to 6;1, M age = 5;6, 10 girls) were tested in a public preschool in Paris. Their parents signed an informed consent form. Three additional children were tested but not included in the analysis because they systematically responded yes (n = 1) or no (n = 2) without looking at the test pictures, or responded before the pictures appeared on the screen.

Procedure and display
The experiment was identical to Experiment 1 except that we used videos instead of written prompts with children (see Figure 1). Children were tested individually in a quiet room in their preschool. During the experiment, children sat next to the experimenter, in front of a computer and wore headphones to listen to the stimuli. Before the experiment began, children watched a video where two alien puppets introduced them to the task. The two puppets presented themselves as Boba and Zap, and told the children that they were coming from another planet where they speak a different language, so they would teach the children words from their language. Once a child demonstrated to the experimenter that (s)he understood the task, the experiment started. In each trial, children saw four learning exemplars, presented one-by-one as the combination of a picture and a video of Boba labeling the picture with a non-word "Ça, on appelle ça une bamoule!" 'This, we call it a bamoule.' Each learning exemplar was displayed on the screen when the experimenter clicked on a button. Once children saw all four learning exemplars, they saw a final video where Boba asked them to repeat the word. This was to ensure that children were on task and for the experimenter to know which word was used by the puppet (as the words were randomly assigned to a set of learning exemplars and the experimenter could not hear the stimuli).
During the extension test, children were presented with the 12 test pictures displayed one-by-one below the four learning exemplars. For each of them, the experimenter asked: "Est ce que tu penses que ça s'appelle une bamoule?" 'Do you think it is called a bamoule?'. When the child answered, the experimenter clicked accordingly on the "yes" or "no" button on the screen. Children could change their mind if they wanted within a few seconds after their answer or longer if the experimenter saw that they were still hesitating. Once a response was validated by the experimenter, the test continued to the next test picture.
In the last two test trials, the extension test was followed by a representation test. Children saw two of the four learning exemplars grouped in a frame together with a video in which the second puppet, Zap, scolded the first one, Boba, for using the wrong word for the two exemplars displayed: "C'est pas des bamoules ça! C'est des torbas!" 'These are not bamoules! These are torbas!' (the whole script for this video can be found in the supplemental material). Boba then acknowledged his mistake. During the dialogue, the frame of the pictures was blinking in green. At the end of the video, the experimenter asked the child what had happened and replayed the video if the child did not understand it or did not remember the novel word. Children were then tested on whether the corrected label could apply to three novel test pictures. Each of the test pictures was displayed below the two learning exemplars and the child was asked by the experimenter "Est ce que tu penses que ça, ça s'appelle un torba?" 'Do you think that this is called a torba?' At the end of the experiment, there was a final video in which the two puppets said good-bye to the child. The whole experiment lasted about 15 min. All sessions were audiotaped.
Conditions, materials, presentation, trial generation, and data analysis
Everything was similar to Experiment 1, except that the experiment was in French; hence, the set of non-words consisted of nine phonotactically legal non-words of French. "Bamoule" was always used in the training trial. Of the remaining eight non-words, which were used in the test trials, half were bisyllabic and were used as initial labels ("toupa", "fimo", "lagui", "yoshi"), and the other half were trisyllabic and were used as corrected labels ("midori", "cramoucho", "didolu", "baboucha"). The difference in the number of syllables was introduced to make it easier for children to distinguish between the initial and the corrected labels.

Extension test
As shown in Figure 5, children and adults behaved in the same way: children selected more in-superordinate items in the uniform condition than in the bimodal condition (M uniform = 0.61, SE = 0.04; M bimodal = 0.28, SE = 0.04; β = − 2.56, z = − 3.73, p < .001). The distribution of the learning exemplars did not affect any other test items for children (all ps > 0.7), resulting in two interaction effects: the difference between the selection rate of in-superordinate items and in items was greater in the bimodal than in the uniform condition (β = 2.79, z = 2.30, p < .05); similarly, the difference between the selection rate of in-superordinate items and out items was smaller in the bimodal than in the uniform condition (β = − 3.28, z = − 4, p < .001). This suggests that children were more likely to extend the label to all objects in the minimal subtree containing all the exemplars in the uniform condition than in the bimodal condition.
For uniform trials, we replicate previous results (Xu & Tenenbaum, 2007): children were more likely to extend the label to all objects in the minimal subtree containing all the exemplars than to objects that are out of the subtree (in vs. out, β = − 5.39, z = − 6.28, p < .001; in-superordinate vs. out, β = − 3.01, z = − 3.82, p < .001). We note that the distance of the test items to the learning exemplars affected children's responses: they selected more in items than in-superordinate items (β = − 2.29, z = − 2.80, p < .01) suggesting that children's extension of the label may be in part guided by similarity of the test items to the learning exemplars.
For bimodal trials, children were more likely to extend the label to objects that were in one of the two subcategories (in) than to other objects that were either in the superordinate category containing all the exemplars but out of the subcategories (in-superordinate, β = − 5.12, z = − 4.61, p < .001) or out of the superordinate category (out, β = − 4.85, z = − 8.01, p < .001). There was no difference between the selection rate of in-superordinate items and out items for children (p > 0.7) suggesting that children excluded in-superordinate items from the extension of the word.
While children and adults behaved statistically in the same way (p > 0.2), there were some visible differences. In particular, there was a main effect of Sampling Condition for children: children were more likely to select test items in the uniform than in the bimodal condition (χ²(1) = 8.20, p < .01). They even selected more out items in the uniform than in the bimodal condition (β = 2.56, z = 2.20, p < .05; no such difference was observed for adults, p > 0.2). In the representation test, children's selection rate of out items in the uniform condition is similar to their selection rate of out items during the extension test. One may wonder why children selected out items even less in the bimodal condition during the representation test. Children may be driven by a "size principle" (Xu & Tenenbaum, 2007): whether the hypothesized extension of a label has sharp boundaries depends on the number of hypotheses consistent with the exemplars. In the representation test, the category defined by the two exemplars is rather narrow (e.g., snakes), so that very few meaning hypotheses are possible for the corrected word. During the extension test, more meaning hypotheses were possible for the initial word: even though two exemplars were presented for each of the two subcategories (e.g., INSECTS and PRIMATES), the possibility that the initial word corresponded to the minimal superordinate category encompassing all the exemplars could still be entertained, leading to greater uncertainty about the boundaries of the categories of the initial word than about the boundaries of the corrected word.

Discussion
These results suggest that children, just like adults, postulate homophony for a word when its learning exemplars form a bimodal distribution, i.e., when the meaning of that word form is best represented as two independent convex clusters rather than as one large cluster encompassing all the exemplars. We note, however, a weakness in the second of our tests, the representation test. Because the category including the relabeled exemplars is wider in the uniform case (e.g., a snake and a mouse) than in the bimodal case (e.g., two different kinds of snakes), this difference alone may lead children to extend the corrected label to more test items in the uniform than in the bimodal condition: indeed, a chimp is more likely to be a "fep" when "fep" labels a snake and a mouse than when "fep" labels two snakes. This could happen independently of what children had learned about the meaning of the initial word "blicket". Thus, although our results are exactly what we would expect if children had hypothesized that "blicket" was a homophone when observing a bimodal distribution of its exemplars, we cannot entirely rule out the possibility that children, and adults, responded to the representation test independently of the extension test, in which case the representation test would tell us nothing about what had been learned initially. To test whether participants responded to the representation test independently of the extension test, i.e., whether their extension of the corrected label was independent of their representation of the initial word, we ran a control experiment with adults in which we tested participants' representation of the initial word not only through their extension of the corrected label ("fep") but also through their updated extension of the initial label ("blicket").
Figure 6. Representation test (children): selection rate of each test item (in (1,2), in (3,4), out) averaged for each trial condition (uniform vs. bimodal). Error bars indicate standard errors of the mean.

Experiment 3
In Experiment 3, adults learned a novel word, e.g., "blicket" (the initial label), from a set of exemplars, and were then told that some of the exemplars were not "blickets" but "feps" (the corrected label). Suppose "blicket" applies to a bimodal distribution of exemplars (e.g., two animal-bats and two baseball-bats) and "fep" corresponds to one of the meanings of the initial word (e.g., the two animal-bats). If participants recruit what they have learned about "blicket", they should not only restrict the corrected label to the subcategory for which they have evidence (e.g., "fep" refers to animal-bats and not to baseball-bats), as in Experiment 1, but also update their representation of "blicket" (e.g., "blicket" now refers to baseball-bats and not to animal-bats). Conversely, if they readily extend "blicket" to animal-bat items even though these items are now labeled "feps", their responses for "fep" and "blicket" are independent. Now suppose "blicket" applies to a uniform distribution of exemplars (e.g., one animal-bat, one tree, one car, one baseball-bat) and the corrected label applies to two of the exemplars (e.g., "fep" labels one animal-bat and one car). If participants recruit their lexical representation of "blicket" during the representation test, they should, as in Experiment 1, be more willing to extend "fep" to all "blickets" (e.g., the broader category of animal-bat, tree, car and baseball-bat), but crucially they should also update their lexical representation of "blicket" (e.g., now excluding the broader category of animal-bat, tree, car, and baseball-bat from it). If, on the contrary, participants' lexical representation of "blicket" is unaltered, this would suggest that they maintain independent lexical representations for "fep" and "blicket".
In summary, in both conditions, if participants' extension of the corrected label "fep" is uninformed by what they learned about "blickets", their representation of "blicket" should remain untouched. If, on the other hand, participants recruit their lexical representation of "blicket" during the representation test, they should update it by excluding all the "feps" from it.

Participants
Twenty adults were recruited from Amazon Mechanical Turk (8 females; mean age = 33 years; 19 native speakers of English) and were compensated $0.40 for their participation.

Procedure and display
The experiment was identical to the adult version in Experiment 1, except that the representation test probed participants' representation of the initial word in addition to their representation of the corrected word (as in Experiment 1).
During the representation test, as before, the red alien on the left side of the screen relabeled two of the four learning exemplars with another word: "What are you saying? These are not blickets, these are feps!" On the right side of the screen, the blue alien seen during the extension test appeared with a test item and two prompts: (1) "Ooooh you are right! I made a mistake. Is this a fep too?" (representation test for the corrected word); and, below it, (2) "Is this a blicket?" (representation test for the initial word). A "yes" and a "no" button below each question recorded participants' answers.
Conditions, materials, presentation and trial generation, data analysis
These were similar to Experiment 1. Figure 7 reports the average proportion of selection of each test item by trial condition (uniform vs. bimodal) during the extension test.

Extension test
The extension test replicated the results of Experiment 1: participants' responses were modulated by the distribution of the learning exemplars. Participants chose more in-superordinate items in the uniform than in the bimodal condition (M_uniform = 0.75, SE = 0.04; M_bimodal = 0.34, SE = 0.03; β = −6.31, z = −3.57, p < .001). The sampling distribution did not affect the choice of any other test items (ps > 0.1). As a result, the difference between the selection rates of in-superordinate and in items was greater in the uniform than in the bimodal condition (β = 3.68, z = 1.98, p < .05), and the difference between the selection rates of in-superordinate and out items was smaller in the bimodal than in the uniform condition (β = −5.33, z = −2.51, p < .05).

Representation test
Figure 8 reports the average proportion of selection of each test item by trial condition (uniform vs. bimodal) during the representation test for the corrected label. This test also replicated the results of Experiment 1: in the bimodal condition, participants were less likely to extend the corrected label to unattested items (in (3,4)) than in the uniform condition (M_uniform = 0.36, SE = 0.10; M_bimodal = 0.1, SE = 0.06; β = −2.26, z = −2.19, p < .05). The sampling distribution of the exemplars did not affect responses to the other test items (ps > 0.5). As shown in Figure 9, when tested on the initial label, participants selected more in (3,4) items in the bimodal than in the uniform condition (M_uniform = 0.5, SE = 0.11; M_bimodal = 0.86, SE = 0.08; β = 2.28, z = 2.70, p < .01). This suggests that the sampling distribution of the exemplars affected participants' representation of both the corrected and the initial words. The sampling distribution did not affect responses to any other test items (ps > 0.1).
In the uniform condition, participants refrained from applying the initial label to any of the test items within the superordinate category formed by the 4 exemplars (in (1,2) and in (3,4)): their selection rates for in (1,2) and in (3,4) items as instances of "blicket" did not differ from their selection rate for out items (ps > 0.15). This suggests that the corrected word overrode the representation of the initial word: all items previously labeled as "blickets" are now "feps", so "blicket" is no longer associated with any meaning. In the bimodal condition, participants refrained from applying the initial label to test items belonging to the subcategory that had been relabeled with the corrected label: they did not choose in (1,2) items as instances of the initial label more often than out items (p > 0.2). However, they readily extended the initial label to test items outside the relabeled subcategory (in (3,4) vs. out: β = −4.52, z = −4.46, p < .001). In other words, when exposed to two primates and two snakes all labeled "blicket", and then told that the two primates are "feps", participants still considered new snakes to be valid instances of "blicket".

Discussion
When the initial word was corrected by another word (e.g., "these are not blickets, these are feps"), participants took into account what they had learned about the initial word ("blicket") to work out the extension of the corrected word. We replicated the results of Experiment 1: in the uniform condition, participants were more likely than in the bimodal condition to associate the corrected word with the superordinate category spanning all 4 learning exemplars. In addition, participants also updated their representation of the initial word. Specifically, in the uniform condition, participants refrained from associating the initial label with any of the items it had previously labeled (as all of them are now in the extension of the corrected word, "fep"). Yet, when the exemplars formed a bimodal distribution, i.e., were taken from two clusters of exemplars C₁ and C₂ such that the corrected label applied only to C₁, participants still considered test items belonging to C₂ as valid instances of the initial label. This suggests that adults in Experiment 1 recruited the representation of the initial word during the representation test. However, the alternative explanation, i.e., that the representation of the corrected label "fep" is computed on the basis of its own learning exemplars only (independently of the representation of the initial label "blicket"), cannot be entirely ruled out. Participants could have extended "fep" according to the distribution of its exemplars in the representation test (i.e., extended it to more test items in the uniform than in the bimodal condition simply because the category covered by the two relabeled exemplars is wider in the uniform case) and then used this information to decide how "blicket" should be extended. Clearly, if participants decide that a given item I is an instance of "fep", they will be more willing to exclude it from the extension of "blicket", regardless of how they decided that I is a "fep".
Figure 9. Representation test (initial label): selection rate of each test item (in (1,2), in (3,4), out) averaged for each trial condition (uniform vs. bimodal) when participants were tested on the initial label. Error bars indicate standard errors of the mean.
However, out items are excluded from both the extension of "fep" and the extension of "blicket", suggesting that when responding to "blicket" in the representation test, participants are still influenced by their responses for "blicket" during the extension test (where they also excluded out items from the extension of the word). Thus, while participants may not recruit their initial representation of "blicket" to extend "fep" to test items in the representation test, they still appear to recruit it (together with what they have learned about "fep") to determine the extension of "blicket" in that test. While we cannot entirely dismiss the possibility that participants selectively attended to their initial representation of "blicket" in the representation test when responding about "blicket" but not about "fep", our data provide little support for this hypothesis.
We thus conclude that the most likely interpretation of our results in Experiment 1 is that adults recruited the representation of the initial word during the representation test. In the first phase, adults preferred to postulate that the novel word is a homophone when it is learned from exemplars in a bimodal distribution. Given the similarity of the results of Experiments 1 and 2, one would be tempted to extend this interpretation to the children's results from Experiment 2. At this point, however, a note of caution is necessary. One limitation of the present data is that we relied on evidence from adults to rule out a possible confound shared by Experiments 1 and 2, and thus by both adults and children. We therefore cannot exclude the possibility that this alternative explanation still underlies children's response pattern. We leave it to future research to establish a more direct argument explaining both children's and adults' performance in these experiments.

General discussion
Children are sensitive to the sampling distribution of the learning exemplars when learning words (as in Xu & Tenenbaum, 2007). We further show that this sensitivity interacts with the kind of form-meaning representation children are ready to entertain. Observing a bimodal distribution of learning exemplars for a novel word indicated to our participants that the word was likely to have several meanings. Importantly, our results suggest that these meanings were stored separately; children's representation of the novel word in these conditions thus closely resembles homophony (Srinivasan & Snedeker, 2011). This extends previous results from adults (Dautriche & Chemla, submitted) and suggests that when observing a bimodal distribution of exemplars for the same word form, children generate form-meaning representations, such as homophony, that respect concept convexity.
Current word learning accounts have documented and modeled paradigmatic cases of word acquisition, in which a single form is associated with a single meaning. We extended that enterprise by showing that less standard situations, such as homophony, can highlight the key role of factors such as the sampling distribution of exemplars. Such situations also help clarify the priors that may constrain and guide word acquisition, as we detail below.

Missing factors in word learning accounts
Previous studies have shown that children are sensitive to sampling principles when learning words. Xu and Tenenbaum (2007) describe a size principle: children's confidence in the boundary of the set of entities associated with a word increases as they observe more learning exemplars, even if these exemplars are all identical. Here we showed that children were sensitive to the distribution of learning exemplars in conceptual space, another statistical principle presumably following from the assumption that the exemplars of a word are sampled randomly from the underlying category (cf. Xu & Tenenbaum, 2007). Intuitively, in the case of homophones, we expect the label to occur with a set of exemplars S drawn from two distinct subcategories X₁ and X₂. Thus, S would take the form of a bimodal distribution within the single superordinate category X encompassing the two subcategories (i.e., X is the minimal well-formed category such that X₁ ∪ X₂ ⊂ X). If, instead, the word were associated with the whole of X, we would expect the exemplars associated with the label (i.e., S) to be uniformly distributed within X. Our results suggest that children can use such sampling considerations to decide whether a word is associated with one category (standard case) or several categories (homophone), even when exposed to very few exemplars.⁵
There are other factors, not documented here, that may interact with the expectation that concepts are convex and could help children identify that a word has several meanings. First, evidence for homophony may come from other words in the lexicon. Adults are less likely to extend a label (e.g., "blicket") to an entity, even if this entity falls right between the learning exemplars for the label, when this entity also falls close to some entity labeled by another word (e.g., "fep") (Dautriche & Chemla, submitted).
Intuitively, the interfering label provides further evidence for the presence of two distinct clusters of exemplars in conceptual space, separated by another concept labeled by another word ("fep" in the example). This suggests that learners have expectations not only about how words occupy the conceptual space, but also about how they share it. Similarly, children assume that word extensions are mutually exclusive (Markman & Wachtel, 1988), and may thus use the presence of other words in their lexicon, together with a constraint on concept convexity, to discover that a word is likely to have several meanings.
Second, some linguistic constructions may help learners discover homophony (or its absence). For instance, children could notice that words mapping to a single meaning commonly appear in some plural sentences where homophones never appear (e.g., "These are two bats" said while pointing at one baseball-bat and one animal-bat). There is a principled reason for this: a single phonological form cannot be used to refer to two words at the same time, even if the two words are homophonous.⁶ Adults have been shown to use such constructions to assess homophony (Dautriche & Chemla, submitted), and children may also be sensitive to such linguistic evidence.
These factors may help learners identify words with multiple meanings. Yet they also raise immediate challenges for current word learning accounts. For instance, we have so far assumed that children understood which object is referred to by the word in context. In the real world, however, the label is uttered in a complex visual environment where the true referent is likely to be confounded with other possible referents present at the same time (the mapping problem; Quine, 1960). The set of exemplars for a label is therefore likely to contain outliers, i.e., items outside the true extension of the word, whenever the child failed to narrow down its true referent. As a result, the set of exemplars may well form a multimodal distribution (e.g., a set of banana exemplars along with a dog that happened to eat a banana during one of the learning events). The challenge for the learner is thus to distinguish between outliers of the true meaning of the word and representative examples of a new word meaning. General principles such as the convexity of concepts and a one-to-one mapping between word forms and concepts may help discard noise of this type. However, if these general principles allow exceptions, as our study of homophony reveals, they can hardly help disentangle signal (of homophony) from noise.

Missing priors in word learning accounts
Our results suggest that children expect meanings to be convex, and are willing to postulate homophony rather than either breaking this constraint (postulating a disjoint meaning) or satisfying it at all costs with a single meaning (postulating a broad lexical entry for problematic words).
This contradicts current word learning accounts, which, technically, transpose the notion of convexity from the level of concepts to the level of word forms by assuming that word forms link to concepts in a one-to-one fashion. Accordingly, none of the current accounts allows for the possibility that children associate word forms with multiple meanings. Indeed, many developmental studies have documented that preschoolers have notable difficulties in learning homophones (Casenhiser, 2005; Doherty, 2004; Mazzocco, 1997). In these studies, encountering the second meaning of a homophone is simulated by using familiar words (e.g., "snake") to refer to novel referents (e.g., an unfamiliar object). We suggest, however, that children's failure in these studies reflects not an excessive reliance on a one-to-one mapping between form and meaning, but rather insufficient executive skills for such a task: as the current results show, children have no problem learning homophones when they learn the two meanings simultaneously. Learning homophones in our study may be easier than learning a second meaning for a known word because children do not have to inhibit a highly active word representation for one of the meanings (Khanna & Boland, 2010; see also Choi & Trueswell, 2010; Novick, Trueswell, & Thompson-Schill, 2010). Children's difficulty in learning homophones may thus have been overestimated, and a strict one-to-one form-meaning mapping constraint cannot by itself account for the mechanism underlying the acquisition of homophones. It remains possible that such a constraint guides word learning at earlier stages of language development, and that 5-year-olds have already learned to relax it to accommodate more challenging form-meaning mappings, such as homophony.
Yet, if this is the case, current word learning accounts should be able to explain how children depart from this default assumption.
The present work thus has important implications for current word learning accounts. When a label seems to apply to a disjoint set of objects, the learner has two options: (1) postulate homophony (i.e., the possibility that a word form maps onto two distinct meanings) or (2) follow a one-to-one form-meaning mapping constraint and postulate that the label is a single word that applies to a larger set of objects (the category that covers all the positive instances of the label). Most accounts predict that (2) is the default, but this has to be refined since children eventually learn homophones (e.g., English-speaking preschoolers likely know both meanings of the word form "bat"; see also De Carvalho, Dautriche, & Christophe, 2015, for evidence that French 3-year-olds have acquired a number of homophone pairs). An important open issue, then, is to equip the learning system with the right built-in constraints. At this point, word learning seems to be guided by (a) learners' expectation that concepts are convex, always; (b) learners' expectation that word forms are linked to one meaning, in general; and (c) the possibility that a word form maps onto several distinct meanings if (a) is challenged. While (a) helps constrain the possible concepts one can entertain, (b) helps constrain the number of hypotheses children need to consider when learning words (especially when used in combination with (a)). Note that these would constitute very specific priors for a general learning system: they presuppose that children already know that an essential feature of their to-be-learned lexicon is to be composed of form-meaning mappings of very specific kinds. The present study contributes to point (c) above: children can entertain the possibility that a word maps onto several distinct meanings to accommodate apparent violations of (a) concept convexity, at the expense of (b) a one-to-one mapping between forms and meanings.
This suggests that children are able to selectively trigger or silence (b) as a function of the learning situation (and we documented that the sampling distribution of the learning exemplars can drive this switch). One possibility for current word learning accounts would thus be to implement these three built-in constraints directly in the learning system and to tweak the learning inference component to accommodate the sampling effect we document. Yet this would be a rather strong assumption, amounting to saying that the learning system expects the existence of a phenomenon as specific as homophony from the start.
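To make concrete what implementing (a), (b) and (c) might involve, here is a minimal, hypothetical sketch in Python; it is not a model proposed or tested in this study. It assumes a one-dimensional conceptual space in which every meaning is a convex interval (constraint (a)); a small constant prior probability of homophony, `prior_homophony` (constraint (b)); and a homophone option in which each exemplar is drawn with probability 0.5 from one of two intervals (possibility (c)). Likelihoods follow the size principle (uniform sampling within each interval). The floor on category width, `min_width`, and all parameter values are arbitrary assumptions, and for simplicity the homophone option keeps the best split of the exemplars rather than averaging over splits.

```python
def interval_width(points, min_width):
    """Width of the smallest interval covering `points`, floored at `min_width`
    (a hypothetical lower bound on how narrow a category can be)."""
    return max(max(points) - min(points), min_width)


def single_meaning_likelihood(xs, min_width):
    """One convex meaning: every exemplar is drawn uniformly from the smallest
    interval covering all exemplars (size principle)."""
    return (1.0 / interval_width(xs, min_width)) ** len(xs)


def homophone_likelihood(xs, min_width):
    """Two convex meanings (a homophone): each exemplar comes from either meaning
    with probability 0.5, then uniformly from that meaning's interval.
    Every split of the sorted exemplars into a left and a right cluster is
    tried and the best one is kept; each exemplar is scored only under the
    cluster it is assigned to."""
    xs = sorted(xs)
    best = 0.0
    for k in range(1, len(xs)):
        left, right = xs[:k], xs[k:]
        lik = ((0.5 / interval_width(left, min_width)) ** len(left)
               * (0.5 / interval_width(right, min_width)) ** len(right))
        best = max(best, lik)
    return best


def posterior_homophony(xs, prior_homophony=0.01, min_width=1.0):
    """Posterior probability that the label is a homophone, given exemplar
    positions `xs` in a one-dimensional conceptual space."""
    h = prior_homophony * homophone_likelihood(xs, min_width)
    s = (1.0 - prior_homophony) * single_meaning_likelihood(xs, min_width)
    return h / (h + s)


bimodal = [1.0, 1.5, 8.5, 9.0]   # two tight clusters
uniform = [1.0, 3.5, 6.5, 9.0]   # evenly spread over the same range
print(round(posterior_homophony(bimodal), 2))  # 0.72
print(round(posterior_homophony(uniform), 2))  # 0.06
```

With these arbitrary settings, the prior against homophony is overcome by the four bimodally placed exemplars but not by the four evenly spread ones, even though both samples span the same range: the sampling distribution alone decides between one meaning and two.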
Because current word learning accounts have specialized in fairly simple word learning phenomena (i.e., one form associated with a single meaning), they have equipped the learner with specialized built-in constraints (e.g., a one-to-one form-meaning mapping constraint) that cannot explain the learning of more complex phenomena, such as homophony. Incorporating homophony, and potentially other less trivial word learning situations, will allow these accounts to delineate the more general priors that children bring to the word learning task. Yet, although adding different built-in constraints to these accounts may be technically easy, it may be theoretically challenging, as described above: one would need to explain how the learning system juggles different priors to learn different types of words appropriately.

Conclusion
Homophony presents a challenge to word learners: it requires children to discover that word forms and concepts are not always in a one-to-one relation, an assumption that is otherwise important for restricting the search space for word meanings. The present study showed that children use information about the sampling distribution of learning exemplars (and in particular the fact that they form two distinct convex clusters in conceptual space) to infer homophony for a novel label. We argue that this unexplored sensitivity and the very possibility of homophony should be incorporated into future accounts of word learning.