Underpinning the On-Line Processing of (Non-)Canonical Sentences in German-Speaking Four-Year-Olds: The Interplay of Cognitive Control and Memory Capacity

ABSTRACT This study investigates the interpretation of object-initial sentences in German-speaking children. We addressed the following questions: (1) Which morphosyntactic cues do children deploy to process object-initial sentences? (2) Which executive function (EF) abilities support them during this task? This study examined the effect of case and number agreement morphology in 4;6-year-old German-speaking children (N = 27) on their interpretation of unambiguous S(ubject)-V(erb)-O(bject) and OVS sentences by combining an offline (sentence-picture matching) and an online (looking-while-listening) paradigm. Participants’ working memory and cognitive control abilities were assessed by means of a (forward) digit-span test and a flanker task. Case-marked OVS sentences were processed more accurately than number-marked ones, although both conditions were less accurate than SVO sentences. We found a comprehension facilitation driven by higher cognitive control skills that specifically enhances the interpretation of the more demanding number cue in OVS structures already in 4;6-year-olds. Higher working memory skills are generally associated with children’s processing skills as they support the correct parsing of all online conditions and of both SVO and OVS in the offline case condition. We conclude that while case-marking appears to be processed more reliably than number by preschoolers, number information alone can also be processed, especially by children with higher cognitive control skills.


Introduction
In the course of language acquisition, comprehension of transitive sentences represents a crucial developmental step. This ability is central for correct sentence interpretation as it provides an answer to the essential and omnipresent question "Who is doing what to whom?". It is predominantly linked to the child's ability to distinguish between participants of an action and to establish a relation among them. For instance, in the transitive clause "Jane kissed John", it is fundamental to know who acts and kisses in the scene and who receives the action and is being kissed. Accordingly, the correct syntactic functions of the sentential constituents (the subject or direct object) have to be recognized and processed, and the appropriate thematic roles have to be assigned to the different arguments (the agent, to the one who is performing the action, and the patient, to the recipient of the action).
In English, as in the example above, word order provides the main cue for sentence interpretation. The first noun phrase (NP1) "Jane" denotes the sentence subject and the agent of the action, whereas the second noun phrase (NP2) "John" denotes the direct object and the recipient of the action. In contrast, many other languages, such as German, show a more flexible word order and a richer inflectional morphology, requiring attention to different morphosyntactic cues (e.g., for German: case marking and/or number agreement) to establish the thematic relation between sentential arguments. As a sentence unfolds, this morphosyntactic information has to be stored in working memory and/or strong expectations of a sentence-initial subject as part of a default interpretation strategy (e.g., subject-first strategies) have to be discarded, when needed. Hence, working memory and executive functions (EF) more generally are expected to play a decisive role in the correct interpretation of transitive constructions.
In German, as in English, subject-initial (SVO) word order is regarded as canonical. Nevertheless, German also allows a topicalization of the direct object, which results in a non-canonical object-initial (OVS) structure. Topicalized structures (2) are derived from the canonical ones (1) by syntactic movement of the object to the initial position (Fanselow & Felix, 1990; Haegeman, 1991). Consider the examples in (1) for SVO and (2) for OVS sentences.
(1) Der Fuchs fängt die Hunde.
    the.Nom.SG fox.SG catch.SG the.PL dog.PL
    "The fox is catching the dogs."
(2) Die Hunde fängt der Fuchs.
    the.PL dog.PL catch.SG the.Nom.SG fox.SG
    "The fox is catching the dogs."
When presented in isolation and with a neutral prosody, both sentences, the SVO clause (1) and the OVS clause (2), convey the same idea that there is a fox that is catching the dogs. Since the animacy cue is controlled for in the present examples, the speaker has to deduce who is doing what to whom from two morphosyntactic cues, namely either the case marking cue (case cue, e.g., nominative of "the fox") or the subject-verb agreement cue (e.g., "the fox" and "catch" both marked in singular). For the purpose of the present paper, we will refer only to the number feature (number cue) when speaking about the subject-verb agreement in clauses as this was the only manipulated feature in our study of those that enter into the subject-verb agreement relation in German.
In the current paper, we will focus on the effect of case and number cues on the comprehension of who-is-doing-what-to-whom in German transitive sentences. One key goal is to examine whether case and number agreement facilitate sentence comprehension and processing in preschool children (age 4;6) and which of these cues is deployed more reliably to comprehend object-initial structures, in particular. While case is marked locally and early on the determiner and also on the noun of the first NP in our German stimuli, number agreement is distributed over the subject NP and the verbal phrase (VP) and it requires processing of the subject-verb agreement relation for correct interpretation of the sentence. Contrasting case and number markings by means of an offline sentence-picture matching task and an online eye-tracking paradigm will enable us to gain a better understanding of the following crucial points. Which of these two morphosyntactic cues do four-year-olds deploy to process object-initial sentences? Does the local and early vs. distributed nature of the morphosyntactic cue affect comprehension of object-initial structures differently?
A second key goal of the present study is to assess which EF abilities support children during their task of processing and interpreting OVS sentences. We focused on two general cognitive abilities for sentence processing, namely cognitive control and working memory. Are children's comprehension and processing abilities modulated by their cognitive control skills and working memory capacity? For adult-like comprehension of OVS structures, the canonical interpretation, that is, the subject-first expectation, has to be inhibited. Regarding case as well as number marking, both pieces of information have to be stored in working memory in order to be retrieved when processing a syntactic dependency. Still, both EFs might contribute and interact differently with regard to the comprehension of morphological markings that indicate word order variation. For instance, when solely being presented with the number cue in OVS sentences, children with more advanced cognitive control skills might reveal an advantage as compared to children with weaker cognitive control skills. In these instances, the disambiguating number information is provided rather late (in our stimuli after the verb) during the course of incremental processing, meaning that an alternative structure might have already been established that is in conflict with the incoming number information. This misinterpretation then needs to be resolved through reanalysis of the sentence.
In the next section, we will review some of the theoretical approaches that have been proposed to explain the processing of object-first sentences during development, with a focus on German.

The processing of object-first structures in children
The competition model suggests that the combination of cue availability and cue reliability, called cue validity, predicts the acquisitional order in which morphosyntactic cues will be reliably deployed in the course of language acquisition. Thus, the cues that have the highest cue validity are recognized and acquired first (E. Bates & MacWhinney, 1987, 1989). In German, the canonical SVO word order is highly available as input for children. However, the word order cue is not reliable in German, since word order variation as in (2) above occurs. At the same time, the case cue, though less available, is highly reliable. If present, the overt case morphology explicitly and unambiguously indicates the agent-patient relation (E. Bates & MacWhinney, 1987, 1989; Kempe & MacWhinney, 1998; Lindner, 2003). Dittmar (2009) conducted a thorough corpus analysis (7032 utterances, previously coded by Stoll et al. (2009)) of spontaneous communication by German-speaking mothers with their children to investigate their input in all transitive constructions uttered. Overall, OVS clauses appeared in 19% (considering highly causative sentences) and in 29% (considering non-causative sentences) of all analyzed constructions, respectively (compare also Lieven (2010), who reported 22% OVS sentences in child-directed speech). The results of the corpus analysis showed that case marking has a much higher cue validity (87% in causative sentences/85% in non-causative sentences) in German child-directed speech than the agreement cue (65%/63%) because number agreement is more rarely available in the input (agreement cue availability: 66%/64%; case cue availability: 87%/85%). Based on the cue validity in German, it is assumed that children profit from the case cue in their development before they process the number cue correctly (Dittmar, 2009; Kempe & MacWhinney, 1998; Lindner, 2003).
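The relation between the reported percentages can be made explicit with a small worked example. The sketch below assumes the usual competition-model formulation in which cue validity is the product of cue availability and cue reliability; the helper name is hypothetical, and because the reported percentages are rounded, the implied reliabilities are only approximate.

```python
# Hypothetical helper, not part of the original analysis.
# Assumption: cue validity = cue availability * cue reliability.
def implied_reliability(validity: float, availability: float) -> float:
    """Return the reliability implied by validity = availability * reliability."""
    if not 0 < availability <= 1:
        raise ValueError("availability must lie in (0, 1]")
    return validity / availability


# (validity, availability) as reported in the text for child-directed speech.
reported = {
    ("case", "causative"): (0.87, 0.87),
    ("case", "non-causative"): (0.85, 0.85),
    ("agreement", "causative"): (0.65, 0.66),
    ("agreement", "non-causative"): (0.63, 0.64),
}

implied = {key: implied_reliability(v, a) for key, (v, a) in reported.items()}

for (cue, verb_type), reliability in implied.items():
    print(f"{cue:9s} {verb_type:14s} implied reliability ~ {reliability:.3f}")
```

Under this assumption, both cues come out as almost fully reliable when present, so the gap in validity between case and agreement stems mainly from availability, in line with the reasoning given above.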
Complementary to the competition model, the local cue hypothesis states that linguistic cues can be divided into local and distributed cues (Slobin, 1982). Slobin (1982) suggests that local cues, such as case marking, presuppose a lower memory and cognitive load than distributed cues, such as number. Hence, local cues should be easier to process and be acquired earlier in the course of language development.
Along the same lines, the incremental processing hypothesis emphasizes that early arriving cues might lead children to misinterpretations of sentences that could be correctly resolved by later arriving cues (Trueswell & Gleitman, 2004; Trueswell et al., 1999). This reanalysis and integration of late occurring cues seems to develop only during or after the first years of school in children and was shown for different languages and sentence types (Choi & Trueswell, 2010; Huang et al., 2013; Omaki et al., 2013; Weighall, 2008). These studies demonstrated that children prefer early occurring cues in utterances (as case in our study at sentence initial position), process sentences incrementally, and struggle to reanalyze them based on late arriving cues (as number in our study at verb position). Adults, like children, show a strong expectation to encounter subjects in first position (Friederici et al., 1998; Rösler et al., 1998). In addition to the ability to reanalyze a previously misparsed sentence, children also need to acquire the ability to readily and easily override the strong subject-first expectation, which is needed to become adult-like when processing an unambiguously case-marked object in first position.
Regarding the acquisitional order of the different cues from the perspective of these models, young children seem to proceed in general from local to more distributed cues (Lindner, 2003). Concerning the case cue, behavioral findings are not consistent. Strotseva-Feinschmidt et al. (2019) found in a sentence-picture matching task that children as young as at the end of age three are able to make use of this morphosyntactic cue when in coalition with the animacy cue. Other behavioral results, however, show that young German-speaking children start applying the case cue in isolation much later when processing object-initial sentences, pointing to ages five, six, and seven (Brandt et al., 2016; Dittmar et al., 2008; Knoll et al., 2012; Lindner, 2003; Schaner-Wolles, 1989; Schipke et al., 2012; Watermeyer & Kauschke, 2013).
Behavioral data concerning the processing of the number cue in German-speaking children, especially in object-initial sentences, are much sparser than for case. In a study by Stegenwallner-Schütz and Adani (2017), results indicated that children comprehend OVS sentences more accurately when the subject and direct object mismatch in number, already at the age of 5;0. Similar results have been shown for the processing of object relative clauses in German (Adani et al., 2017). In both experiments, the authors tested case-marked sentences in which the effect of number marking was measured by comparing conditions in which the subject and object nouns mismatched in number (e.g., one singular, the other plural) to those in which they matched (e.g., both singular). Further research is necessary in order to disentangle when young children start processing number information effectively as a disambiguating cue, leading to the question of whether this facilitation persists also when number is manipulated alone, rather than together with case.
Contrary to the three processing theories introduced so far, proponents of the featural Relativized Minimality approach (Friedmann et al., 2017; Rizzi, 2013) propose that overt case marking does not facilitate the development of processing and interpreting thematic roles in sentences, such as (2), since case is a local property of the noun phrase and it is assigned to it by the verb in the example (2) (Fanselow & Felix, 1993; Haegeman, 1991). Friedmann et al. (2017) define movement triggering features as those that need to be checked during the computation of subject-verb agreement, for instance number and person in languages such as German, Italian or English, but also gender in languages such as Hebrew. Thus, only the features that are relevant for the computation of subject-verb agreement (the movement triggering features) are predicted to play a role during the processing of non-canonical sentences.
Empirical evidence supporting this proposal was presented in a behavioral study in Hebrew: Various groups of speakers with diverse language profiles (such as agrammatism, syntactic Specific Language Impairment, and children with hearing impairment) failed to correctly interpret case-marked OVS structures (Friedmann et al., 2017). In a similar line, Biran and Ruigendijk (2015) also showed that, while subject-verb gender agreement facilitated the comprehension of OVS sentences in Hebrew, case marking did not. Notably, gender is one of the features that triggers movement in Hebrew, similarly to number in languages such as English, Italian, and German. Interestingly, German-speaking children did not benefit from a gender mismatch between the two NPs in resolving subject-verb agreement in the study by Biran and Ruigendijk (2015), in line with the predictions of the featural Relativized Minimality account, given that gender is not a movement triggering feature in this language and it is not marked in the subject-verb agreement in German. Similarly, Italian children and adults benefit from number agreement when confronted with object-relative clauses, but they do not benefit from gender (Adani et al., 2010; Belletti et al., 2012; Biondo et al., 2022). Concerning the present study, the proponents of the featural Relativized Minimality account predict subject-verb number agreement to enhance the adult-like comprehension of OVS sentences in German, but such an effect is not expected for case marking.
At this point, it is worth noting that the reviewed studies did not take advantage of the properties of weak masculine nouns in German. None of the experimental designs took advantage of systematically manipulating strong masculine nouns that do not receive any overt inflection on the stem in the accusative case. In contrast, weak masculine nouns are both case-marked at the determiner and display an inflection -(e)n for all oblique cases in the singular (Eisenberg, 1998). Consider the example in (3).
(3) Den Jungen küsst der Vater.
    the.Acc.SG boy.Acc.SG kiss.SG the.Nom.SG father.Nom.SG
    "The father is kissing the boy."
The explicit double case-marking of NP1 as accusative (at the determiner (den) and at the noun (with the suffix -n, as opposed to the nominative case: der Junge)) can be regarded as an extra salient local cue for comprehension of the who-is-doing-what-to-whom question (E. Bates & MacWhinney, 1989). Hence, we argue here that German-speaking children might benefit from weak nouns in their input and might use those as a bootstrap into interpreting the case-marking in OVS structures correctly. Therefore, for the first time to our knowledge, we presented preschoolers with a paradigm manipulating weak nouns in German SVO/OVS structures that might facilitate the processing and interpretation for them.
Importantly, we intend to shed light onto the mechanisms for offline interpretation as well as onto the online mechanisms underlying the processing of object-first structures. Currently, only a few insights exist from eye-tracking studies with children that have looked at the parsing of morphosyntactic cues in OVS sentences (Brandt-Kobele & Höhle, 2010, 2014). Our study is the first one to record eye-movements in German-speaking children while processing OVS sentences disambiguated either by the number or by the case cue within the same experiment.

The role of executive functions in the development of syntactic processing
So far, we have reviewed the literature showing that morphological features modulate the effective processing of complex sentences differently in different populations. However, executive functions, especially working memory and cognitive control, might also play an important role in syntactic processing. For our purposes, cognitive control comprises inhibitory as well as cognitive flexibility skills because, as we will explain later on, it is not possible to disentangle these two components for developing syntactic processing in studies like ours.
While the notion of executive functions summarizes a large set of mental processes needed to solve complex tasks, it is generally divided into three main subparts: working memory, inhibition/interference control, and cognitive flexibility (Diamond, 2013; Miyake et al., 2000). First, working memory is the capacity to keep verbal or non-verbal information in mind and to operate with it, whereas short-term memory denotes solely the capacity to retain information (Diamond, 2013; Kidd, 2013). Second, inhibition comprises the abilities to control a prepotent reaction on any internal or external mental or behavioral level and to suppress or overrule immediate reactions and interpretations (Diamond, 2013; Mazuka et al., 2009; Simpson et al., 2012). Third, cognitive flexibility might build on the previous two executive functions. Cognitive flexibility defines the adeptness to adjust flexibly to changing situations and to 'switch mental sets in response to changing relevant cues in the environment' (Chevalier & Blaye, 2008). This may require inhibiting previous interpretations and initiating or updating other perspectives from working memory (Chevalier & Blaye, 2008; Diamond et al., 2005). Possibly because cognitive flexibility builds on these other two executive functions, it develops relatively late in childhood or even until early adolescence (Davidson et al., 2006).
Regarding the specific contribution of working memory to complex sentence processing, some researchers argue for a more general effect of executive functions on complex tasks and frame working memory as a measure of executive attention, closely interacting with inhibition and essentially as not only responsible for memory storage but also for conflict resolution (Engle & Kane, 2004). Following the same reasoning, Novick and colleagues (Novick et al., 2014) trained adults on domain-general executive functions and assessed their reading of garden-path sentences pre- and post-training. It was shown that improvement in the n-back task predicted increased resolution of garden-path sentences. While this notion supposes a domain-general function of working memory, it can also be linked to domain-specific processes (Engle & Kane, 2004). More specifically, there is first evidence that successful complex sentence comprehension in children and higher working memory skills can be linked together (Kidd, 2013; Kidd et al., 2018). Regarding filler-gap dependencies, English-speaking children between the ages of five and seven years with higher verbal working memory capacity were able to parse these faster and more efficiently than those with lower verbal working memory (Roberts et al., 2007). Furthermore, similar findings were shown for English-speaking 6- to 7-year-olds' attachment preferences of prepositional phrases in relative clauses (Felser et al., 2003). More closely related to the OVS structures tested in the present study, Arosio et al. (2011) were able to associate higher verbal short-term memory capacity as measured by a digit span task with enhanced offline comprehension of object relative clauses disambiguated by animacy using a self-paced listening task. Interestingly, in the same study, neither online measures nor the same structures disambiguated by number could be correlated with short-term memory measures.
Furthermore, inhibitory control abilities are commonly associated with the necessary suppression of irrelevant or misleading cues for successful syntactic interpretation, especially in children (Mazuka et al., 2009). As already summarized, according to the incremental processing hypothesis, children tend to rely on early arriving cues. They do not seem to be able to suppress this initial syntactic interpretation once set on the basis of these cues, even if later syntactic cues should unambiguously direct them to a different interpretation as the sentence unfolds. Choi and Trueswell (2010) argue for a general preference in children for early arriving cues in syntactic parsing based on a Korean study on the processing of garden-path sentences in 4- to 5-year-olds. Huang et al. (2013) provided further evidence for the existence of an incremental processing strategy using an eye-tracking study on the interpretation of Mandarin passive sentences in 5-year-olds. A similar processing mechanism with the addition of the inability to inhibit default interpretations was shown for Japanese children (mean age: 5;9) parsing filler-gap dependencies (Omaki et al., 2013). Children's reliance on default interpretational strategies has been discussed for several decades now. Direct evidence for the interplay between morphosyntactic abilities and inhibition skills was shown by Gandolfi and Viterbori (2020) for children aged 2;0 to 2;8 using several different inhibition tasks. Higher interference suppression skills were not only significantly related to better results in language production but also longitudinally to more advanced receptive morphosyntactic performance one year later.
Both inhibition and cognitive flexibility are necessary components in complex sentence processing, as it requires not only using a syntactic cue to suppress a (mis)interpretation of a sentence but also updating it and switching flexibly to a new analysis. The seminal study by Trueswell et al. (1999) introduced the notion of a Kindergarten path effect describing the child's inability to recover from misinterpretation and to reanalyze the sentence. Most likely, garden-path, non-canonical, or other complex sentences are also difficult to interpret for children because of their still developing cognitive flexibility to revise their previous interpretation (Mazuka et al., 2009). Concerning the resolution of temporarily ambiguous prepositional phrases, cognitive flexibility as recorded by a go/no-go task within a flanker task has been shown to predict the individual performance of children (mean age: 4;10) in conflict resolution (Woodard et al., 2016). Interestingly, working memory skills did not correlate in their results with the ability to revise initial interpretations. As already mentioned, inhibitory control and cognitive flexibility might be difficult to disentangle when it comes to complex sentence resolution. Both may be subsumed under the notion of cognitive control. Höhle et al. (2016) focused on working memory (measured by forward digit span) and cognitive control/inhibition skills (measured by a child-adapted flanker task) in 4-year-olds. They found a general effect of working memory in children's processing of focus particles in pre-object and pre-subject position and a more specific benefit of inhibitory control abilities in the interpretation of the more demanding pre-subject position.
In our study, we focus on the contribution of cognitive control skills in children to OVS interpretation while being aware that inhibitory control is only one part of the interplay between inhibition and cognitive flexibility skills necessary for resolving ambiguities. Here, we are interested in examining whether children's cognitive control skills might impact their processing of the number cue (5a/b) in OVS sentences to a greater extent than the correct interpretation of the case cue (4a/b). For the latter, the unambiguous case cue arrives in our stimuli as early as in the first word, namely at the determiner, of the sentences presented in this condition. For the number cue, in contrast, the disambiguating cue for correct thematic role assignment is only available later at the verb, and thus the correct interpretation requires the suppression of the subject-first interpretation and, consequently, reanalysis of the sentence. In both conditions, however, the object-first construction poses a higher computational load. From early on in sentence processing research, it could be shown that even German adults expect a prototypical, i.e., subject-first, structure and display a subject-initial preference when confronted with transitive sentences (Beim Graben et al., 2000; Friederici & Mecklinger, 1996; Mecklinger et al., 1995), while they have to discard misanalyses in ambiguous structures (Bornkessel et al., 2004). Very similarly, a subject-first strategy is assumed for children processing these structures (Dittmar et al., 2008; Lindner, 2003; Schipke et al., 2012). This means that in order to arrive at the correct interpretation for OVS sentences, they need to learn to retreat from this strategy and prioritize grammatical information, such as case or number information that stands in conflict with the subject-first expectation. Therefore, we put forward the hypothesis that children with more advanced cognitive control are better able to suppress the initial subject-first expectation and solve the comprehension task at hand.
So far, little research has systematically tested in children either the interplay of working memory skills and syntactic processing (e.g., Arosio et al., 2011; Felser et al., 2003; Roberts et al., 2007) or the interplay of cognitive control abilities and syntactic processing (e.g., Gandolfi & Viterbori, 2020). Even fewer studies have examined the relative impact of these two executive functions on parsing abilities in one experiment (e.g., Höhle et al., 2016; Woodard et al., 2016). The present study aims at enlarging the existing base of information on complex sentence processing in order to disentangle the cognitive underpinnings during language development.

The current study
For the first time, the current study will present children with the case and number cue in opposition to each other (i.e., our manipulation separates these two cues into different conditions (4a/b vs. 5a/b)) within one experiment for the resolution of German SVO/OVS sentences. Which of these cues comes first in the course of language acquisition? Do case and number agreement facilitate sentence processing and comprehension and which of these cues is deployed more reliably?
Empirical evidence suggests that the age range between four and five years is crucial for the acquisitional step of the case and number cues (Brandt-Kobele & Höhle, 2010, 2014; Schipke et al., 2012; Stegenwallner-Schütz & Adani, 2017). For this reason, in the present study, we will focus on 4;6-year-old German preschoolers. We aim to reveal how these different morphosyntactic cues influence sentence processing not only offline in a sentence-picture matching task but also online, involving an implicit looking-while-listening eye-tracking paradigm. One novelty of the current experiment with respect to the existing studies is that the case marking cue will be designed to be more salient with the help of weak masculine nouns. We argue that German-speaking children might benefit from weak nouns, which will facilitate the processing and interpretation of object-initial structures for them.
Based on the competition model, the local cue hypothesis, and the incremental processing hypothesis, 4-year-olds should deploy case more reliably than number agreement (E. Bates & MacWhinney, 1989; Lindner, 2003; Slobin, 1982). However, following the featural Relativized Minimality approach, children will be able to use number but not case as a trigger for movement-derived sentences and will thus benefit from subject-verb number agreement instead of case marking (Friedmann et al., 2017; Rizzi, 1990; Stegenwallner-Schütz & Adani, 2017). In the current study, we will test these opposing hypotheses against each other and will contribute further insights to the discussion of these existing theories. The online processing data depicted by our eye-tracking experiment might additionally reveal specific differences in the looks to the target after NP1 offset between case and number, since the sentences can be resolved based on the case cue right after NP1 but only after the verb based on the number cue.
Furthermore, we aim to investigate the impact of working memory capacity and cognitive control abilities on the parsing of object-initial structures in young German-speaking children. Are 4-year-olds' comprehension and processing abilities modulated by their working memory and cognitive control skills? Both case marking and number agreement have to be stored in short-term memory when parsing the different sentences in the respective conditions. While the number cue might have an even higher working memory load since it is a distributed cue, we hypothesize that children's higher cognitive control abilities might specifically support them when resolving OVS sentences disambiguated by number agreement due to the later arriving cue in this condition, which demands suppressing a subject-first misanalysis and subsequently a reanalysis. Like Höhle et al. (2016), we aim to disentangle the contribution of these two executive functions (working memory and cognitive control) to non-canonical sentence processing during language development. The specific and perhaps distinct facilitatory effects of these two cognitive skills on complex sentence interpretation for children will expand our insights into the general cognitive mechanisms playing an important role in grammatical acquisition.

Participants
Thirty-six children were tested. Using a parental questionnaire, we ensured that all children grew up in a monolingual German-speaking environment, exhibited no known developmental deficits and were not born prematurely. After testing, nine children had to be excluded (five due to fussiness during testing and four due to lack of test completion). Thus, a sample of 27 children (mean age: 53 months; age range: 51-57 months; 13 girls; 14 boys) was considered for the analysis. This study was reviewed and approved by the ethics committee at the University of Potsdam (61/2016) and was carried out in a university laboratory upon parental written informed consent from all participants, in accordance with the Declaration of Helsinki.

Stimuli
Experimental items comprised semantically reversible declarative transitive German sentences. Two independent factors were manipulated: Word Order (levels: SVO/OVS) and Disambiguation (levels: case/number).
Five monosyllabic and seven bisyllabic animal nouns were used to construct the auditory stimuli. According to the scales developed by Schröder et al. (2012), all nouns had an estimated mean age of acquisition between 3 and 4 years (mean: 2.12 points; SD: 0.67; range: 1.7-3.0), were evaluated on average as moderately to very familiar (mean: 3.4 points; SD: 1.2; range: 2.75-3.80), and as very typical members of the category animal (mean: 1.49; SD: 0.92; range: 1.05-2.25). Each noun occurred equally often in both conditions (16 times), was counterbalanced with respect to the NP1 and NP2 position, and interacted equally often with the other event participants to neutralize possible order effects and co-occurrences. In terms of semantics, all nouns used in the experiment were ideal candidates for an agent role (Primus, 1999), as they were all animate and performed a deliberate physical action.
In the case condition, the nouns used as sentence arguments were always singular, as in (4a & b). Six animate masculine nouns of the weak declension type were used: "der Bär" ("bear"), "der Rabe" ("crow"), "der Drache" ("dragon"), "der Hase" ("rabbit"), "der Affe" ("monkey"), "der Löwe" ("lion"). For nouns belonging to the weak declension, accusative case is marked not only on the preceding determiner "den" but also at the offset of the noun, by adding "-(e)n" to the lemma. Only the noun "der Drache" did not appear in Schröder et al. (2012), but it has been routinely used in previous experiments conducted within our lab, and it is not an outlier with respect to the other nouns.
Three transitive verbs were used as predicates of the test sentences: "waschen" ("to wash"), "stechen" ("to stab") and "fangen" ("to catch"). They were counterbalanced across pairs of event participants and item lists. All verbs exhibit a vowel change in the third person singular (e.g., wäscht (singular); waschen (plural)) and were selected for their early age of acquisition and good depictability (Kauschke & Siegmüller, 2016).
We created four experimental lists. Each experimental list consisted of 48 items, 24 of which were disambiguated by the number cue (12 in SVO and 12 in OVS word order), and 24 of which were disambiguated by the case cue (12 in SVO and 12 in OVS word order). Items were presented in a pseudo-randomized order (see Appendix A, Supplemental Materials).
A female native speaker of German, who was trained to speak in a child-directed manner, recorded the auditory stimuli in an anechoic room. The prosodic contour of the sentences was kept as neutral as possible. Sentences were normalized in amplitude to 70%. The duration of the test sentences varied from 2916 ms to 3499 ms (mean: 3188 ms; SD: 110 ms). After recording, the onset of the verb and the onset of NP2 were manually adjusted by adding silence in order to keep their occurrence constant across all sentences without violating their natural sound. The onset of the verb was at 1319.75 ms and the onset of NP2 at 2269.48 ms across all sentences and conditions. The results of the acoustic analysis of duration and pitch are reported in Appendix B, Supplemental Materials.
Pictures with a white background for each item were created with a frame resolution of 700 × 350 px using Adobe Photoshop CS6. Target and non-target pictures (showing the exact opposite agent-patient relation) were positioned in a gray 1680 × 1050 px window that corresponded to the eye-tracker screen resolution (see Figure 1). The appearance of the target on the right/left side was counterbalanced across conditions. The size and visual saliency of the animals were kept homogeneous by constructing picture pairs in which the two animals were of approximately the same size and of the same color in both pictures. Visual and auditory stimuli were combined for presentation in the eye-tracking software using Adobe Flash Professional CS6.

Procedure
A test session consisted of a sentence-picture matching task embedded in an eye-tracking-while-listening experiment, a child-adapted flanker task (Höhle et al., 2016), a child-adapted forward digit span test (Höhle et al., 2016) based on Kiese-Himmel (2007), and three standardized subtests of the TSVK (test for sentence comprehension in children; Siegmüller et al., 2011), administered to ensure that participants performed within the norm on receptive grammar. Subtests 3 and 4 examined children's understanding of passive sentences and binding, whereas subtest 6 evaluated how children comprehend relative clauses.

Online and offline experiment
Looking times and gaze patterns were recorded using an SMI RED 250 eye-tracking system (iViewX Version 2.8.43; sample rate: 60 Hz), run by SMI Experiment Centre software (Version 3.5.169). During the entire eye-tracking session, participants sat in a car safety seat approximately 55-70 cm from the SMI eye-tracking monitor. An experimenter was seated out of the participant's sight and controlled the calibration and tracking quality via a DELL laptop that was connected to the eye-tracker.
The procedure consisted of a familiarization and a testing phase. In order to maintain children's attention and interest in the eye-tracking experiment, a storyline about visiting a farm was integrated and introduced by the experimenter. Afterwards, the familiarization phase started, which presented the verbs, nouns, and the different word orders employed in the experiment without using any sentence from the testing phase.
A block design was employed for the testing phase. An experimental list started either with 24 items of the number condition or with 24 items of the case condition (see Appendix A, Supplemental Materials, for experimental lists). Items were pseudo-randomized with respect to the word order and screen side of the target picture. The possibility of taking a break was programmed after the 16th trial.
As soon as the child's gaze was directed to the middle of the display by an attention grabber (moving circle), the experimenter switched to the test item. Each test item started with a 3000 ms preview of the two pictures side by side. Subsequently, children listened to the test sentence, which described the event depicted in one of the two pictures on the screen. After sentence offset, participants were instructed to press one of the two buttons positioned in front of them. The side of the button (left/right) corresponded to the side of the picture on the screen. Children were instructed to choose, by button press, the picture that matched the sentence they had just listened to. The performance in the sentence-picture matching task reflected the behavioral accuracy of the child's responses. The time limit for each experimental item was set to 21,000 ms (including preview time), after which the item automatically disappeared from the screen. As soon as the child pressed a button, the experimenter switched to the attention grabber and the next test item appeared. If participants failed to press a button or pressed both buttons simultaneously, the response was recorded as incorrect. The entire eye-tracking session lasted approximately 25 minutes.

Flanker task
Next, a child-adapted version of the flanker task (Höhle et al., 2016) was administered using DMDX. Instead of the traditional flanker arrows (→→←→→), there were yellow drawings of fish, facing either to the left or to the right on a dark blue background. Children were instructed to play a game about feeding the fish. The same external response buttons as in the eye-tracking experiment were used. After the presentation of a fixation cross, participants were asked to press the left button if the target fish in the middle was heading to the left, or the right button if it was heading to the right. Children were told to press the button as quickly as possible. A correct response was accompanied by a cheerful sound and an incorrect response by a sad one. The chosen side (left or right) as well as the reaction time of each response were recorded.
The flanker task contained three conditions: a neutral condition, a congruent condition, and an incongruent condition. In the neutral condition, a single fish appeared on the display, whereas in the other two conditions the target fish was flanked by two other fish on each side. In the congruent condition, the target fish in the middle was heading in the same direction as the other fish. In the incongruent condition, it was heading in the opposite direction. All the fish were of the same size. Two practice blocks (six items each) ensured that all children understood the task and gave at least one correct response before the test block started. There were 16 trials per condition in the test blocks in total, presented in a pseudo-randomized order (8 blocks, 6 trials each) so that no more than two trials in a row displayed the same condition and that maximally three consecutive trials demanded the same response. We counterbalanced the facing of the target fish (left/right) for each block and for each condition.
The main goal of the flanker task was to obtain the inhibition effect, which reflects the child's ability to resolve conflicting information by disregarding interfering information and focusing on the target. The inhibition effect is measured by the difference between responses in the incongruent and congruent conditions; a smaller disparity between the two conditions signifies stronger inhibition abilities. For the purpose of this paper, we were interested only in the interference effect, that is, the reaction time (RT) difference between incongruent and congruent trials (incongruent RT − congruent RT). The reaction time difference can be considered a more sensitive and reliable measure of cognitive control, since accuracy scores are affected by other individual differences such as speed-accuracy trade-offs or processing speed more generally. Moreover, the accuracy difference is mostly due to errors on incongruent trials, while the reaction times of incongruent and congruent trials covary (Hedge et al., 2018). Children who yield smaller reaction time differences will be considered to have more advanced cognitive control skills.
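As an illustration of the interference effect described above, the following Python sketch computes the difference between mean incongruent and mean congruent RTs for one child. It is not the scoring script used in the study: the function name, the trial format, and the example values are hypothetical, and restricting the computation to correct responses is an assumption that the text does not state.

```python
from statistics import mean


def interference_effect(trials):
    """Return mean incongruent RT minus mean congruent RT (in ms).

    trials: iterable of (condition, rt_ms, correct) tuples.
    Assumption: only correct responses enter the RT means.
    Neutral trials are ignored because they are not part of the difference.
    """
    by_condition = {"congruent": [], "incongruent": []}
    for condition, rt_ms, correct in trials:
        if correct and condition in by_condition:
            by_condition[condition].append(rt_ms)
    return mean(by_condition["incongruent"]) - mean(by_condition["congruent"])


# Hypothetical trials for one child (not data from the study).
example_trials = [
    ("congruent", 900, True),
    ("congruent", 1100, True),
    ("incongruent", 1300, True),
    ("incongruent", 1500, True),
    ("incongruent", 2500, False),  # error trial, excluded under the assumption above
    ("neutral", 800, True),        # neutral trial, not used for the difference
]
effect_ms = interference_effect(example_trials)
```

A smaller value of `effect_ms` would correspond to what the text treats as more advanced cognitive control.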

Digit span task
A computerized child-adapted forward digit span test was administered (Höhle et al., 2016). Höhle et al. based the test on Kiese-Himmel (2007) but excluded the digits "10" and "7" to ensure one-digit length and one-syllable length for all items. Each single digit was recorded by a male German native speaker with neutral intonation. Sequences of these prerecorded digits were played in DMDX at a rate of one digit per second. No sequence contained the same digit twice.
The children were instructed to play a "parrot game," that is, to pretend to be a parrot and to repeat sequences of pseudo-randomized digits in the same order as presented. Three practice trials with two digits each ensured that all children knew the task and repeated at least two trials correctly before the test block began. The test block started with two successive digits. Each sequence length was tested with a maximum of three trials; no repetitions were given in the test trials (N = 19 in total). Once a child had repeated two sequences of a given length correctly, the test moved on to the next sequence length, which increased by one digit (up to a series of eight successive digits). After two or more errors at a given length, the experimenter stopped the test and participants received applause from a parrot.
As the dependent variable for our analysis, we used the number of correct trials in total for each child (Min: 4; Max: 10; Mean: 6.2; SD: 1.28).
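The adaptive procedure and the resulting sum score can be made concrete with a short Python sketch. It is only one reading of the rules stated above (up to three trials per length, advance after two correct repetitions, stop after two errors at one length) and not the DMDX script used in the study; the function name and the callback `is_correct` are hypothetical.

```python
def digit_span_score(is_correct, min_len=2, max_len=8, trials_per_len=3):
    """Return the total number of correctly repeated test trials.

    is_correct(length, trial_index) -> bool is a hypothetical stand-in
    for the child's response on a given trial.
    """
    total_correct = 0
    for length in range(min_len, max_len + 1):
        correct = errors = 0
        for trial_index in range(trials_per_len):
            if is_correct(length, trial_index):
                correct += 1
                total_correct += 1
            else:
                errors += 1
            # Advance after two correct repetitions, stop after two errors.
            if correct == 2 or errors == 2:
                break
        if errors == 2:
            break  # the test ends at this sequence length
    return total_correct


# Hypothetical child who repeats sequences of up to four digits without error.
score = digit_span_score(lambda length, trial_index: length <= 4)
```

In this hypothetical case the child passes lengths 2, 3, and 4 with two correct trials each and fails twice at length 5, which yields a sum score of 6.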

Offline results
The proportion of correct answers in the sentence-picture matching task was calculated for each participant and was used as the dependent variable for statistical analysis.
The accuracy values were analyzed by fitting a generalized linear mixed-effects model implemented in R (version 4.0.2; R Core Team, 2020) using the package lme4 (version 1.1-23; D. Bates et al., 2015). The accuracy data, with correct response as a binary dependent variable, were fit with a logit mixed model (Jaeger, 2008) using the function glmer (generalized linear mixed model).
The initial model contained the following predictors: the two-level factor Word Order (SVO, OVS) and the two-level factor Disambiguation (Case, Number), the contrasts for which were set up as sliding contrasts. This means that the mean of the dependent variable (proportion of correct answers in the sentence-picture matching task) on one level was compared to the mean of the next level. The statistical model also contained two continuous fixed-effect predictors, Digit Span (sum of correct trials in the forward digit span task) and Flanker_RT (score on the inhibition effect in the flanker test, calculated as the difference in RTs between incongruent and congruent trials, converted into seconds), which were both centered by subtracting the mean of the respective variable from each of its data points and subsequently z-standardized through division by their respective standard deviation. Before specifying both continuous covariates in the model, we ensured that Digit Span and Flanker_RT were not interrelated. This was assessed using a correlation analysis (before centering the data), which revealed no significant correlation (Spearman's rho = −0.07, p = .7). The factors Word Order and Disambiguation were allowed to interact with each other as well as with Digit Span and Flanker_RT. The raw scores of the language assessment TSVK (Min: 26; Max: 44; Mean: 35.18; SD: 4.35) correlated with Digit Span (Spearman's rho = 0.375, p = .05). We therefore decided to take the TSVK results out of the model, because the interrelations of the accuracy measures for Word Order and Disambiguation with working memory are of much greater interest for the given research questions than those with a standardized language assessment.
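The two preprocessing steps named above, z-standardization of the covariates and sliding (successive-difference) contrast coding of the factors, can be sketched in Python as follows. This is an illustration rather than the authors' R code: the function names and the example scores are hypothetical, the use of the sample standard deviation (n − 1) is an assumption, and the contrast matrix follows the standard successive-difference scheme, which for two levels yields the codes −1/2 and +1/2.

```python
from fractions import Fraction
from statistics import mean, stdev


def z_standardize(values):
    """Center on the mean, then divide by the sample standard deviation."""
    m, sd = mean(values), stdev(values)
    return [(v - m) / sd for v in values]


def sliding_difference_contrasts(k):
    """Return a k x (k - 1) contrast matrix as nested lists.

    Column j (1-based) compares level j + 1 with level j:
    rows up to j receive -(k - j)/k, later rows receive j/k.
    """
    return [
        [Fraction(-(k - j), k) if i <= j else Fraction(j, k) for j in range(1, k)]
        for i in range(1, k + 1)
    ]


# Hypothetical digit span sum scores (not data from the study).
digit_span = [4, 5, 6, 6, 7, 10]
digit_span_z = z_standardize(digit_span)

# Two-level factor: one contrast column, OVS minus SVO.
word_order_codes = dict(
    zip(("SVO", "OVS"), (row[0] for row in sliding_difference_contrasts(2)))
)
```

With this coding, the coefficient of a two-level factor estimates the difference between the two level means, and the intercept corresponds to the mean of the level means.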
In addition to these fixed effects, the following random components were specified: an adjustment of children's individual average accuracy (i.e., random intercepts for participants), an adjustment for the children's individual effect of Word Order and also Disambiguation (i.e., random slopes for Word Order and for Disambiguation), an adjustment for item-specific accuracy (i.e., random intercepts for each item of sentence-picture combination), and an adjustment for an item-specific effect of Word Order (i.e., random slopes for Word Order). The complete model specification is provided in Appendix C (a), Supplemental Materials.
Starting with the fully parameterized model, we followed the procedure described in D. Bates et al. (2015) and Matuschek et al. (2017) to progressively decrease the complexity of the random structure in order to obtain the one that was justified by the data. The random structure of the resulting model was shown not to be overparameterized and to have the highest goodness-of-fit (see Appendix C (b), Supplemental Materials, for the final model).
A significant effect of Word Order (for all z-values and p-values, see Table 1) was found. As can be seen in Figure 2, answers in the SVO condition were more accurate (case: 95%; number: 92%) than in the OVS condition (case: 58%; number: 39%). Disambiguation also affected accuracy significantly, whereas Word Order and Disambiguation did not interact.
Flanker_RT does not significantly predict accuracy scores for the sentence-picture matching task, whereas Digit Span does. There is an interaction between Digit Span and Disambiguation. This interaction reflects that the higher a child's digit span sum score, the more accurate their comprehension of sentences marked for case (which is coded positively), but not number. The relation is shown in Figure 3, left panel. This finding has to be interpreted independently of Word Order, however, as Digit Span does not predict accuracy scores with regard to the Word Order manipulation (see Figure E1 in Appendix E, Supplemental Materials). Any two-way interaction of Flanker_RT with Disambiguation or Word Order is absent.
Regarding cognitive control as measured by Flanker_RT, a significant three-way interaction between Flanker_RT, Disambiguation, and Word Order was found. Visual inspection of Figure E2 in Appendix E (specifically panel 2), Supplemental Materials, indicates that the lower the reaction time differences in the flanker task are, the higher the accuracy scores are in the OVS Number condition. A post-hoc nested model (see Table 2) showed that the interaction of Flanker_RT and Disambiguation is constrained to OVS sentences. This means that the effect of Disambiguation within OVS sentences becomes increasingly larger among children whose reaction times in the flanker task are more strongly affected by the incongruency. In contrast, we did not find an interaction of Digit Span with Word Order and Disambiguation in the original model.
In order to gain additional insight into the consistency of the effects that the two cues, case and number, have on OVS interpretation and into their timing of acquisition, we checked post hoc for the consistency of these effects at the individual performance level. We calculated the chance level corresponding to the 5th up to the 95th quantile of a binomial distribution for 12 trials with a probability of p = .50 of scoring correctly. Accordingly, the chance level corresponds to 4-8 correct trials. On this basis, we assigned each child to one of three groups according to their performance in the OVS Case condition and, following the same rationale, in the OVS Number condition. Children who reliably identified the target, i.e., who scored nine or more correct trials in a condition, were assigned to "Correct (thematic-role assignment)", while "At chance" refers to children who scored between 4 and 8 trials correctly. The third category, "Reversed (thematic-role assignment)", includes children who reliably preferred to assign the agent role to the first NP in an OVS sentence, i.e., who scored three or fewer correct trials per condition. Table 3 displays the number of children falling into each of the nine possible combinations of performance patterns: for example, six children (22%) primarily assigned thematic roles correctly in the OVS Case condition and at chance in the OVS Number condition. Ten children (36%), whose answers were uninformative for the purpose of testing whether case is used more consistently than number, were excluded. Overall, this analysis reveals that 16 participants (59%) who consistently use case to interpret OVS sentences are not able to use number consistently; the exception is one child, who responded more consistently to number than to case, thus behaving contrary to the expected timing of acquisition.
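The grouping rule can be illustrated with a short Python sketch. It applies the cut-offs stated above (nine or more correct, 4-8 correct, three or fewer correct out of 12 trials) and computes how much probability a pure guesser (p = .50) would place on each band; the function names are hypothetical and the sketch is not the script used for Table 3.

```python
from math import comb

N_TRIALS = 12   # OVS trials per disambiguation condition
P_GUESS = 0.5   # probability of a correct response under guessing


def binom_pmf(k, n=N_TRIALS, p=P_GUESS):
    """Probability of exactly k correct responses out of n under guessing."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def classify(n_correct):
    """Assign a score to one of the three performance groups from the text."""
    if n_correct >= 9:
        return "Correct"
    if n_correct <= 3:
        return "Reversed"
    return "At chance"


# Probability mass that guessing places on each band.
p_at_chance = sum(binom_pmf(k) for k in range(4, 9))       # 3498/4096, about .854
p_correct = sum(binom_pmf(k) for k in range(9, 13))        # 299/4096, about .073
p_reversed = sum(binom_pmf(k) for k in range(0, 4))        # 299/4096, about .073

# Hypothetical child: 10 correct in OVS Case, 6 correct in OVS Number.
pattern = (classify(10), classify(6))
```

The hypothetical child in the last line would fall into the cell "Correct" for OVS Case and "At chance" for OVS Number.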
In summary, for the offline results, accuracy scores were significantly higher for the SVO than for the OVS word order as well as for the case condition than for the number condition. For OVS conditions, an analysis of individual performances largely supports the conclusion that children who are able to use number information consistently are also sensitive to case information but not the other way around. Digit Span significantly predicts these accuracy scores in that it positively enhances disambiguation in the case condition. In contrast, Flanker_RT significantly predicts increasing performance specifically in the OVS word order disambiguated by number.

Note to Table 3. For the performance in the OVS Number conditions, the group with correct thematic-role assignment is abbreviated as "Correct" and the group with reversed thematic-role assignment is abbreviated as "Reversed". "√" indicates evidence for the acquisition pattern of OVS Case prior to OVS Number; "X" indicates counter evidence.

Online results
Two specific areas of interest (AoIs) were created in the size of the two displayed pictures, each a rectangle of 700 × 350 px, one for the target and the other for the distractor. Trials with less than 50% looks in the two AoIs were excluded from all analyses. For this reason, 22 trials out of a maximum of 1488 trials had to be removed (1.48% of the data).
Visual inspection of the time course of children's target looks in the four experimental conditions reveals that their target looks diverged between SVO and OVS sentences, as well as between OVS sentences marked for case and OVS sentences marked for number (see Figures 4 and 5). We first ran a permutation analysis, as we were interested in quantifying the time point in the eye-tracking record at which children's target looks diverge as an effect of word order and type of disambiguation. By doing so, we wanted to avoid imposing ad hoc, arbitrary time windows for analysis on the eye-tracking record, and to use a window-based approach that includes the linguistic information of the stimuli only later, when modeling the effects of the cognitive measures. The time window identified by the permutation analysis was then used to estimate how incremental linguistic information (NP1, verb, NP2) modulates the target looks, also taking into consideration the experimental factors (Word Order, Disambiguation) as well as cognitive measures of memory capacity and cognitive control using linear mixed models. All statistical analyses were conducted using the R language (R version 4.1.3; R Core Team, 2022). Permutation analysis is argued to be well suited for looking-while-listening data when the onset of diverging looking behavior that is attributable to the experimental manipulation is not yet known. Our permutation analysis was largely based on the analysis reported in Chan et al. (2018) and their openly shared analysis script. We split the time period between the onset of the auditory stimuli and two seconds after their offset into small bins of 17 ms. This bin size corresponds approximately to the interval between the collection of two data points, since we tracked participants' eye movements at a rate of 60 Hz.
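The binning step can be illustrated with a minimal Python sketch that groups gaze samples into 17 ms bins and computes the proportion of target looks per bin. The function name, the sample format, and the example values are hypothetical; the sketch does not reproduce the shared analysis script of Chan et al. (2018).

```python
from collections import defaultdict

BIN_MS = 17  # bin width reported in the text (60 Hz gives about 16.7 ms per sample)


def target_proportion_per_bin(samples, bin_ms=BIN_MS):
    """Return {bin index: proportion of samples on the target picture}.

    samples: iterable of (time_ms, on_target) pairs, where time_ms is
    measured from the start of the analysed period and on_target is a bool.
    """
    counts = defaultdict(lambda: [0, 0])  # bin index -> [target samples, all samples]
    for time_ms, on_target in samples:
        index = int(time_ms // bin_ms)
        counts[index][0] += int(on_target)
        counts[index][1] += 1
    return {index: hits / total for index, (hits, total) in sorted(counts.items())}


# Hypothetical samples at roughly 60 Hz (not data from the study).
example_samples = [(0.0, True), (16.7, False), (33.3, True), (50.0, True)]
binned = target_proportion_per_bin(example_samples)
```

Because the sampling interval is slightly shorter than 17 ms, a bin occasionally contains two samples, as the first bin does in this example.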
The first permutation analysis was carried out on the predicted effect of Word Order with the aim of identifying the onset of the divergence of children's target looks between the SVO and OVS conditions. We fitted linear mixed models (using the lme4 package) in order to assess the effect of Word Order within each time bin. A time bin counted as significant when children exhibited higher proportions of target looks in the SVO condition relative to the OVS condition. The onset of a period of consecutive significant time bins (i.e., a cluster) was at 4233 ms, and the cluster lasted until 8245 ms (see Figure 4).
Next, we tested the occurrence of the significant cluster for the effect of Word Order against a random distribution. To this end, we randomly permuted the factor labels in 1000 simulations. We then tested the observed cluster sum t-values (i.e., the sum of the t-values of clusters with significant differences between the SVO vs. OVS conditions in the observed data, see Figure E3 in Appendix E, Supplemental Materials) against the maximal sum t distribution derived from the simulations. For each cluster, the p-value was derived from the percentage of values in the corresponding maximal sum t distribution that were less than the observed sum t-value. This permutation analysis indicated that the cluster shown in Figure 4, from 4233 to 8245 ms, was statistically significant (sum t = 1594, p < .001). It shows that target looks diverge at this point when comparing the SVO to the OVS sentences.
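The logic of the cluster-based permutation test can be sketched in Python under simplifying assumptions. The study fitted linear mixed models within each bin in R; the sketch below instead uses a one-sample t statistic on per-participant condition differences and permutes labels by flipping the sign of each participant's difference. The threshold of t > 2, the convention that p is the proportion of permuted maxima at least as large as the observed cluster sum, all function names, and the simulated data are assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev


def t_per_bin(diffs):
    """One-sample t per time bin; diffs[p][b] = SVO minus OVS for participant p, bin b."""
    n = len(diffs)
    t_values = []
    for b in range(len(diffs[0])):
        column = [row[b] for row in diffs]
        sd = stdev(column)
        # A bin without variance carries no usable t value in this sketch.
        t_values.append(mean(column) / (sd / sqrt(n)) if sd > 0 else 0.0)
    return t_values


def max_cluster_sum(t_values, threshold):
    """Largest sum of t over a run of consecutive bins with t above threshold."""
    best = current = 0.0
    for t in t_values:
        current = current + t if t > threshold else 0.0
        best = max(best, current)
    return best


def cluster_permutation_p(diffs, threshold=2.0, n_perm=1000, seed=1):
    """Return (observed cluster sum, permutation p-value)."""
    rng = random.Random(seed)
    observed = max_cluster_sum(t_per_bin(diffs), threshold)
    null_sums = []
    for _ in range(n_perm):
        # Swapping the condition labels of a participant flips the sign of the difference.
        flipped = []
        for row in diffs:
            sign = rng.choice((-1, 1))
            flipped.append([sign * x for x in row])
        null_sums.append(max_cluster_sum(t_per_bin(flipped), threshold))
    p_value = sum(s >= observed for s in null_sums) / n_perm
    return observed, p_value


# Simulated differences: 12 participants, 10 bins, an effect in bins 4 to 7.
data_rng = random.Random(0)
example_diffs = [
    [data_rng.gauss(0.3 if 4 <= b <= 7 else 0.0, 0.1) for b in range(10)]
    for _ in range(12)
]
observed_sum, p_value = cluster_permutation_p(example_diffs)
```

Testing the largest cluster sum against the distribution of permuted maxima, rather than testing each bin separately, is what keeps the many per-bin comparisons from inflating the false positive rate.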
Subsequently, we assessed the onset of the effect of Disambiguation within each word order separately in a second permutation analysis. There were significant time bins among OVS sentences, in which children exhibited higher proportions of target looks in the Case relative to the Number condition. The longest cluster was observed between 4454 ms and 6086 ms (see Figure 5). This was followed by another, much shorter cluster, which occurred between 6120 ms and 6375 ms. We tested the observed OVS cluster sum t-values (i.e., the sum of the t-values of consecutive bins with significant differences between the Case vs. Number conditions in the observed data) against the OVS maximal sum t distribution by randomly permuting the labels of the Disambiguation factor for OVS sentences, again in 1000 simulations. For consistency, we carried out this analysis for both word orders (see Figure E4 in Appendix E, Supplemental Materials). This permutation analysis revealed two significant clusters, one from 4454 to 6086 ms (t-sum = 351, p < .001) and the other shortly after, from 6120 to 6375 ms (t-sum = 42, p < .001). This result shows that children's looks shifted toward the target earlier in the OVS Case than in the OVS Number condition, i.e., around 1454 ms after stimulus onset (which followed the 3000 ms baseline). Since the onset of constituents was strictly aligned in all stimuli across all conditions, we infer that the onset of the Disambiguation effect in the OVS conditions occurred after children had listened to the accusative case marking and the accusative noun declension of NP1 but before they had finished listening to the verb. Although thematic role assignment was still imperfect, this divergence indicates that the accusative case marking prevents participants from systematically assigning the agent role to NP1 in the OVS Case condition but not in the OVS Number condition. In summary, the results of the permutation analysis identified the verb region as a critical phase in which target looks begin to diverge based on word order and the disambiguation cue, and showed that this phase persists until the offset of NP2. Consequently, the information driving the divergence of target looks must be processed by the time of the verb onset, implying that it originates from the preceding NP1 region.
Based on the results of the permutation analysis, we defined the auditory regions of interest (RoIs) for the analysis of the role of executive function on SVO and OVS sentences with different disambiguation cues with respect to the sentence constituent structure (NP1, Verb, NP2). To this end, we added 200 ms after stimulus onset to take into account the average time span necessary for programming and executing an eye movement (Trueswell, 2008). The analysis window started with the baseline window (preview of 3000 ms), extended across the duration of the sentence (mean duration: 3188 ms; see also Appendix B, Supplemental Materials), and ended after a 1300 ms silence window following sentence offset. For mean durations of sentence constituents in all conditions, see Appendix B, Supplemental Materials. We computed the proportion of looks to the target picture (PLT) by dividing the looking proportions to the target picture by the sum of the looking proportions to both the target and non-target pictures; thus, PLT was only calculated for looks recorded within the pre-defined AoIs. It was used as the dependent variable for the subsequent statistical analysis and was calculated for each trial in the relevant time window.
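The PLT computation for one trial and one region of interest can be sketched in Python as follows. The verb and NP2 onsets are those reported for the stimuli in the Stimuli section; how exactly the 200 ms shift was applied to each region boundary is an assumption of this sketch, and the function names, the sample format, and the example samples are hypothetical.

```python
PREVIEW_MS = 3000.0      # picture preview before sentence onset
SHIFT_MS = 200.0         # time assumed for programming and executing an eye movement
VERB_ONSET_MS = 1319.75  # verb onset relative to sentence onset
NP2_ONSET_MS = 2269.48   # NP2 onset relative to sentence onset


def shifted_window(onset_ms, offset_ms):
    """Region boundaries in trial time, shifted by the eye-movement latency."""
    return (PREVIEW_MS + onset_ms + SHIFT_MS, PREVIEW_MS + offset_ms + SHIFT_MS)


def proportion_target_looks(samples, window):
    """PLT = target samples / (target + distractor samples) inside the window.

    samples: iterable of (time_ms, aoi) with aoi in {"target", "distractor", None}.
    Samples outside both AoIs are ignored; None is returned if no AoI samples remain.
    """
    start, end = window
    target = distractor = 0
    for time_ms, aoi in samples:
        if start <= time_ms < end:
            if aoi == "target":
                target += 1
            elif aoi == "distractor":
                distractor += 1
    in_aoi = target + distractor
    return target / in_aoi if in_aoi else None


verb_window = shifted_window(VERB_ONSET_MS, NP2_ONSET_MS)

# Hypothetical gaze samples for one trial (not data from the study).
example_samples = [
    (4600.0, "target"),
    (4700.0, "target"),
    (4800.0, "distractor"),
    (4900.0, None),        # look outside both AoIs, ignored
    (6000.0, "target"),    # after the verb window, ignored here
]
plt_verb = proportion_target_looks(example_samples, verb_window)
```

In the example, two of the three AoI samples inside the verb window fall on the target, so the PLT for this window is 2/3.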
The PLT values were analyzed by fitting a linear mixed-effects model using the same R package, lme4, as mentioned above for the offline data. The initial model contained the three-level factor Window (NP1, Verb, NP2), the two-level factor Word Order (SVO, OVS), and the two-level factor Disambiguation (case, number). The contrasts for all three factors were set up as sliding contrasts. For the fixed factor Window, this resulted in two comparisons: (1) mean of Window Verb − mean of Window NP1 and (2) mean of Window NP2 − mean of Window Verb. The model also contained two continuous fixed-effect predictors, Digit Span (sum of correct trials in the forward digit span task) and Flanker_RT (score on the inhibition effect in the flanker test, calculated as the difference in RTs between incongruent and congruent trials, expressed in milliseconds), which were both centered. The factors Window, Word Order, and Disambiguation were allowed to interact with each other. The continuous factors Flanker_RT and Digit Span were both allowed to interact with Word Order and Disambiguation.
In addition to these fixed effects, the model contained the following random components: an adjustment of children's individual average PLT (i.e., random intercept for participants), an adjustment for item-specific PLT (i.e., random intercept for each target-distractor combination), an adjustment for the children's individual Window, Word Order, and Disambiguation effect (i.e., random slope for Window, for Word Order, and for Disambiguation), and an adjustment for item-specific Window and Word Order effect (i.e., random slope for Window and for Word Order). The complete model specification is provided in Appendix D (a), Supplemental Materials.
The rationale for this model is that we expected time-course effects (i.e., effects of Window) that affect the way Word Order and Disambiguation are processed within the window identified by the permutation analyses. However, we expected that the two continuous predictors (Digit Span and Flanker_RT) would have a global effect on sentence processing that is not bound to the time windows. Starting from this model, we followed the procedure described in D. Bates et al. (2015) and Matuschek et al. (2017) to progressively decrease the complexity of the random structure in order to obtain a model that converged (all previous models were singular and/or did not converge; see Appendix D (b), Supplemental Materials, for the resulting model).
Next, we focus on the results of the window-based modeling of the eye-tracking-while-listening task. PLT differed significantly in the NP1 Window compared to the Verb Window (for all t values, see Table 4) as well as in the Verb Window compared to the NP2 Window. Independently of time windows, significant effects of Word Order and Disambiguation were found. As can be seen in Figure 6, the mean target looking proportions were overall higher in the SVO than in the OVS condition and in the Case than in the Number condition. Word Order and Disambiguation also interacted significantly across all time windows taken together. In the first window comparison (PLT at the verb − PLT at NP1), PLT differed significantly for Word Order and for Disambiguation, showing an increase of PLT in the SVO vs. OVS condition and in the Case vs. the Number condition from NP1 to Verb. In this time window comparison, a significant interaction of Word Order and Disambiguation was also found, which is apparent as decreased PLT in the OVS Number condition (Figure 6). In the second window comparison (PLT at NP2 − PLT at the verb), PLT differences reached significance neither for Word Order nor for Disambiguation, nor did these two factors interact. Post-hoc nested modeling of the interaction (see Table 5) confirmed that the effect of Disambiguation is restricted to OVS sentences, with higher proportions of looks in the OVS Case condition relative to the OVS Number condition. The positive estimate of the Window (Verb-NP1) × Disambiguation interaction for OVS sentences shows that the difference of PLT between the OVS Case and OVS Number conditions increases in the Verb window relative to the NP1 window.
Turning now to the continuous predictors, Digit Span significantly predicts PLT independently of experimental conditions, meaning that a higher digit span sum score co-occurs with larger PLT. However, Digit Span does not modulate any other variable.
In contrast to Digit Span, we did not find a main effect of Flanker_RT. However, the results show a significant effect of Flanker_RT on PLT for the factor Word Order and a significant three-way interaction between Flanker_RT, Word Order, and Disambiguation. As is visible in Figure 7, Panel 2, higher cognitive control skills, as measured by the flanker task, enhance the processing of the OVS Number condition in particular.
We would like to point out once more that the individual scores of the two continuous predictors Digit Span and Flanker_RT did not correlate significantly (see above, results of the offline data).
Overall, PLT in the eye-tracking-while-listening experiment was significantly higher in the SVO than in the OVS condition and significantly higher in the Case than in the Number condition. In the time course from NP1 to Verb, results showed a significant increase of PLT in the SVO as compared to the OVS condition and in the Case compared to the Number condition. In this window comparison, a decrease of PLT specific to the OVS Number condition was significant as well. A higher digit span sum score predicts larger PLT in our experiment in general. Higher cognitive control skills predict higher PLT in the OVS Number condition in particular.

Discussion
The present study investigated the offline comprehension and online processing of German object-initial main clauses in children at the age of 4;6 by combining an eye-tracking-while-listening experiment with a sentence-picture matching task. For the first time, the morphosyntactic cues case and number were manipulated systematically within the same experiment to elucidate processing strategies for SVO vs. OVS sentences in preschoolers. To shed further light on the underlying parsing mechanisms, we examined the impact of executive functions, namely cognitive control and working memory capacity, on the correct interpretation of these structures in children.
Let us first summarize the results of the offline accuracy and the online eye-tracking data. In the sentence-picture matching task, sentences with SVO word order were interpreted correctly significantly more often than sentences with OVS word order, regardless of whether case or number served as the morphosyntactic cue. Within the OVS condition, however, the case cue led to higher comprehension accuracy than the number cue in 4;6-year-olds. The same pattern holds for the overall proportion of looks to the target picture in the eye-tracking results: children looked more to the target when listening to an SVO than to an OVS sentence, and more when an OVS sentence was disambiguated by the case cue than by the number cue. From the first noun phrase to the verb, target looks increased for SVO sentences regardless of the disambiguating cue, and OVS sentences disambiguated by case attracted more target looks than those disambiguated by number. In fact, children's looks to the target decreased from the first noun phrase to the verb when they processed object-initial sentences disambiguated by number agreement. The correct interpretation of SVO sentences in children in the sentence-picture matching task is unsurprising and in line with the notion of a subject-first strategy among children (E. Bates & MacWhinney, 1989; Chan et al., 2009; Lindner, 2003) and adults (De Vincenzi, 1991; Friederici & Mecklinger, 1996; Frisch & Schlesewsky, 2005). The gaze pattern accordingly reveals that 4;6-year-old German-speaking children orient to the target picture from the very beginning of the subject-initial sentence and are able to assign the correct syntactic and thematic roles to the event participants.
In contrast, the correct interpretation of object-initial sentences appears to be more demanding for children, independently of the available morphosyntactic cue. Preschoolers' default interpretation of the first noun phrase as the sentence's subject and their subsequent assignment of the agent role to it are evident in the rapid decrease of target looks in the object-initial number cue condition as the clause proceeds from NP1 to the verb. As a group, 4;6-year-olds orient toward the non-target picture in this condition upon listening to the first NP. This looking behavior is taken to reflect their reliance on a word-order strategy during incremental processing, which misguides them into assigning the agent role to the first noun phrase. However, the gaze pattern for object-initial sentences disambiguated by the case cue indicates a contrasting processing behavior. Here, children's eye-gaze patterns can be interpreted as revealing an emerging ability to disregard the subject-first expectation at that age. On average, looks to the target in the object-initial case condition do not decrease after sentence onset. Rather, looks to target remain steady as the sentence progresses after verb offset. Thus, the instantiation of a subject-first default strategy is no longer reflected in 4;6-year-olds' eye-gaze patterns in this condition. This looking behavior points toward a sensitivity to the case cue at the group level, which is further supported by the group analysis of accuracy data and, to a large extent, by the analysis of individual performance. In line with previous studies with similar age groups (Dittmar et al., 2008; Schipke et al., 2012), we propose that 4;6-year-olds are in a transitory state which is characterized by the emerging processing of the syntactic structure of object-initial sentences based on case marking. With respect to OVS number sentences, they individually still often commit to an incrementally derived parse without reanalyzing the sentence. The differences in processing sentences disambiguated by number versus sentences disambiguated by the case cue are supported by the accuracy results, also on an individual level.
Our findings corroborate the acquisitional order of cue deployment brought forward by the competition model (E. Bates & MacWhinney, 1989; Lindner, 2003; Slobin, 1982), which predicts an earlier and more reliable facilitatory effect of the case cue compared to the number cue due to its cue validity in German and due to its local structure. At the same time, children did not comprehend SVO and OVS sentences more accurately when the subject and direct object differed in number. Hence, by testing the case and number cues in systematic opposition to each other, we provide the first evidence against the prediction that number agreement on the verb alone facilitates the processing of OVS sentences in German preschoolers, as the featural Relativized Minimality account (Friedmann et al., 2017; Rizzi, 2013; Stegenwallner-Schütz & Adani, 2017, 2021) would have predicted. In contrast to Friedmann et al. (2017), we argue that preschoolers might still be sensitive to the case cue even if their interpretation of OVS sentences is not as accurate as for SVO sentences. Our data show that children do not systematically reverse thematic roles in case-marked OVS sentences and thus are sensitive to case marking. This proposal is in line with Biran and Ruigendijk (2015), who conclude that German-speaking children are sensitive to case to a certain extent and do not use a subject-first strategy consistently in this context. They point out the lower cue availability for case in German, as feminine and neuter singular accusative NPs are ambiguous because of the syncretism of the German case marking system, and thus assume a low salience of case in German.
In order to enhance cue availability, we used weak masculine nouns in our stimuli, a manipulation intended precisely to make the case cue more salient for children, which might have enhanced the facilitatory effect of the case cue in 4;6-year-olds. As weak masculine nouns increased the salience and perceivability of the case cue and therefore lowered the cue cost, processing for young preschoolers was improved in comparison to experiments using strong nouns. Based on this first indication, we suggest that weak masculine nouns could function as a guide for German-speaking children toward thematic role resolution. This morphological anchor might also be part of the reason why the case cue is processed more accurately and earlier than the number cue in object-initial structures by preschoolers. The difference between our results and conclusions and those put forward by Stegenwallner-Schütz and Adani (2017), who found a higher accuracy in the number condition, could arise because case and number co-occurred as cues in their stimuli, so that the coalition of cues could be the origin of the processing facilitation they found.
Another aspect to be considered in connection with the two cues tested in the current study lies in their distributed (for the number cue) versus local (for the case cue) nature (Slobin, 1982). The number cue is distributed across the verb and the subject that agrees with it. Consider that in the number condition, regardless of the word order, the child always first encounters an ambiguous NP1 and has to process minimally NP1 and the following verb in order to infer the correct relation between the arguments. In contrast, the local case cue is available earlier (at the determiner as well as at the NP marked with accusative) and instantly enables the correct thematic role assignment, e.g., of the patient role to the accusative first NP, in line with the incremental processing hypothesis. The nature of the unambiguous case cue is such that it prevents misinterpretation, and therefore no reanalysis is required later (Schlesewsky et al., 2003). The salience of the case cue in our study reduces the effort for children to process this morphological information and to decode its functional meaning. The cue cost of the distributed number cue (Slobin, 1982), in addition to the initially ambiguous NP in the number condition, we argue, results in a higher computational load for preschoolers when confronted with object-initial sentences.
This leads us to the cognitive demands placed on children during processing of SVO and OVS sentences. In what follows, we will summarize the results for the executive functions associated with the mastering of syntactic processing, namely working memory capacity and cognitive control. In the offline data, higher working memory capacity, as measured by a forward digit span task, enhances correct interpretation of the sentences in the case condition. Importantly, this effect was independent of the word order, as it was present in SVO as well as in OVS sentences. In the online data, however, working memory capacity is positively related to looks to the target across all conditions. In contrast to the impact of working memory capacity, higher cognitive control skills lead to increased performance specifically in the OVS sentences disambiguated by the number cue, for both online and offline data.
These results are in line with previously reported findings by Höhle et al. (2016) in disentangling the modulation of syntactic parsing by executive functions, namely working memory and cognitive control, in preschoolers. Their results point toward a general impact of working memory abilities on sentence processing but a specific effect of higher cognitive control skills on non-canonical sentence processing.
Let us first discuss the effects of working memory capacity. As previously shown by several studies (Felser et al., 2003; Kidd, 2013; Kidd et al., 2018; Roberts et al., 2007), higher working memory capacity among children is associated with a general enhancement of complex sentence comprehension. In our eye-tracking data, greater working memory capacity is associated with a higher proportion of looks to the target when processing sentences across all conditions. We suggest that both case marking and number information have to be stored for the computation of syntactic dependencies in the respective OVS/SVO sentences as a requirement for successful comprehension. The processing of the number cue may exert an even higher working memory load because of its distributed nature. However, when we consider both the online and offline modality, a different picture emerges: in the offline data, greater working memory abilities are associated with the facilitation of correct sentence-picture matching, in particular for sentences unambiguously marked by the case cue in both SVO and OVS word order. Thus, we speculate that 4;6-year-old children with greater working memory capacity are more readily able to overcome their subject-first expectation only when parsing the unambiguous case marking on the first NP, and to recall the specific linguistic information that is needed before selecting the corresponding picture, which is also in line with the incremental processing account. In this regard, higher working memory capacity may be associated with the ability to select among competing alternative interpretations, but only when children encounter unambiguously case-marked cues. Children at the same age with lower working memory capacity may not be able to overcome their strong parsing expectations yet, and the offline interpretation of the case and number cue remains challenging for them, independently of their working memory resources.
In contrast, correct interpretation of sentences disambiguated by the number cue is associated with cognitive control abilities. In both our offline and online results, children with higher cognitive control skills processed and interpreted object-initial sentences disambiguated by number more accurately. It appears that cognitive control contributes to disregarding the initial information (the misinterpreted case information on NP1) and the subsequent misleading parse upon the arrival of conflicting number information. Together with processing a later arriving disambiguating cue, cognitive control is especially relevant for the number condition in OVS interpretation. This is consistent with two arguments that were put forward before. First, preschoolers rely on incremental sentence processing and are rarely successful in overriding their initial wrong interpretation and thematic role assignment (Choi & Trueswell, 2010; Huang et al., 2013; Omaki et al., 2013; Trueswell et al., 1999). Second, more distributed cues in particular (such as the disambiguation point and distribution in the number condition) add to these cognitive demands in children (Lindner, 2003; Slobin, 1982). In addition, unlike case marking, number marking is not directly informative with regard to the target thematic role that needs to be reassigned (Fodor & Inoue, 1994). While the first argument would apply to an advantage of higher cognitive control for all OVS sentences, the second argument particularly distinguishes the cognitive control efforts of the number cue from those of the case cue in our study. We were able to show the specific contribution of cognitive control skills to the interpretation of OVS sentences disambiguated by number agreement already at the age of 4;6. While case marking appears to be processed more consistently than number by preschoolers, number information alone can also be processed, and this is especially the case for children with higher cognitive control. However, as mentioned in the introduction, the specific contributions of inhibitory and cognitive flexibility abilities to complex sentence processing are hard to disentangle at the moment, and the flanker task used in the current study might not be informative in this regard. While the task can be argued to tap into both executive functions to a certain extent, task performance might rather reflect the ability to resolve cue conflict, as it is meant to assess the capacity to "suppress irrelevant information" in tasks (Mazuka et al., 2009). For ambiguity resolution, Woodard et al. (2016) could show a meaningful interaction with performance in a go/no-go task in children. Their task might target inhibition skills in the sense of inhibiting conflicting stimulus responses, while a Stroop task has been convincingly argued to tap into the ability to resolve conflicting representations (Novick et al., 2005). Therefore, if our conclusion that the flanker task taps into the ability that also underlies the resolution of misleading parses is on the right track, we suggest that further research compare performance in all three tasks with children's ability to recover from garden paths in order to disentangle the role of inhibition skills from that of cognitive control or conflict resolution.
We would like to address three more thoughts for future research. First, regarding the number condition in our study, NP1 and NP2 differed in number and were counterbalanced so that subjects and objects occurred equally often in singular and in plural. We argued that the lower rate of correct interpretation and the higher cognitive effort generated by the number condition could be attributed to its more complex structure. It is possible that children achieve lower accuracy scores for those stimuli that presented NP1 in plural rather than in singular, or for those that presented the subject NP in plural rather than in singular, as also discussed by Stegenwallner-Schütz and Adani (2021). Hence, number marking of NP1 might influence the degree of reliance on a subject-first strategy in children. Further investigation of this phenomenon may lead to a better understanding of the number feature's influence on the processing and interpretation of object-initial sentences. Second, in our current stimuli, we systematically manipulated and tested morphosyntactic cues, and thus neutralized prosodic cues (see Appendix B, Supplemental Materials, for acoustic analyses and for an example of the absence of pitch differences between OVS and SVO sentences regarding NP1), as is common procedure (e.g., Lidzba et al., 2013; Schipke et al., 2012; Strotseva-Feinschmidt et al., 2019). We are aware that a topicalization prosodic contour for OVS sentences would be more accurate regarding conversational realities, as focus information interacts strongly with prosodic information such as pitch accent. Facilitatory effects of prosody could be shown for adults (Wang et al., 2011) as well as for children (Grünloh et al., 2011). Children aged five identified thematic roles better if the object-initial structure was accompanied by a contrastive intonation pattern (Grünloh et al., 2011). More experiments are needed to investigate the facilitatory contribution of prosodic information as used in conversations to OVS processing in children. Third, a factor that has not been considered in our study is the socioeconomic status (SES) of the participating families. Huang and Hollister (2019) were able to show a strong association between children's SES and their correct interpretation of non-canonical sentences. Interestingly, children's SES explained variation in linguistic ability to a greater extent than executive function skills in their study. We believe future research should incorporate SES variables into analyses of language tasks.
In conclusion, the results of the present study support the predictions of the competition model and the incremental processing account for the correct interpretation of German object-initial sentences in children. Using a systematic manipulation of the case and number cue in SVO and OVS sentences, the case cue was shown to be more informative and accessible to 4;6-year-olds than the number cue. In this respect, a closer investigation of the impact of the subject's number feature (singular or plural) will shed more light on the underlying processing mechanisms in the future. Moreover, our results could contribute to disentangling the modulation of syntactic parsing by executive functions in preschoolers. A general enhancement of sentence comprehension could be demonstrated for children with more advanced working memory capacity, while a structure-specific facilitation of the most demanding condition in our experiment, namely OVS disambiguated by number agreement, could be detected in children with greater cognitive control. More research is needed in the future to elucidate and explain the detailed contributions of executive functions to syntactic processing in children and language development in general.

Figure 1. An example of the visual stimulus used in the eye-tracking experiment. Screen resolution 1680 × 1050 px.

Figure 2. Accuracy results of 4-year-olds (N = 27): mean accuracy for case vs. number disambiguation by word order condition OVS vs. SVO. Error bars denote ±2 standard errors of the between-participant variation.

Figure 3. Correlation plots (of 4-year-olds, N = 27) showing the relation of the sum of correct digit span trials in total for each child with the mean accuracy in the two disambiguation conditions, left panel: case condition, right panel: number condition.Each point denotes one child plotted here with the values from each task.Note that both word orders (SVO/OVS) are included in each of the disambiguation conditions.The shades denote plus/minus one standard error based on single correlations and are not taken from the model output.

Figure 4. Average proportions of target looks separately for each word order (SVO: solid lines, OVS: dashed lines). Black rectangular bars mark the time bins (each lasting approximately 17 ms at the tracking rate of 60 Hz) in which the two word orders differ significantly.

Figure 5. Average proportions of target looks separately for each word order (SVO: top panel, OVS: bottom panel). Disambiguation type is indicated by solid lines for case and dashed lines for number. Black rectangular bars mark the time bins (each lasting approximately 17 ms at the tracking rate of 60 Hz) in which the two disambiguation types (case vs. number) differ significantly, for each word order separately.

Figure 6. Eye-tracking results of 4-year-olds (N = 27): mean target looking proportions for case vs. number disambiguation by word order condition OVS vs. SVO in the three time windows NP1, verb, and NP2.

Figure 7. Correlation plots (of 4-year-olds, N = 27) showing the relation of the Flanker_RT difference score (in ms) with the mean target looking proportion in all four conditions: panel 1: case-OVS, panel 2: number-OVS, panel 3: case-SVO, and panel 4: number-SVO.Each point denotes one child plotted here with the values from each task.Note that the Flanker_RT difference score was calculated as the difference of reaction times in the incongruent condition minus reaction times in the congruent condition of the flanker task.Hence, lower Flanker_RT difference scores are considered as a measure of higher cognitive control skills.The shades denote plus/minus one standard error based on single correlations and are not taken from the model output.
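The difference score described in this caption can be illustrated with a minimal Python sketch. It assumes that each child's score is the mean reaction time on incongruent trials minus the mean reaction time on congruent trials; the caption does not state whether means or medians were used, nor how error trials were handled. The function name and the reaction times are hypothetical.

```python
from statistics import mean


def flanker_difference_score(trials):
    """Return the incongruent-minus-congruent reaction time difference in ms.

    `trials` is a list of (condition, rt_ms) pairs with condition either
    "congruent" or "incongruent". Using the mean per condition is an
    assumption made for this illustration.
    """
    congruent = [rt for condition, rt in trials if condition == "congruent"]
    incongruent = [rt for condition, rt in trials if condition == "incongruent"]
    return mean(incongruent) - mean(congruent)


# Made-up reaction times for one hypothetical child (not data from the study).
trials = [
    ("congruent", 600),
    ("congruent", 640),
    ("incongruent", 700),
    ("incongruent", 760),
]

score = flanker_difference_score(trials)
print(score)  # difference in ms; smaller values indicate less interference
```

Under this definition, a child who is slowed down less by incongruent flankers receives a lower score, which is why lower values are read as higher cognitive control.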

Table 1. Values of the fixed effects in the mixed-effects model for offline data. Z- and p-values of significant predictors are highlighted in bold font.

Table 3. Overview of the predicted and observed individual performance patterns.

Table 2. Values of the fixed effects in the nested linear mixed-effects model for offline data. Z- and p-values of significant predictors are highlighted in bold font. In order to interpret the 3-way interaction of Word Order × Disambiguation × Flanker_RT, we test the effect of Disambiguation separately for each word order as well as the interaction of Disambiguation and Flanker_RT. Therefore, we specified a post-hoc model with a nested effect of Disambiguation within SVO sentences (SVO Case coded as 1, SVO Number coded as −1, OVS Case and OVS Number coded as 0) and a nested effect of Disambiguation within OVS sentences (OVS Case coded as 1, OVS Number coded as −1, SVO Case and SVO Number coded as 0). The nested effects model was specified as follows: summary(m.3.nested <- glmer(correct ~ Word Order + Disambiguation in SVO + Disambiguation in OVS + Flanker_RT + Digit Span + Word Order: Flanker_RT + Word Order: Digit Span + Disambiguation in SVO: Flanker_RT + Disambiguation in SVO: Digit Span + Disambiguation in OVS: Flanker_RT + Disambiguation in OVS: Digit Span + (1 + Word Order | subject_id) + (1 | trial_id), family=binomial, data=d, control=glmerControl(optimizer="bobyqa")))
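The nested coding described in this caption can be written out as a small lookup. The Python sketch below is illustrative only: the function name and the column names `disamb_in_svo` and `disamb_in_ovs` are hypothetical stand-ins for the predictors "Disambiguation in SVO" and "Disambiguation in OVS", and the original models were fitted in R.

```python
def nested_disambiguation_codes(word_order, disambiguation):
    """Return the two nested contrast codes for one experimental condition.

    Within a word order, Case is coded 1 and Number is coded -1; conditions
    belonging to the other word order are coded 0.
    """
    if word_order not in ("SVO", "OVS"):
        raise ValueError(f"unknown word order: {word_order!r}")
    if disambiguation not in ("Case", "Number"):
        raise ValueError(f"unknown disambiguation: {disambiguation!r}")

    sign = 1 if disambiguation == "Case" else -1
    return {
        "disamb_in_svo": sign if word_order == "SVO" else 0,
        "disamb_in_ovs": sign if word_order == "OVS" else 0,
    }


# Print the coding for all four conditions of the 2 x 2 design.
for word_order in ("SVO", "OVS"):
    for disambiguation in ("Case", "Number"):
        codes = nested_disambiguation_codes(word_order, disambiguation)
        print(word_order, disambiguation, codes)
```

Because each nested predictor is zero outside its own word order, its coefficient estimates the Case vs. Number contrast within that word order only, which is what allows the interaction to be interpreted separately for SVO and OVS sentences.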

Table 4. Values of the fixed effects in the mixed-effects model for online data. T- and p-values of significant predictors are highlighted in bold font.

Table 5. Values of the fixed effects in the nested linear mixed-effects model for the online data. T- and p-values of significant predictors are highlighted in bold font. In order to interpret the 3-way interaction of Window × Word Order × Disambiguation, we again test the effect of Disambiguation separately for each word order as well as the interaction of Disambiguation and Window. Therefore, we specified a post-hoc model with a nested effect of Disambiguation within SVO sentences and within OVS sentences (coded identically to the post-hoc model of the offline data). We used the following model specification: summary(m.2 h.interaction <- lmer(plt ~ Window + Word Order + Disambiguation in SVO + Disambiguation in OVS + Flanker_RT + Digit Span + Window: Disambiguation in SVO + Window: Disambiguation in OVS + Disambiguation in SVO: Flanker_RT + Disambiguation in OVS: Flanker_RT + Disambiguation in SVO: Digit Span + Disambiguation in OVS: Digit Span + (1 | trial_id), data=d1, REML=FALSE)).