Participant characteristics predicting communication outcomes in AAC implementation for individuals with ASD and IDD: a systematic review and meta-analysis

Abstract
This meta-analysis examined communication outcomes in single-case design studies of augmentative and alternative communication (AAC) interventions and their relationship to participant characteristics. Variables addressed included chronological age, pre-intervention communication mode, productive repertoire, and pre-intervention imitation skills. Investigators identified 114 single-case design studies that implemented AAC interventions with school-aged individuals with autism spectrum disorder and/or intellectual disability. Two complementary effect size indices, Tau(AB) and the log response ratio, were applied to synthesize findings. Both indices showed positive effects on average, but both also exhibited a high degree of heterogeneity. Moderator analyses detected few differences in effectiveness across diagnoses, age, the number and type of communication modes, participants' productive repertoires, and imitation skills prior to intervention. A PRISMA-compliant abstract is available: https://bit.ly/30BzbLv.

Participant characteristics, prior skill levels, and prior experience with AAC may inform intervention decisions.
Prior research syntheses addressing AAC have evaluated outcomes differentiated by a number of learner characteristics. One of these is chronological age (Ganz et al., 2011, 2014, 2017). Chronological age has been reported to predict early communication intervention success with respect to an improved developmental trajectory (Koegel et al., 2014). Others have reported that higher developmental age scores at intervention outset predicted better aided AAC outcomes (Sievers et al., 2018). The consensus is that intervention must start early to have the greatest impact on learning trajectory (Landa, 2007; Reichow, 2012); available data suggest that for nonverbal children, intervention before age 5 years increases the likelihood of spoken language (Koegel, 2000). For some children, the etiology of their developmental disability may influence communicative propensities. Wetherby and colleagues, for example, noted that children with ASD demonstrate a propensity to use communicative acts for behavioral regulation (e.g., requests for objects and actions, protests), with more limited use of acts associated with joint attention and social interaction (Wetherby, 1986; Wetherby & Prutting, 1984). Another variable that may influence the efficacy of certain AAC intervention strategies is the communication mode(s) used prior to the mode selected for intervention. To date, this topic has received limited attention (Donato et al., 2018); however, prior reviews have reported that approximately half of studies implemented AAC modes with which participants were not previously familiar (Biggs et al., 2018; Gevarter & Zamora, 2018). With respect to alternative communication modes, Reichle et al. (1991) speculated that the cognitive load of early symbol production is greater for manual signs than for aided AAC.
This speculation rests on the fact that signs must be recalled in order to be produced, whereas graphic symbols are typically offered in an array of available symbols during intervention for beginning communicators. Recognition memory is generally considered less cognitively demanding than recall memory. This may suggest that individuals who used symbolic AAC forms prior to intervention will be more responsive to AAC instruction than those with no prior experience with symbolic forms.
Some learners may have already acquired skills that facilitate communication. The presence (or absence) of these skills may better enable interventionists to customize comprehensive intervention strategies. Reichle et al. (2021) characterized a pivotal skill as one "that, when acquired, results in collateral improvements in, or acquisition of, other non-targeted skills for the same individual" (p. 2). Pivotal skills relevant for individuals with complex communication needs are abundantly described in the applied research literature. For instance, Fey et al. (2013) reported that children who played with many different objects made greater gains in spoken vocabulary. Communicative repertoire size, such as the number of words used by the individual prior to intervention, may also be associated with intervention success. Participants who had larger productive vocabularies at the study's onset made greater gains in speech and AAC use subsequent to intervention (Vandereet et al., 2011). Ganz et al. (2014) reported significantly better outcomes for participants using speech at intervention outset than for those having little functional speech; however, this investigation did not examine the degree to which more or less speech affected the outcomes. Further analyses, with a larger and more current literature base, would allow researchers to better determine the role that prior productive vocabulary plays in responsiveness to intervention. Some learner skills, such as competence in imitation, can enhance communication production in children with ASD and complex communication needs; good imitators may be more likely to acquire vocal/verbal skills and to require less intensive intervention strategies (Ingersoll, 2010; Mazaheri & Soleymani, 2019; Yoder & Layton, 1988). Toth et al. (2006) reported that initiating joint attention and imitation were associated with language skills at age 3-4 years; however, for the most part, interventionists have not described learners' joint attention status at the onset of communication intervention. Determining the extent to which skills such as imitation affect AAC learning could help clinicians select treatment approaches and intensity.
The variables that we have addressed are supported by a number of theoretical approaches to language acquisition. Operant behavioral theory (Skinner, 1957) suggests that learning a language is similar to learning any new skill and is facilitated via observation, imitation, repetition, positive and negative reinforcement, and punishment. In this theory, joint attention would be considered a form of observation. Few investigators have incorporated information about the status of a learner's joint attention competence at the outset of intervention. Imitation can represent either a direct prompting strategy or, if the modeled behavior was not directed specifically to the learner, a more indirect one. An additional explanation for the emergence of communicative behavior is that the learner's initial desired vocal approximations are reinforced by parents and then improve as others reinforce successively better approximations while withholding reinforcement, and generally making consequences less desirable, for less socially acceptable communicative attempts. Operant behavioral theory has typically been applied to spoken word development, although it is applicable to other communicative modes. With respect to multiple communicative modes, matching theory (Herrnstein, 1961), which is based on behavioral principles, proposes that when two different behavior topographies are available to the learner, the learner will tend to favor the one that yields more reinforcement.
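In its simplest two-alternative form, Herrnstein's (1961) matching relation states that the proportion of responding allocated to each topography matches the proportion of reinforcement that topography obtains. Here B_1 and B_2 are response rates for two communicative topographies (for example, manual signs and picture exchange; these labels are illustrative) and R_1 and R_2 are the reinforcement rates each obtains:

```latex
\frac{B_1}{B_1 + B_2} = \frac{R_1}{R_1 + R_2}
```

For instance, if picture exchange earns 3 of every 4 reinforcers and signing earns 1, matching predicts that roughly 75% of communicative attempts will use picture exchange.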
Vygotsky's social interactionist theory (Vygotsky & Cole, 1978) proposed that language has a social origin. Social interactions allow the development of higher cognitive functions, including language. Both operant behavioral theory and social interactionist theory suggest that experience with others is critically important in language acquisition. This latter point is supported by the longitudinal work of Hart and Risley (1995), who reported that adult responsivity and the overall frequency of communicative behavior directed to learners are related to early language outcomes. Hence, greater chronological age implies a larger number of such experiences, which, as noted above, are associated with language acquisition. A higher developmental level would suggest that the individual had benefited from a greater number of social experiences in learning increasingly sophisticated communicative skills.
Vygotsky's theory gave rise to the Zone of Proximal Development (ZPD; Vygotsky & Cole, 1978), which has direct implications for interventionists. The ZPD can be described as the range of learning that occurs when a person is assisted by a teacher or peer with a higher skill set; it underscores the importance of efficient prompting strategies in establishing new communicative behavior. Development can be described as the person's actual performance combined with how susceptible the learner is to the types of assistance available, the manner in which that assistance is sequenced, the flexibility or rigidity of previously formed stereotypes, and the learner's collaboration. In this theory, choosing optimal prompts, and sequencing and fading those prompts, become central to generating evidence supporting the theory.
A closely related theory is Language Use in Interaction (Clark, 1996; Clark & Krych, 2004; Higginbotham et al., 2007). This theory is based on the premise that language is acquired through interaction between the learner and a caregiver and can be improved only through practice. Language Representation Theory (Todman & Alm, 2003; Wray & Perkins, 2000) and Multimodality Theory (Goldberg, 2006; Müller & Soto, 2002; Soto et al., 2006) also support considering a variety of viable message modalities for early communicators, in learning that is inherently interactive and collaborative. Finally, Social Learning Theory (Bandura & Walters, 1977) suggests that learning takes place in a social context and can occur either vicariously or via direct instruction. This theory expands on traditional behavioral accounts of how behavior is governed by placing greater emphasis on the roles of various internal processes in the learning individual. As such, it has promoted learning strategies that address generalization as part of acquisition rather than as a phenomenon that emerges after acquisition.
Most of the theories that we have addressed would support an association between chronological age and accumulated experience. Developmental age reflects the products of taking advantage of experiences accumulated over time. Participant characteristics, specifically age and disability, were considered by Ganz and colleagues (2011), who reported larger effect sizes for preschool-aged participants (5 years old and under) than for older participants. Imitation and joint attention represent acquired skills that enable a learner to benefit from less direct forms of prompting when learning new things, further boosting the learner's communicative acquisitions. Furthermore, the theoretical approaches discussed here either state or strongly imply that attention to prompting strategies and their sequencing can have a substantial impact on communicative acquisitions. Other variables that we have selected for scrutiny are supported by our many years of educational experience serving learners with complex communication needs. For example, learners who begin learning augmentative communication skills with larger existing vocabularies tend to progress more quickly. Additionally, all of these variables have been hypothesized to act as mediators or moderators in numerous experimental investigations.
Previous single-case meta-analyses are rapidly becoming dated and require updating, particularly given the recent increase in articles reporting high-tech AAC studies (Ganz et al., 2017). Some prior systematic reviews were limited to participants with either ASD or IDD, rather than both. A more comprehensive synthesis would permit a finer-grained investigation of the impact of pivotal skills and participant characteristics, which may better inform individualized use of outcome data.

Strategies for examining magnitude of effects
Meta-analysis is used in many areas of social science to summarize findings regarding intervention effects, to investigate the extent of heterogeneity in intervention effects across studies, and to identify factors that systematically predict, or moderate, the magnitude of effects (Borenstein, 2009; Pustejovsky & Ferron, 2017). In the present study, meta-analytic tools were implemented to investigate the magnitude of effects of AAC-based interventions and to examine potential moderators of those effects.
A key decision in meta-analysis is which effect size metric to use, because this choice defines the scale on which intervention effects from different studies are compared and contrasted. A wide range of effect size metrics has been proposed for meta-analysis of single-case experimental designs, each of which has strengths and limitations (Pustejovsky & Ferron, 2017). Prior meta-analyses addressing AAC (Flippin et al., 2010; Ganz et al., 2012, 2014, 2017) have used indices from the family of non-overlap measures, which describe effect magnitude in terms of ordinal comparisons between data points in different phases. Non-overlap measures are widely used and appealing because they do not rely on distributional assumptions about the outcome measurements (e.g., normality) that may be inappropriate for data from single-case designs. However, many non-overlap measures lack stable parameter definitions (Pustejovsky, 2019) and suffer from range restrictions, which limit their utility for distinguishing between data series from interventions strong enough to produce few overlapping outcomes (Pustejovsky, 2019; Wolery et al., 2010). Another approach is based on parametric effect size measures, which describe intervention effects for each case in terms of features of the outcome distribution (e.g., mean levels and standard deviations). Parametric effect sizes have several strengths, including ease of interpretation and well-understood statistical properties. However, parametric measures involve assumptions, such as the absence of time trends and of auto-correlation, that are not always reasonable for outcomes examined in single-case experimental designs.
Because no single effect size metric is always suitable for meta-analyzing single-case experimental design research, authorities have recommended that meta-analyses be conducted using multiple metrics in order to gauge the sensitivity of findings to the choice of effect size (Kratochwill et al., 2013). Consequently, this study used one non-overlap effect size, Tau(AB), and one parametric effect size, the log response ratio (Pustejovsky, 2015, 2018).¹ We selected these metrics because of their complementary strengths. Relative to other non-overlap measures, Tau(AB) has a stable parameter definition and uses the data efficiently, because it is based on comparisons between every possible pairing of an observation from the treatment phase with an observation from the baseline phase; however, like other non-overlap measures, Tau(AB) still has limited sensitivity. The log response ratio is a parametric measure that quantifies intervention effects in interpretable terms, based on proportional change in the level of the outcome, making it suitable for behavioral outcome data in the form of counts or percentages (Pustejovsky, 2015); however, the log response ratio is not appropriate for data series that have zero or near-zero levels of behavior during baseline (Pustejovsky, 2018). Both effect size measures have known sampling distributions when the data do not exhibit auto-correlation, and both can be meta-analyzed using methods that are robust to the possibility of auto-correlated data series.
¹ A third effect size metric, the between-case standardized mean difference (BC-SMD; Hedges, 2013; Hedges et al., 2012; Pustejovsky et al., 2014), was originally planned. The primary advantage of the BC-SMD is that it expresses effect size magnitude on a scale comparable to the standardized mean difference that would be estimated from a between-group design conducted with the same population of participants, the same intervention, and the same outcomes. BC-SMDs also have several limitations: (a) they aggregate outcomes across participants and thus might conceal participant-level variation (Kratochwill & Levin, 2014); (b) their estimation methods are available only for multiple baseline, multiple probe, and treatment reversal designs that include at least three unique participants (Valentine et al., 2016); and (c) they are based on parametric assumptions that may not be appropriate for some outcome data (Shadish et al., 2015). Based on visual inspection of the outcome data graphs for this study, the modeling assumptions of the BC-SMD were judged inappropriate for the bulk of the included studies, because studies predominantly used frequency counts or percentage measures of behavioral outcomes, for which these assumptions were implausible; thus, the analysis based on the BC-SMD was not conducted.

Aims
The purpose of this meta-analysis was to examine participant-related factors that may influence the effects of AAC-based interventions assessed in single-case experimental design studies, and to report the differential effects of these potential moderators on communication outcomes across age, communication mode used prior to intervention, number of words used by the participant prior to intervention, and imitation use prior to intervention. Such information will allow for greater individualization and personalization of AAC supports for learners with ASD and/or IDD, based on participant characteristics and prior skills and experiences.
This study was part of a larger, comprehensive systematic review that examined both group and single-case research studies of AAC interventions for individuals with ASD and/or IDD. Group design studies reported only aggregated participant characteristics and average outcomes, which did not permit examination of individual-level factors; thus, the present study focused only on findings from single-case studies in order to examine individual participant-level factors associated with communication outcomes. The following specific research questions represent the focus of the current investigation:

Method
This manuscript reports data that are part of a larger, comprehensive systematic review (PROSPERO registration: CRD42018112428). The data reported here represent a subset of the larger project and included only single-case experimental design studies that met inclusion criteria specific to this manuscript. Figure 1 depicts the literature search and screening process, which took place between 2018 and April 2020.

Literature search
The search was conducted by a research librarian in Academic Search Complete, ERIC, PsycINFO, Conference Proceedings Citation Index – Social Science & Humanities (Web of Science), and ProQuest Dissertations & Theses Global. Keywords were related to AAC, communication and behavior outcomes, and persons who experience ASD/IDD with complex communication needs: [((augmentative or alternative) within one word (w1) communicat*) or "sign language" or manual sign* or speech-generating device* or SGD or "voice output communication aid" or VOCA* or PECS or "picture exchange communication system" or AAC or "visual scene display" or "functional communication training"] AND [(down* w1 syndrome) or ((develop* or intellectual) w1 (delay* or disabil* or impair*)) or autis* or retard*]. A total of 7,384 documents found in the search were organized using the Rayyan web platform for systematic reviews (Ouzzani et al., 2016).

Inclusion/exclusion criteria
All documents were screened for inclusion/exclusion at each of four stages: title/abstract review, full-text review, basic design quality standards review, and dependent variable screening.

Title/abstract review criteria
Title/abstract screening inclusion criteria were as follows: (a) involved an AAC intervention, (b) included individual data for at least one participant with ASD or IDD, (c) assessed communicative outcomes (i.e., AAC [e.g., speech-output communication aid, exchange-based communication system, sign language], verbalizations [i.e., spoken words or word approximations], or paralinguistic communication [i.e., gestures, body language, facial expression]), (d) utilized a single-case experimental design, and (e) was reported in English. Titles that clearly did not meet these criteria were excluded from further screening. Documents potentially meeting inclusion criteria at the title/abstract stage (n = 1,758) continued to full-text screening.

Full-text review criteria
Full-text inclusion criteria required that the study (a) was reported in English; (b) included individuals with an intellectual delay or other developmental disability(ies), as reported in the primary studies, such as autism spectrum disorder (ASD), intellectual disability (IDD), or other developmental disabilities with co-occurring complex communication needs, mental retardation, or cognitive disability, who received instruction; (c) included the results of aided AAC (e.g., low-, mid-, and high-tech applications; non-exchange- or exchange-based systems; speech-generating devices) or unaided AAC (e.g., natural gesture, manual sign, sign language, sign system) used to supplement or replace conventional speech; (d) used a single-case experimental design; and (e) assessed communicative, or communicative and challenging behavior, outcomes. The focus was primarily on ASD and/or intellectual delay. Consequently, we excluded studies in which participants had a sensory impairment as a primary condition and did not have autism or a level of cognitive impairment sufficient to result in complex communication needs. For example, we chose not to include populations who used sign as their natural language. Documents meeting inclusion criteria at the full-text stage (n = 547) were screened against basic methodological quality standards for single-case experimental designs.
Basic single-case experimental design methodological quality standards
Each study was screened using the What Works Clearinghouse standards for single-case experimental designs (U.S. Department of Education, Institute of Education Sciences, What Works Clearinghouse, 2017). These criteria required that (a) the study demonstrated a systematically manipulated independent variable, (b) the document reported inter-observer agreement (IOA), (c) the document reported IOA for at least 20% of data points across baseline and intervention phases, (d) the document reported IOA scores of at least 80% or a kappa of at least .60, (e) the study design included at least three phase changes in the graph, and (f) the study design included at least three data points per baseline and intervention phase, or at least four data points per intervention phase for alternating treatment designs. A total of 257 studies met these criteria and were retained for further screening.
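The screening rules above amount to a conjunction of checks. The sketch below encodes criteria (a)-(f) exactly as listed here, under one reading of criterion (f); the `StudyRecord` fields and the `meets_basic_standards` function are hypothetical names for illustration, not the coding form used in this review, and the code does not implement the full What Works Clearinghouse rule set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StudyRecord:
    """Hypothetical coding record for one study (field names are illustrative)."""
    manipulated_iv: bool            # (a) independent variable systematically manipulated
    ioa_reported: bool              # (b) inter-observer agreement (IOA) reported
    ioa_coverage: float             # (c) proportion of data points with IOA, 0-1
    ioa_percent: Optional[float]    # (d) percentage agreement, if reported
    ioa_kappa: Optional[float]      # (d) kappa, if reported
    phase_changes: int              # (e) phase changes shown in the graph
    is_alternating_treatments: bool
    min_baseline_points: int        # fewest data points in any baseline phase
    min_intervention_points: int    # fewest data points in any intervention phase


def meets_basic_standards(s: StudyRecord) -> bool:
    """Apply criteria (a)-(f) as listed in the text; not the full WWC rule set."""
    ioa_level_ok = (s.ioa_percent is not None and s.ioa_percent >= 80) or (
        s.ioa_kappa is not None and s.ioa_kappa >= 0.60
    )
    if s.is_alternating_treatments:
        points_ok = s.min_intervention_points >= 4
    else:
        points_ok = s.min_baseline_points >= 3 and s.min_intervention_points >= 3
    return (
        s.manipulated_iv
        and s.ioa_reported
        and s.ioa_coverage >= 0.20
        and ioa_level_ok
        and s.phase_changes >= 3
        and points_ok
    )


# Hypothetical multiple baseline study: 25% IOA coverage, 92% agreement.
example = StudyRecord(True, True, 0.25, 92.0, None, 3, False, 5, 6)
print(meets_basic_standards(example))  # True
```

Keeping each criterion as a separate field makes it easy to report which criterion a study failed, which is useful when documenting exclusions in a PRISMA flow diagram.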

Dependent variable and additional specific criteria screening
The researchers reviewed all studies meeting criteria at this stage (n = 257) and identified eligible dependent variables to be used during raw data extraction. A total of 176 documents included an eligible dependent variable. Finally, articles were further screened against the criteria for inclusion in the meta-analytic syntheses specific to this manuscript: (a) special education-eligible participants (age less than 22 years), (b) data that could be extracted for communication-related dependent variables, and (c) interventions in which communication behaviors were the primary focus. After excluding ineligible articles, participants, intervention conditions, and dependent variables, 114 studies with 338 participants were available for analysis.

Variable coding and extraction of descriptive information
From the full set of included studies, the information extracted included (a) participant diagnosis (ASD: autism spectrum disorder, autistic disorder/autism, high-functioning autism, or pervasive developmental disorder, with or without ID; IDD: intellectual disability or mental retardation, requiring an IQ score of less than 70 AND commensurate deficits in overall/composite adaptive behavior); (b) participant chronological age (0-3, 4-5, 6-10, 11-14, and 14-22 years old); (c) number of words/symbols produced prior to intervention (categorized as none, 1-10 words, or over 10 words); (d) communication mode prior to intervention (verbalization, natural gestures, natural gestures and verbalization, natural gestures and vocalization, vocalization, manual sign language, low-tech aided AAC, mid-to-high-tech aided AAC, other, and not reported); and (e) imitation use prior to intervention (categorized as gesture imitation, vocal/verbal imitation, and both gesture and vocal/verbal imitation).

Outcome extraction
For every included study, raw outcome data were extracted from the single-case experimental design graphs provided in primary study reports. Only the participants, interventions, and outcomes that met the inclusion criteria were coded. Data were extracted using Engauge Digitizer (Mitchell et al., 2017; http://markummitchell.github.io/engauge-digitizer), a free, open-source computer program for converting electronic images into numerical data. A Co-PI trained four graduate assistants in extracting raw data. Data from both baseline and intervention phases were included.

Effect size calculations
Because of the variety of available effect size metrics for single-case experimental designs and the limitations associated with each, two distinct effect sizes were selected for the present meta-analysis: Tau(AB) and the log response ratio. Tau(AB) is an effect size in the family of non-overlap measures, which quantifies effect magnitude as the probability that a data point from the treatment phase is an improvement over a data point from the baseline phase, minus the probability that it is a deterioration. It ranges from -1 (complete non-overlap in the direction of detrimental effects) to 1 (complete non-overlap in the direction of therapeutic effects), with a value of zero corresponding to no difference in the distribution of outcomes between the phases being compared. Tau(AB) is appropriate for data without strong time trends. Tau-U is a variation of Tau(AB) that adjusts for baseline time trends. Both Tau(AB) and Tau-U were calculated and found to be strongly correlated; consequently, subsequent analyses were based on Tau(AB) because of its simpler interpretation.
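The pairwise definition of Tau(AB) can be sketched directly. This is an illustrative implementation written for clarity, not the SingleCaseES code the study actually used; the function name `tau_ab` and its `increase_is_improvement` parameter are hypothetical. Ties count as neither improving nor worsening. The example also shows the range restriction of non-overlap measures noted earlier: a modest gain and a much larger gain both receive the maximum value of 1.

```python
from itertools import product


def tau_ab(baseline, treatment, increase_is_improvement=True):
    """Tau(AB) = P(treatment point improves on baseline point)
               - P(treatment point is worse than baseline point),
    estimated over every (baseline, treatment) pair; ties count as neither."""
    if not baseline or not treatment:
        raise ValueError("both phases need at least one observation")
    direction = 1 if increase_is_improvement else -1
    improving = worsening = 0
    for a, b in product(baseline, treatment):
        diff = direction * (b - a)
        if diff > 0:
            improving += 1
        elif diff < 0:
            worsening += 1
    return (improving - worsening) / (len(baseline) * len(treatment))


baseline = [2, 3, 2]
print(tau_ab(baseline, [4, 5, 4]))     # 1.0: modest gain, no overlap
print(tau_ab(baseline, [20, 22, 21]))  # 1.0: much larger gain, same value
print(tau_ab(baseline, [3, 4, 2]))     # about 0.44: 5 improving, 1 worsening, 3 tied pairs
```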
To offset the limitations of Tau(AB), a parametric effect size measure, the log response ratio (Pustejovsky, 2018), was also used. Log response ratios quantify effect size in terms of the proportional change in the average level of an outcome from one phase (or condition) to another. Response ratios are also used in meta-analyses of group designs (e.g., Hedges et al., 1999). The drawbacks of this effect size are that it does not account for time trends and that it is not appropriate for data series that have zero or near-zero levels of behavior during baseline (Pustejovsky, 2018). This is because percentage changes are undefined if the behavior is totally absent during baseline; similarly, when the baseline level is non-zero but very small, percentage changes are extreme in magnitude and very sensitive to small changes in baseline level.
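A minimal sketch of the basic (not bias-corrected) log response ratio, ln of the treatment-phase mean over the baseline-phase mean, illustrates both points in the paragraph above. The function name `log_response_ratio` and the data are hypothetical. Two series that would both show complete non-overlap with a baseline of [2, 3, 2] receive clearly different log response ratios. A zero baseline mean makes the estimate undefined. With a near-zero baseline, moving a single baseline observation from 0 to 1 shifts the estimate by ln 2 (about 0.69), even though the treatment data are unchanged.

```python
import math
from statistics import mean


def log_response_ratio(baseline, treatment):
    """Basic log response ratio, ln(mean of treatment / mean of baseline).
    No bias correction; positive values mean the level increased."""
    mean_a, mean_b = mean(baseline), mean(treatment)
    if mean_a <= 0 or mean_b <= 0:
        raise ValueError("undefined when a phase mean is zero")
    return math.log(mean_b / mean_a)


baseline = [2, 3, 2]
lrr_modest = log_response_ratio(baseline, [4, 5, 4])    # ln(13/7), about 0.62
lrr_large = log_response_ratio(baseline, [20, 22, 21])  # ln(9), about 2.20
print(round(math.exp(lrr_large) - 1, 2))                # 8.0, i.e., an 800% increase

# Near-zero baselines: one baseline point moving from 0 to 1 shifts the estimate by ln(2).
print(log_response_ratio([0, 0, 1], [5, 5, 5]))  # ln(15), about 2.71
print(log_response_ratio([0, 1, 1], [5, 5, 5]))  # ln(7.5), about 2.01
```

Exponentiating the estimate and subtracting 1 converts it back to a percentage change, which is what makes the metric interpretable for count and percentage outcomes.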
Tau(AB) and log response ratio estimates were calculated using the SingleCaseES package (Pustejovsky & Swan, 2019) for the R programming environment. The "null" standard error estimator was applied for Tau(AB), and the bias-corrected estimator was applied for the form of the log response ratio appropriate for outcomes where an increase is desirable. For both effect sizes, estimates were calculated for pairs of adjacent phases in multiple baseline, multiple probe, and treatment reversal designs. For alternating treatment designs, effect sizes comparing a specific intervention condition within an alternating treatment phase to the preceding baseline phase were calculated. For data series that included more than one A-B contrast, we aggregated effect size estimates across contrasts (Pustejovsky & Ferron, 2017) prior to meta-analysis. To account for the limitations of the log response ratio, data series in which the baseline data were all at or near zero were excluded. As a result, log response ratio estimates could be computed for only 239 participants (72% of all included participants) from 93 studies (82% of included studies).
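The adjacent-phase workflow described above can be sketched for a treatment reversal (ABAB) data series. This is a simplified sketch under stated assumptions, not the SingleCaseES implementation (it omits package-specific details such as outcome-scale handling and standard errors). Each log phase mean receives a delta-method bias correction of the kind described by Pustejovsky (2018), ln(ȳ) + s²/(2nȳ²). The helpers `adjacent_ab_contrasts` and `aggregated_lrr` are hypothetical names. The unweighted average across contrasts is an assumption chosen for illustration.

```python
import math
from statistics import mean, variance


def lrr_bias_corrected(baseline, treatment):
    """Log response ratio with a small-sample bias correction on each log mean:
    ln(ybar) + s^2 / (2 * n * ybar^2). Positive values mean the level increased."""

    def corrected_log_mean(y):
        n, ybar = len(y), mean(y)
        if ybar <= 0:
            raise ValueError("undefined when a phase mean is zero")
        s2 = variance(y) if n > 1 else 0.0  # sample variance needs n >= 2
        return math.log(ybar) + s2 / (2 * n * ybar ** 2)

    return corrected_log_mean(treatment) - corrected_log_mean(baseline)


def adjacent_ab_contrasts(phases):
    """Hypothetical helper: turn each adjacent A/B phase pair into a
    (baseline, treatment) contrast. `phases` is a list of (label, data)."""
    contrasts = []
    for (label_1, y_1), (label_2, y_2) in zip(phases, phases[1:]):
        if {label_1, label_2} == {"A", "B"}:
            contrasts.append((y_1, y_2) if label_1 == "A" else (y_2, y_1))
    return contrasts


def aggregated_lrr(phases):
    """Hypothetical helper: unweighted average of per-contrast estimates."""
    return mean([lrr_bias_corrected(a, b) for a, b in adjacent_ab_contrasts(phases)])


abab = [("A", [2, 3, 2, 3]), ("B", [8, 9, 10, 9]),
        ("A", [3, 3, 4, 3]), ("B", [9, 10, 11, 10])]
print(len(adjacent_ab_contrasts(abab)))  # 3: A1-B1, B1-A2, A2-B2
print(round(aggregated_lrr(abab), 2))    # about 1.14
```

Adjacent contrasts share phases (B1 appears in two of them), so the three estimates are not independent. This is one reason the sketch reports only an averaged point estimate rather than a pooled standard error.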

Summary meta-analysis
Publication bias
There is growing concern regarding the potential for publication bias in syntheses of single-case experimental design studies (e.g., Shadish et al., 2016; Sham & Smith, 2014; Tincani & Travers, 2019). We addressed publication bias by searching for and reviewing both published and unpublished studies, including grey literature. In meta-analyses of group design studies, it is common also to use graphical diagnostics such as funnel plots and statistical tests (e.g., Egger et al., 1997). However, these diagnostics were designed for the group design literature and may not be suitable for single-case experimental design research, where the processes that lead to publication bias are likely to be driven by factors such as visual determinations of experimental control and functional relationships rather than by the statistical significance of results (Shadish et al., 2015). Lacking better alternatives, we followed the precedent of other meta-analyses of single-case experimental designs (e.g., Heyvaert et al., 2012) and evaluated publication bias using funnel plot diagnostics and tests for differences in the magnitude of effects between published and unpublished studies.

Moderator analyses
Primary research questions centered on factors that might moderate the effects of AAC interventions for individuals with ASD and/or IDD. To address these questions, we applied multilevel meta-regression models with separate intercepts (i.e., indicator variables) for each level of the moderator. In the main analysis, separate models were estimated for each potential moderator variable; multilevel meta-regression models that control for multiple moderator variables are reported in the Supplementary materials. As in the summary meta-analysis, we conducted separate analyses for each effect size metric (Tau[AB] and log response ratio), using multilevel models that distinguished variation at the study, participant, and effect size levels, with robust variance estimation to guard against misspecified assumptions.
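The effect of "separate intercepts" coding can be seen in a much simpler model. The sketch below is a fixed-effect, single-level weighted least squares regression with inverse-variance weights. It deliberately drops the study- and participant-level random effects and the robust variance estimation used in the actual models. The function name `separate_intercepts_wls` and all numbers are hypothetical. With one indicator column per moderator level and no common intercept, each coefficient is simply the weighted mean effect for that level, so levels can be compared directly.

```python
import numpy as np


def separate_intercepts_wls(effects, variances, levels):
    """Fixed-effect weighted least squares with one indicator column per
    moderator level and no common intercept (cell-means coding)."""
    y = np.asarray(effects, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)      # inverse-variance weights
    categories = sorted(set(levels))
    X = np.array([[1.0 if lev == c else 0.0 for c in categories] for lev in levels])
    XtW = X.T * w                                      # equals X.T @ diag(w)
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    return dict(zip(categories, beta))


# Hypothetical effect sizes and sampling variances, for illustration only.
estimates = separate_intercepts_wls(
    effects=[0.8, 0.6, 0.9, 0.3, 0.5],
    variances=[0.04, 0.04, 0.02, 0.05, 0.05],
    levels=["ASD", "ASD", "ASD", "IDD", "IDD"],
)
print({k: round(float(v), 3) for k, v in estimates.items()})  # {'ASD': 0.8, 'IDD': 0.4}
```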

Inter-rater reliability
Four raters with doctorates in special education or related fields conducted reliability reviews on 100% of documents at the title/abstract stage and on 30% of included documents at the full-text, moderator coding, and raw data extraction stages. Practice documents were randomly chosen for training, which continued with all raters until each rater reached 80% agreement at every stage. Discussion and retraining were conducted when agreement fell below 80%. Each rater coded independently; any disagreements between the two raters were then discussed until consensus was reached. The authors calculated percentage agreement as the number of agreements divided by the sum of agreements and disagreements, multiplied by 100. IRR scores for the title/abstract, full-text, moderator coding, and raw outcome data extraction stages were 93%, 93%, 92%, and 98%, respectively.
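The percentage agreement formula above reduces to a one-line calculation. The function name `percent_agreement` and the counts in the example are hypothetical and are not data from this review.

```python
def percent_agreement(agreements: int, disagreements: int) -> float:
    """Percentage agreement: agreements / (agreements + disagreements) x 100."""
    total = agreements + disagreements
    if total == 0:
        raise ValueError("no coded items to compare")
    return 100.0 * agreements / total


# Hypothetical counts: 93 agreements and 7 disagreements across 100 double-coded items.
print(percent_agreement(93, 7))  # 93.0
```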

Participant characteristics
A total of 114 studies published between 1978 and 2020 were included in the meta-analysis of Tau(AB). Studies included a total of 330 participants with ASD/IDD and a total of 767 Tau(AB) effect size estimates. Participant ages ranged from 1 to 21 years, with a median age of 5 years (interquartile range: 4-9.75 years). The majority of participants were diagnosed with ASD (n = 224); fewer were diagnosed with IDD (n = 83) or both ASD and IDD (n = 23). Notably, participant race, ethnicity, and home language environment were infrequently reported: race was described for 32% of participants, ethnicity for 6%, and home language for 13%. See Supplementary Materials for further details.

Instructional characteristics
We investigated the instructional characteristics of included studies in detail, as reported elsewhere (Ganz et al., 2021). A brief overview is provided here because intervention-related characteristics may influence outcomes (see Supplementary Materials, Table S3). Regarding instructional features, most studies involved reinforcement (93%), systematic arrangement of the environment (90%), and prompt fading (90%). Many studies implemented modeling (66%) and verbal prompts (67%). About half involved physical prompts and preference assessments, and few involved graphic prompts (5%); that is, most studies involved some type of prompting or other behavioral strategy. We also coded each study on several dimensions contrasting naturalistic with more functional-behavioral approaches; for each dimension, a study was coded as mostly one or the other, not both. In most cases, studies were more functional-behavioral than naturalistic: 85% were interventionist-led, 90% relied on massed rather than dispersed instructional trials, 82% used contrived rather than embedded activity contexts, 91% used one-on-one instruction, 36% used limited rather than varied teaching stimuli, and 47% used controlled instructional environments.

Communication modes
Prior to intervention, participants most commonly used multiple modes of communication (48% of included participants), verbalization (15%), or natural gestures (10%); communication modes were not reported for 14% of included participants (see Supplementary Materials, Table S2 for further details). During intervention, participants most commonly used aided AAC exclusively (57% of participants), followed by unaided AAC exclusively (22%) or both aided and unaided AAC (18%). Communication modes not involving vocalization or verbalization were more common (71% of participants) than those involving vocalization or verbalization (36%). See Supplementary Materials, Table S4 for further details about communication modes and characteristics.

WWC design ratings
The included studies comprised 201 unique figures, each of which was assessed against the What Works Clearinghouse (WWC) design standards. Overall, 91 figures (45%) met all WWC standards without reservations, 93 figures (46%) met all standards with reservations, and 17 figures (8%) did not meet at least one standard. Figures that did not meet standards always belonged to studies that also included other figures meeting standards with or without reservations. Nearly all figures met standards without reservations with respect to manipulation of the independent variable, inclusion of at least three opportunities to detect a functional relation, and inter-rater agreement data. Ninety-one figures met standards without reservations for inclusion of at least five data points per phase or condition, whereas 98 met standards with reservations for inclusion of at least three data points per phase or condition. Supplementary Table S5 provides further details.

Tau(AB)
Effect sizes were calculated for each AB phase contrast for each participant. Supplementary Figure S1 (top panel) depicts the distribution of Tau(AB) effect size estimates, which was strongly skewed. Individual Tau(AB) estimates ranged from −1.00 to 1.00; study-level average effect sizes ranged from −0.17 to 1.00. Based on the multilevel meta-analysis, the overall average effect size was 0.72, 95% CI [0.67, 0.77]. Tau(AB) effect sizes exhibited substantial variation at the study level (study-level SD estimate: 0.22), less variation between participants nested within studies (participant-level SD estimate: 0.04), and no variation between cases nested within participants (case-level SD estimate: 0.00). Based on these estimates, and assuming a normal distribution of effects, about two-thirds of study-level average Tau(AB) values would be expected to fall within 1 SD of the average, or between 0.50 and 0.94. These estimates were consistent with the observed distribution of study-level average Tau(AB) values, two-thirds of which fell between 0.52 and 0.97.
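To make the Tau(AB) index concrete, the Python sketch below computes the non-overlap form of Tau for a single A-B contrast: every baseline observation is paired with every intervention observation, improving pairs score +1, deteriorating pairs score −1, and ties score 0, and the sum is divided by the number of pairs. It assumes that higher values represent improvement and that ties contribute zero (other Tau variants treat ties or trend differently); the function name and data are illustrative, not the authors' code. The last lines reproduce the one-SD interval reported above.

```python
from itertools import product

def tau_ab(baseline, treatment):
    """Non-overlap Tau for one A-B phase contrast, in [-1, 1].

    Assumes higher values indicate improvement; tied pairs score zero.
    """
    if not baseline or not treatment:
        raise ValueError("each phase needs at least one observation")
    # +1 if the intervention value exceeds the baseline value, -1 if lower, 0 if tied
    score = sum((b > a) - (b < a) for a, b in product(baseline, treatment))
    return score / (len(baseline) * len(treatment))

# Hypothetical data: 4 baseline and 5 intervention sessions.
example = tau_ab([0, 1, 1, 2], [2, 3, 4, 4, 5])  # 19 of 20 pairs improve, 1 tie -> 0.95

# One-SD band around the pooled mean, assuming normally distributed study-level effects.
mean_tau, study_sd = 0.72, 0.22
band = (round(mean_tau - study_sd, 2), round(mean_tau + study_sd, 2))  # (0.5, 0.94)
```

Because every pair is compared, the index reaches its maximum of 1.0 whenever all intervention values exceed all baseline values, regardless of how large the change is.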
To provide context for interpreting the Tau(AB) analyses (Vannest & Sallese, 2021), summary effects were compared with those reported in a recent meta-analysis of high-tech AAC for individuals with ASD or IDD (Ganz et al., 2017) and with those reported by Parker et al. (2011). The weighted, corrected baseline interpretations from Ganz et al. (2017) were selected for comparison because they are most similar to the method used in the current meta-analysis and represent a conservative approach. The overall distribution of Tau(AB) effects in the present study falls within the lower 20-40% of effect sizes in the pool of studies reported by Ganz et al. (2017), and between the 50th and 75th percentiles of the pool reported by Parker et al. (2011). These comparisons should be interpreted cautiously, because the benchmark sources include populations and interventions that differ from those in the pool used in the current meta-analysis. Ganz et al. (2017) noted significant skewness in their data, as was also observed in the present pool (Supplementary Figure S1).

Log response ratio
The subset of data where log response ratio could be calculated included a total of 93 studies, 239 participants, and 516 effect size estimates. Supplementary Figure S1, bottom panel, depicts the distribution of log response ratio effect size estimates for the subset of studies and cases where it could be calculated. Individual log response ratio estimates ranged from −3.26 to 7.64, with a median of 1.40 (interquartile range: 0.68-2.78); study-level average effect sizes ranged from −0.21 to 6.72. Based on the multi-level meta-analysis, we found an overall average effect of 1.86, 95% CI [1.58, 2.13]. This average effect size corresponds to an increase in behavior of 541%, 95% CI [386%, 744%] from baseline to intervention. Log response ratio effect sizes were highly heterogeneous, with substantial variation between studies (study-level SD estimate: 1.16), little variation between participants nested within studies (participant-level SD estimate: 0.11) and substantial variation between cases nested within participants (case-level SD estimate: 1.12). Based on these estimates, and assuming a normal distribution of effects, about two-thirds of the study-level average log response ratio effect sizes would be expected to fall within 1 SD of the average, or between 0.69 and 3.02, which corresponds to percentage increases of between 100% and 1951%; thus, the average increase in communication was sizable and positive but the distribution of effects was highly variable.
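The percentage interpretations above follow from exponentiating the log response ratio: percentage change = 100 × (exp(LRR) − 1). The Python sketch below shows this conversion together with a basic log response ratio (the natural log of the ratio of intervention to baseline phase means). It is a simplification under stated assumptions: it omits the small-sample bias correction and the floor-effect handling that a full implementation would need, and the function names are hypothetical.

```python
import math

def log_response_ratio(baseline, treatment):
    """Unadjusted log response ratio: ln(mean(treatment) / mean(baseline)).

    Sketch only: no small-sample bias correction. Both phase means must be
    positive, so series with baselines at the floor (all zeros) are excluded.
    """
    mean_a = sum(baseline) / len(baseline)
    mean_b = sum(treatment) / len(treatment)
    if mean_a <= 0 or mean_b <= 0:
        raise ValueError("log response ratio requires positive phase means")
    return math.log(mean_b / mean_a)

def percent_change(lrr):
    """Percentage change from baseline implied by a log response ratio."""
    return 100 * (math.exp(lrr) - 1)
```

For example, `percent_change(1.86)` is about 542%; the reported 541% presumably reflects rounding of the two-decimal log-scale estimate. Likewise, the one-SD bounds of 0.69 and 3.02 convert to roughly 99% and 1949%, close to the reported 100% and 1951%.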

Publication bias
Funnel plots were generated showing the distribution of Tau(AB) and log response ratio effect size estimates versus their standard errors (Supplementary Figures S2 and S3). These plots indicated that the effect sizes were highly heterogeneous, but did not indicate patterns that would typically be expected from publication bias; however, funnel plot diagnostics are not designed for use with meta-analyses of single-case experimental designs, and so the lack of such patterns here does not imply that the literature is free from publication bias.

Moderator analysis
Moderator analyses based on the Tau(AB) and log response ratio effect sizes are reported in Table 1 and Supplementary Table S5, respectively. Supplementary Tables S6 and S7 report results of meta-regression models that control for multiple moderator variables. These results were consistent with those of the simpler single-moderator analyses, which are described below.

Diagnosis
The average effect sizes for participants diagnosed with ASD, those diagnosed with IDD, and those with diagnoses of both ASD and IDD were compared. For Tau(AB), average effect sizes were smallest for participants with both ASD and IDD, Tau

Age
For analysis, participant ages were grouped into five categories. For Tau(AB), average effect sizes for different age groups were not statistically distinguishable, robust F(4, 29.1) = 0.7, p = .593. Average Tau(AB) effect sizes were 0.74 for participants aged 0-3 years, 0.72 for 4-5 years, 0.70 for 6-10 years, 0.68 for 11-14 years, and 0.77 for 14-22 years. Findings were similar in the analysis based on log response ratio: average effect sizes for different age groups were not statistically distinguishable, ranging from 1.70 for 11-14-year-olds (equivalent to a 448% increase) to 2.01 for 0-3-year-olds (a 646% increase).
Supplementary Figure S4 depicts the relation between effect size magnitude and participant age for Tau(AB) (top panel) and log response ratio (bottom panel). A slight downward trend is visible in the log response ratio effect sizes, but this trend is not statistically distinguishable from zero. Participant responding appears to be highly variable, and age does not appear to be a factor, or a disqualifier, in determining whether a particular individual may respond to AAC intervention.

Communication mode used prior to intervention
Communication modes used prior to intervention were examined in two ways: based on the number of modes used and the specific combination of modes. The number of modes was not a statistically significant predictor of Tau(AB) effect sizes; average effects were similar for participants who used one, Tau(AB) = 0.73; two, Tau(AB) = 0.65; three, Tau(AB) = 0.73; or four or more modes, Tau(AB) = 0.69. In 16 studies (including 46 participants) where communication mode was not reported, effect sizes were somewhat higher, Tau(AB) = 0.84, 95% CI [0.76, 0.91]. The pattern of results was similar for log response ratio effect sizes, where again the number of communication modes was not significantly associated with effect size.
Combinations of communication modes used prior to intervention were also examined. Supplementary Figure S5 (left panel) depicts the distribution of Tau(AB) effect size estimates by prior communication mode, along with average effect size estimates for each mode. Average Tau Figure S5, right panel).

Word use prior to intervention
The number of words participants used prior to intervention was grouped into five categories. For Tau(AB), average effect sizes were similar across categories: 0.73 for zero words, 0.72 for 1-5 words, 0.84 for 6-10 words, 0.74 for 11-50 words, and 0.67 for more than 50 words; differences between categories were not statistically distinguishable, robust F(4, 10.9) = 0.7, p = .625. Word use prior to intervention was not reported for 219 participants from 87 included studies; for these participants, the average Tau(AB) was 0.71, 95% CI [0.65, 0.77]. Findings based on log response ratio were similar, with average effect sizes ranging from 1.58 for zero words to 2.62 for 6-10 words; differences between categories were not statistically distinguishable, robust F(4, 5.9) = 1.2, p = .401.

Imitation use prior to intervention
Participants' imitation use prior to intervention was classified as gestural imitation (15 studies, 31 participants), vocal/verbal imitation (28 studies, 54 participants), or limited imitation (11 studies, 15 participants). Most included studies did not report sufficient information to determine participant imitation use (92 studies, 230 participants). Average Tau(AB) effect sizes for each category were not statistically distinct, robust F(2, 12.5) = 0.2, p = .832. Similarly, average log response ratio effect sizes for each category were not statistically distinct, robust F(2, 7.4) = 1.1, p = .391.

Discussion
This comprehensive systematic review and meta-analysis synthesized a large body of evidence from single-case studies of AAC interventions for children and youth with ASD and/or IDD. Approximately 67% of participants had an ASD diagnosis, 25% had IDD, and the remainder had both. Participants were heavily skewed toward younger ages: approximately 80% were aged 10 years or under, and approximately half were 5 years or younger. Average effects expressed as Tau and log response ratios indicated that AAC interventions were generally effective for individuals with ASD or IDD, with the evidence drawn predominantly from participants aged 10 years and younger. However, the high degree of heterogeneity in effect sizes indicates that the strength of intervention effects varied considerably across participants.
Although benchmarks for effect sizes are commonly described in terms of small, medium, and large effects (e.g., Parker et al., 2011), such guidelines are inevitably arbitrary and easily taken out of context. We find it more useful and appropriate to interpret effect sizes by comparing them with distributions of effects from other single-case studies that share some similarities. Across studies, the summary Tau effects fell within the lower 20-40% of effect sizes for high-tech AAC interventions (Ganz et al., 2017) and between the 50th and 75th percentiles of effect sizes for special education and psychology interventions (Parker et al., 2011); however, these interpretations should be considered with caution because the comparison sources are not perfectly aligned with the purposes of the current meta-analysis. One literature pool (Parker et al., 2011) drew primary experiments randomly from academic and behavioral interventions in the special education and school psychology literature related to learning and behavioral disabilities (not specific to communication or autism); the other (Ganz et al., 2017) includes only high-tech aided AAC interventions.
Analyses here detected few differences between moderator categories for either Tau(AB) or log response ratio when comparing across diagnoses, ages, the number and type of communication modes the participants used prior to intervention, the number of words used by the participants prior to intervention, and imitation use prior to intervention. This is encouraging because it suggests that AAC interventions are generally effective for many individuals with ASD or IDD who have complex communication needs, regardless of their skills, characteristics, and experiences prior to AAC intervention. It also suggests that instructional strategies may have been reasonably well customized to learners. At the same time, we caution against using these outcomes to establish minimal qualifications for service provision; rather, these results suggest that services and AAC modes should be determined in light of each individual's particular needs and preferences, and those of key stakeholders such as educators and parents.
Tau(AB) analysis did indicate some differences in outcomes related to the communication mode(s) that participants used prior to intervention. Effect sizes were higher for participants who used manual sign language, low-tech aided AAC, or mid- to high-tech aided AAC prior to the target AAC intervention, compared with those who used only vocalization, verbalization, and natural gestures (Supplementary Figure S5). This may indicate that individuals who used symbolic AAC forms prior to intervention were more responsive to AAC instruction than those who had no prior experience with symbolic forms. Given that these participants were considered to require AAC, it is likely that any verbal or vocal skills they had previously acquired were below age-expected levels and were not meeting their needs. Success with AAC modes prior to intervention may be predictive of future outcomes; the stronger effects identified for participants who used manual sign or aided AAC may indicate that these individuals had stronger communication skills at the outset than those who relied on potentially ineffective verbal or vocal skills and natural or idiosyncratic gestures. This interpretation is also consistent with the large number of participants reported to have five or fewer in-repertoire symbols prior to intervention.
Results did not provide compelling support for the suggestion that the cognitive load demands of early sign production are greater for manual signs, which require recall, than for aided AAC, which during beginning intervention typically uses graphic symbols displayed in an array and therefore relies on recognition (Reichle et al., 1991). Recognition memory is generally considered to impose a lower cognitive demand than recall memory. Although the Tau effect size analysis suggested that prior experience and success with one or more AAC modes were related to AAC success, this relationship was not apparent with the log response ratio metric. Moreover, a number of uncontrolled variables could have influenced this comparison. For example, cognitive demands for recall increase with a learner's vocabulary size, but most participants in the included studies had relatively small vocabularies.
A primary take-away message from the current investigation concerns the high degree of heterogeneity in effect sizes. Multi-level meta-analytic models provided estimates not only of average effects but also of the extent of variation in effect sizes at the study, participant, and contrast levels. There was a substantial degree of study-level variation in effect size for both Tau(AB) and log response ratio effect size metrics. This heterogeneity made it difficult to detect systematic differences based on participant characteristics. As a result, few of the hypothesized moderators were statistically significant predictors of effect size magnitude. Meta-regression analyses that controlled simultaneously for several participant characteristics (Supplementary Tables S6 and S7) did not sharpen the results, nor did the inclusion of several variables explain substantial amounts of variation in effect size. Features other than those investigated may be driving the variation in effects; thus, an important direction for further work is to identify systematic predictors of variation in effects across participants and studies.
Results differ somewhat from those of previous meta-analyses addressing AAC intervention. Similar to others' findings (Holyfield et al., 2017), the evidence base included many more participants at younger ages than at older ones. No differences in effects were detected as a function of chronological age. There was a marginally significant difference related to diagnosis for log response ratio, although this difference was not supported in models with multiple control variables. Thus, available data suggest that individuals with ASD and/or IDD of varying ages can successfully learn to use AAC, regardless of their status with respect to early intervention; however, it is highly probable that learning trajectories benefit from early intervention. Consequently, while the authors strongly support the value of early intervention (Koegel et al., 2014), the analysis does not support treating chronological age or a particular diagnosis as a condition for access to, or denial of, AAC tools and interventions. This conclusion agrees with that of Ganz et al. (2017). It contrasts, however, with a meta-analysis finding that aided AAC was significantly more effective with preschoolers than with older individuals with ASD (Ganz et al., 2014), although that study used the Improvement Rate Difference effect size metric, which is influenced by procedural factors such as the number of observations in baseline and intervention phases (Pustejovsky, 2019). Furthermore, Ganz et al. (2014) was less comprehensive than the current meta-analysis in that it excluded unaided AAC, excluded participants who did not have ASD, and summarized a pool of studies that is now dated, thereby excluding current technologies.
Most reviews have not reported on the number of communication modes used prior to intervention; however, some have investigated the match between type of AAC used prior to and type used during intervention. Gevarter and Zamora (2018) suggest that when participants had prior experience with a communication mode that was then targeted for intervention, this may be associated with better outcomes.
Although others have found that larger productive vocabularies prior to intervention were associated with greater gains during intervention, this association was not confirmed here. The analysis of this variable was limited in that only one-third of the primary sources reported the number of words used by participants prior to intervention. Other investigators have suggested that individuals who have acquired imitation skills have an advantage during acquisition (Stone & Yoder, 2001; Yoder & Layton, 1988); in the present analysis, however, imitation use prior to intervention was not associated with differences in effect size, and most studies did not report it.

Implications for clinical practice
Generally, AAC is effective in improving communication outcomes for school-aged individuals with ASD/IDD who have complex communication needs. Given the heterogeneity of effects in this meta-analysis, it is imperative that decisions regarding service provision and AAC mode selection consider, foremost, the needs of the individual with complex communication needs, as well as the preferences of that individual and of key stakeholders (e.g., educators, parents). Such decisions should draw on data from varied sources, including standardized assessments, checklists, observation, preference assessments, assistive technology evaluations, and interviews and questionnaires completed by multiple informants. Individuals with prior experience with symbolic communication appear likely to be more responsive to treatment; this suggests that individuals who are new to AAC may require more intensive treatment in early stages. Furthermore, although fewer participants had been taught to use manual signs than aided AAC prior to intervention, their outcomes were not statistically distinct, indicating that matching the AAC mode to participant needs and skills is critical. Our literature review did not uncover enough studies that evaluated participant imitation and joint attention prior to intervention to demonstrate significant distinctions between groups. Given that these are likely indicators of the need for more or less intensive intervention, practitioners are encouraged to include observational measures of these pivotal skills when making AAC intervention decisions.

Limitations and future research
This article provides the largest and most comprehensive meta-analysis of AAC for individuals with ASD/IDD to date; however, it has several limitations. Consistent with recent recommendations, we applied two effect size metrics that reflect the best available options for these data, yet neither is perfect. One limitation of Tau(AB) is its loss of sensitivity when there is near or complete non-overlap between baseline and intervention data points (Wolery et al., 2010). In such contexts, Tau(AB) cannot distinguish smaller from larger changes in the level of the outcome; that is, it provides no further differentiation among larger effects, as reflected in the large cluster of effects near the ceiling of 1.0. This ceiling effect may have contributed to the lack of systematic moderating effects. Log response ratios were also calculated; they do not have the same range restrictions as Tau(AB), but this metric has limitations as well. Log response ratio is inappropriate for data series with baselines at or near floor levels, which necessitated exclusion of a non-trivial number of data series. As a result, the analyses based on log response ratio excluded many studies that were included in the Tau(AB) calculations.
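The ceiling and floor issues described above can be illustrated with a few hypothetical data series. In the Python sketch below, which uses simplified, unadjusted versions of both indices (the series and function names are invented for illustration), a modest and a large change with complete non-overlap both yield Tau(AB) = 1.0, whereas the log response ratio distinguishes them; a series whose baseline is entirely at zero yields no log response ratio at all.

```python
import math

def tau_ab(baseline, treatment):
    # +1 for each intervention value above a baseline value, -1 below, 0 for ties
    pairs = [(b > a) - (b < a) for a in baseline for b in treatment]
    return sum(pairs) / len(pairs)

def lrr(baseline, treatment):
    # Unadjusted log response ratio; None when either phase mean is not positive
    mean_a = sum(baseline) / len(baseline)
    mean_b = sum(treatment) / len(treatment)
    if mean_a <= 0 or mean_b <= 0:
        return None
    return math.log(mean_b / mean_a)

modest = ([1, 1, 1], [2, 2, 2])    # doubling of the outcome
large = ([1, 1, 1], [10, 10, 10])  # tenfold increase
floor = ([0, 0, 0], [5, 6, 7])     # baseline at the floor

for name, (a, b) in {"modest": modest, "large": large, "floor": floor}.items():
    print(name, tau_ab(a, b), lrr(a, b))
```

In this sketch the modest and large series receive the same Tau(AB) value, while only the log response ratio separates them, and the floor series can be summarized by Tau(AB) but not by the log response ratio.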
A further limitation concerns the assessment of publication bias. Our investigation of potential publication bias was inconclusive, owing to the mismatch between the nature of single-case research and the assumptions of available statistical techniques for assessing publication bias. Because no methods exist specifically to assess publication bias in single-case research, we applied methods developed for the group design literature, where publication bias is likely to be driven by the statistical significance of results. In single-case research, publication bias is more likely to be a function of visual inspection and judgments about a lack of experimental control.
Another limitation was that our analysis focused on understanding how intervention effects were associated with characteristics of individual participants only, rather than also examining specific intervention components or other contextual factors. This limitation is especially important considering that the present review used broad inclusion criteria and therefore included a wide range of AAC interventions. Using the same set of evidence analyzed here, other work (Ganz et al., 2021) has examined how factors including instructional setting, instructional features, and use of behavioral intervention strategies might moderate intervention effects, but found few systematic predictors of effect size magnitude. Still, it is possible that variation in use of intervention components, including type of AAC, could explain some of the variation in the intervention effects that we have observed. Isolating such factors could then, in turn, provide a clearer picture of how AAC effectiveness varies depending on individual learner characteristics.
A final limitation of this review is that we encountered substantial missing information in primary sources, as illustrated by the large number of observations for which codable information was "not reported" (e.g., number and types of communication modes used by participants prior to intervention, number of words reportedly used by each participant). Furthermore, we coded a number of critical potential moderators but were unable to analyze them because primary study authors reported them infrequently. For example, joint attention has been a focus of recent research in language interventions (Kasari et al., 2006, 2008) and could be a predictor of responsiveness to intervention. This missing information limited our analysis of how participant characteristics were related to effect size. Because few researchers reported assessing joint attention status prior to intervention, it seems reasonable to infer that intervention researchers are not yet considering the potential value of joint attention as a facilitating variable in communication intervention.
Much information on participant characteristics and precursor skills was missing from the original studies, which greatly limited interpretation of the variability in effects; thus, an important emphasis for future research is for AAC researchers to include this information in research reports and to design larger group studies that experimentally investigate questions of differential effectiveness. In general, many aspects of participant description made it difficult to conduct planned moderator analyses. For example, a number of studies provided informal descriptions of cognitive landmarks rather than quantified learner cognitive performance; that is, participants were not assessed with standardized or normed instruments. This was especially notable for participants who appeared to have significant communication skill delays, which are characteristic of the target population of persons with ASD and/or IDD with complex communication needs. Descriptions of vocabulary comprehension skills at the outset of a study were rare and limited. This gap is particularly important because comprehension of spoken language has been found to be related to AAC intervention outcomes. Future researchers should report assessments of both communication comprehension and expression, as well as diagnostic assessments, for this population.
The lack of quantitative information about communication comprehension levels prevented us from analyzing such data. The role that speech comprehension plays in learning to produce early spoken words and graphic symbols has been addressed by several authors (e.g., Brady et al., 2015; Drager et al., 2006; Romski & Sevcik, 1993). However, evidence addressing the role that prior vocabulary comprehension skills may play in the acquisition of AAC skills is virtually non-existent (Elmquist et al., 2019), and these data were missing from most articles included in the present synthesis. Many learners must begin AAC acquisition by establishing the relationship between AAC symbols and their referents, depending exclusively on contextual cues to derive meaning through the visual modality (Romski & Sevcik, 1988); thus, comprehension limitations directly influence the learning of AAC symbol production (Romski & Sevcik, 1993). Consequently, comprehension is an important learner characteristic that likely influences learner performance. The reciprocal relationship between symbol production and speech comprehension remains underexplored (Johnston et al., 2012). If aided AAC intervention produces gains in speech comprehension, this relationship could help close the vocabulary gap between speech comprehension and symbol production for many children with neurodevelopmental disabilities, and between these children and their typically developing peers (Johnston et al., 2012).
Given that 13% of the studies did not report the communication mode used prior to intervention, that the number of words/symbols used prior to intervention was unreported for 66% of participants, and that most studies did not report precursor imitation skills, future investigations should improve their reporting of these variables. Future investigators could then evaluate the effectiveness of AAC interventions based on length of prior AAC use (Biggs et al., 2018). It may also be that participants who used the same mode prior to intervention as was targeted during intervention perform better or learn more quickly; this is a potential direction for future investigation. In addition, the literature remains under-examined with respect to the influence of prior AAC use on acquisition, maintenance, or generalization (Ganz et al., 2014), of the quantity of words used prior to implementation (Biggs et al., 2018; Gevarter & Zamora, 2018), and of imitation skills prior to the reported intervention (Biggs et al., 2018; Ganz et al., 2014; Gevarter & Zamora, 2018). Filling these gaps has the potential to shed significant light on the influence that participant characteristics may have on intervention outcomes.

Conclusion
Moderator analyses detected few differences in effectiveness when comparing across diagnoses, ages, the number and type of communication modes the participants used prior to intervention, the number of words used by the participants prior to intervention, and imitation use prior to intervention. Although these variables may not directly influence outcomes, we suspect that, with more explicit description by primary study authors, many of them would prove to have some bearing on learner performance. For example, intellectual level is likely to be associated with the success of certain intervention strategies; however, in many investigations we examined, it was difficult to ascertain learner intellectual status. Much the same was true of prior communication mode use, including prior AAC use. In many studies, specific symbols were implemented with little information about the criteria for their selection. For meta-analyses to be a useful research tool, investigators designing experimental studies must be more explicit, systematic, and precise in describing the status of variables that may well moderate success. Until such reporting becomes more widespread, meta-analytic projects will have limited ability to identify factors that influence participants' outcomes.

Disclosure statement
No potential conflict of interest was reported by the author(s).