Automatic analysis of child speech (Knowles et al., 2018)

<div><b>Purpose:</b> Heterogeneous child speech was force-aligned to investigate whether (a) manipulating specific parameters could improve alignment accuracy and (b) forced alignment could be used to replicate published results on acoustic characteristics of /s/ production by children.</div><div><b>Method: </b>In Part 1, child speech from 2 corpora was force-aligned with a trainable aligner (Prosodylab-Aligner) under different conditions that systematically manipulated the input training data and the type of transcription used. Alignment accuracy was assessed by comparing hand and automatic alignments in terms of how often they overlapped (%-Match) and the absolute differences in segment duration and boundary placement. Using mixed-effects regression, accuracy was modeled as a function of alignment condition, segment, and child age. In Part 2, forced alignments derived from a subset of the alignment conditions in Part 1 were used to extract the spectral center of gravity of /s/ productions from young children. These findings were compared with published results that used manual alignments of the same data.</div><div><b>Results: </b>Overall, the results of Part 1 demonstrated that both training data more similar to the data to be aligned and phonetic transcription improved alignment accuracy. Speech from older children was aligned more accurately than speech from younger children. In Part 2, /s/ center of gravity extracted from force-aligned segments was found to diverge between the speech of male and female children, replicating the pattern found in previous work using manually aligned segments. This was true even for the least accurate forced alignment method.</div><div><b>Conclusions: </b>Alignment accuracy of child speech can be improved by using more specific training data and transcription. However, poor alignment accuracy was not found to impede acoustic analysis of /s/ produced by even very young children. 
Thus, forced alignment presents a useful tool for the analysis of child speech.</div><div><br></div><div><b>Supplemental Material S1. </b>Summary of fixed-effects coefficients in the logistic regression models of %-Match between manually and force-aligned segments (Part 1).</div><div><br></div><div><b>Supplemental Material S2. </b>Summary of fixed-effects coefficients in the linear regression models of absolute duration differences between manually and force-aligned segments (Part 1).</div><div><br></div><div><b>Supplemental Material S3. </b>Summary of fixed-effects coefficients in the linear regression models of absolute onset differences between manually and force-aligned segments (Part 1).</div><div><br></div><div><b>Supplemental Material S4. </b>Summary of fixed-effects coefficients in the linear regression models of absolute offset differences between manually and force-aligned segments (Part 1).</div><div><br></div><div><b>Supplemental Material S5. </b>Summary of fixed-effects coefficients in the linear regression models of center of gravity differences between manually aligned, adult-trained force-aligned, and child-trained force-aligned conditions, as well as child age and sex (Part 2).</div><div><br></div><div>Knowles, T., Clayards, M., & Sonderegger, M. (2018). Examining factors influencing the viability of automatic acoustic analysis of child speech. <i>Journal of Speech, Language, and Hearing Research.</i> Advance online publication. https://doi.org/10.1044/2018_JSLHR-S-17-0275</div>
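<div>The two measurements at the core of the study, alignment accuracy (Part 1) and /s/ center of gravity (Part 2), can be illustrated with a short Python sketch. It is not the authors' code. The <code>Segment</code> class and the <code>alignment_errors</code> and <code>center_of_gravity</code> functions are hypothetical. Manual and forced segments are assumed to be already paired one-to-one. A segment is assumed to count toward %-Match if the two intervals overlap at all, because the abstract does not state the exact criterion. Center of gravity is computed as the power-weighted mean frequency of a Hann-windowed DFT. The mixed-effects regression step is not shown.</div>

```python
"""Hypothetical sketch of the alignment-accuracy and center-of-gravity measures.

Assumptions (not taken from the paper): segments are paired one-to-one, a
force-aligned segment "matches" if it overlaps its hand-aligned counterpart at
all, and center of gravity is the power-weighted mean frequency of one frame.
"""
import cmath
import math
from dataclasses import dataclass


@dataclass
class Segment:
    label: str      # phone label, e.g. "s"
    onset: float    # start time in seconds
    offset: float   # end time in seconds


def alignment_errors(manual, forced):
    """Compare paired manual/forced segments (same order, same phones).

    Returns %-Match plus mean absolute duration, onset, and offset
    differences in milliseconds.
    """
    if len(manual) != len(forced) or not manual:
        raise ValueError("need equal-length, non-empty segment lists")
    matches = 0
    dur, on, off = [], [], []
    for m, f in zip(manual, forced):
        # Assumed overlap criterion: the intervals share a positive stretch of time.
        if min(m.offset, f.offset) > max(m.onset, f.onset):
            matches += 1
        dur.append(abs((m.offset - m.onset) - (f.offset - f.onset)) * 1000)
        on.append(abs(m.onset - f.onset) * 1000)
        off.append(abs(m.offset - f.offset) * 1000)
    n = len(manual)
    return {
        "pct_match": 100.0 * matches / n,
        "abs_duration_ms": sum(dur) / n,
        "abs_onset_ms": sum(on) / n,
        "abs_offset_ms": sum(off) / n,
    }


def center_of_gravity(samples, sample_rate, power=2.0):
    """Spectral center of gravity (first spectral moment) of one frame, in Hz."""
    n = len(samples)
    # Hann window to reduce spectral leakage at the frame edges.
    windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * t / (n - 1)))
                for t, x in enumerate(samples)]
    num = den = 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        # Naive DFT coefficient for bin k (kept dependency-free on purpose).
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(windowed))
        weight = abs(coeff) ** power  # power=2 weights by the power spectrum
        num += (k * sample_rate / n) * weight
        den += weight
    return num / den if den else 0.0
```

<div>For example, a hand-aligned /s/ spanning 0.10 to 0.25 s paired with a forced /s/ spanning 0.12 to 0.25 s counts as a match with a 20 ms onset error and a 0 ms offset error. The naive DFT keeps the sketch self-contained; for real recordings an FFT (for example <code>numpy.fft.rfft</code>) would be the practical choice.</div>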