Robust neural tracking of linguistic units relates to distractor suppression

figure
posted on 2019-07-24, 12:52 authored by Yayue Gao, Jianfeng Zhang, Qian Wang

In a complex auditory scene, speech comprehension involves several stages, e.g., segregating the target from the background, recognizing syllables, and integrating syllables into linguistic units (e.g., words). Although speech segregation is robust, as shown by invariant neural tracking of the target speech envelope, whether neural tracking of linguistic units is also robust, and how this robustness is achieved, remain unknown. To investigate these questions, we used electroencephalography (EEG) to concurrently record neural responses tracking a rhythmic speech stream at its syllabic and word rates. Participants listened to the target speech under a speech or noise distractor at varying signal-to-noise ratios (SNRs). Neural tracking at the word rate was not as robust as neural tracking at the syllabic rate. Robust neural tracking of the target’s words was observed only under the speech distractor, not under the noise distractor. Moreover, this robust word tracking correlated with successful suppression of distractor tracking. Critically, both word tracking and distractor suppression correlated with behavioural comprehension accuracy. In sum, our results suggest that robust neural tracking of higher-level linguistic units relates not only to target tracking but also to distractor suppression.


Figure 1. A, Structure of the target/distractor mixture. Target syllables (black) were presented at a constant rate of 4 Hz in a male voice and formed bisyllabic words at 2 Hz. The distractor (grey, green and red) was either speech in a female voice (green, the speech distractor) with random syllables at a constant rate of 3.2 Hz, or a speech-envelope-modulated speech-shaped noise (red, the noise distractor). Each distractor type was presented at 3 SNRs (-6, -3 and 0 dB). The target sequence in each trial consisted of 5 to 7 random syllables followed by bisyllabic words lasting 10 s. The distractor stream overlapped with the whole target stream. B, Neural responses under the speech (left, green) and noise (right, red) distractors at SNRs of 0, -3 and -6 dB. The EEG spectrum is the power averaged over participants and channels. Stars indicate response peaks significantly higher than the power averaged over a 0.8 Hz neighbouring frequency region (including the target frequency; ★, p < 0.005, bootstrap). The topographies show broad spatial distributions of normalized neural responses over SNRs under the speech and noise distractors. The black dots in the topographic plots mark the 10 channels with the largest normalized response power (also see Methods), which were used in further analyses.
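The peak-versus-neighbourhood comparison described for Figure 1B can be illustrated with a minimal sketch. It is not the authors' analysis code. The 64 Hz sampling rate, the synthetic sinusoidal signal, the centring of the 0.8 Hz region on the target frequency, and the use of a ratio (rather than a difference) are all assumptions made for illustration, and the function names are hypothetical.

```python
import cmath
import math

def power_at(signal, fs, freq):
    """Power of the Fourier component of `signal` at `freq` (Hz)."""
    n = len(signal)
    coeff = sum(x * cmath.exp(-2j * math.pi * freq * i / fs)
                for i, x in enumerate(signal))
    return abs(coeff / n) ** 2

def peak_to_neighbour_ratio(signal, fs, target, half_width=0.4):
    """Power at `target` divided by the mean power in a region of
    +/- `half_width` Hz around it (target bin included).
    Assumption: the 0.8 Hz region is centred on the target frequency."""
    step = fs / len(signal)              # frequency resolution in Hz
    k = int(round(half_width / step))    # bins on each side of the target
    powers = [power_at(signal, fs, target + j * step) for j in range(-k, k + 1)]
    baseline = sum(powers) / len(powers)
    return power_at(signal, fs, target) / baseline

# Synthetic 10 s "response" with components at 4 Hz and 2 Hz (made-up data).
fs = 64
t = [i / fs for i in range(fs * 10)]
tagged = [math.sin(2 * math.pi * 4 * x) + 0.5 * math.sin(2 * math.pi * 2 * x)
          for x in t]

ratio_syllable = peak_to_neighbour_ratio(tagged, fs, 4.0)  # syllabic rate
ratio_word = peak_to_neighbour_ratio(tagged, fs, 2.0)      # word rate
```

In this noise-free toy case all power in each 9-bin region sits in the target bin, so both ratios reach the maximum possible value of 9. Real EEG spectra would give much smaller ratios, and significance would be assessed statistically, as in the caption.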

Figure 2. Sensitivity to the SNR was assessed by the slope of normalized response power over SNRs (slopes of normalized power, t-test against zero: ***, p < 0.001; *, p < 0.05; FDR-corrected) at the target-word rate (T-word, solid line), at the target-syllabic rate (T-syllable, dashed line) and at the distractor-syllabic rate (Distractor, dotted line). Coloured dots indicate normalized response power averaged over participants. Error bars indicate SEM. Grey lines plot the normalized response power of each participant. The neural responses were averaged over the top 10 channels (also see Fig. 1B and Methods).
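As a rough illustration of the slope measure in Figure 2, the sketch below fits a least-squares line to each participant's normalized power across the three SNRs and then computes a one-sample t statistic of the slopes against zero. The power values are made up, the helper names are hypothetical, and the p-value and FDR correction (which would normally come from a statistics library) are omitted.

```python
import math
from statistics import mean, stdev

def ls_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den

def one_sample_t(values):
    """t statistic for testing whether the mean of `values` differs from zero."""
    return mean(values) / (stdev(values) / math.sqrt(len(values)))

snrs = [-6, -3, 0]  # dB, as in the caption

# Made-up normalized power for four hypothetical participants, one row each.
toy_power = [
    [1.0, 1.6, 2.2],
    [0.8, 1.1, 2.0],
    [1.2, 1.5, 1.5],
    [0.9, 1.8, 2.4],
]

slopes = [ls_slope(snrs, p) for p in toy_power]  # power change per dB
t_stat = one_sample_t(slopes)
```

A slope near zero would indicate a response that is insensitive to the SNR, which is the sense of "robust" used in the abstract.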

Figure 3. Correlation between neural responses at the target-word rate (T-word), at the target-syllabic rate (T-syllable) and at the distractor-syllabic rate (Distractor). A, The correlation was assessed by the slope of a linear fit over 6 conditions (2 distractor types × 3 SNRs), within participants, between each pair of the three neural responses. The within-participant slopes were bootstrapped, yielding a mean slope (thick solid lines; correlated if slope > 0 and anti-correlated if slope < 0) and 95% (dark grey) and 99.9% (light grey) one-tailed confidence intervals (background shading; significant if the shaded area does not overlap zero, shown by the grey dash-dotted line). Light grey lines plot the correlation in each participant. The neural responses were averaged over the top 10 channels (also see Fig. 1B and Methods). B, The topography shows the spatial distribution of each (anti-)correlation in panel A. Dots show channels with slopes significantly different from zero (all p < 0.05, one-tailed bootstrap, FDR-corrected).
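One way to obtain one-tailed bootstrap confidence bounds on a mean slope, as mentioned for Figure 3A, is sketched below. The caption does not specify the resampling scheme, so resampling participants with replacement is an assumption here. The slope values are made up and the function name is hypothetical.

```python
import random
from statistics import mean

def bootstrap_lower_bound(slopes, confidence, n_boot=5000, seed=0):
    """One-tailed lower confidence bound on the mean slope.
    Assumption: participants are resampled with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(slopes)
    boot_means = sorted(mean(rng.choices(slopes, k=n)) for _ in range(n_boot))
    index = int((1.0 - confidence) * n_boot)
    return boot_means[index]

# Made-up within-participant slopes for six hypothetical participants.
toy_slopes = [0.20, 0.35, 0.10, 0.25, 0.30, 0.15]

lower_95 = bootstrap_lower_bound(toy_slopes, 0.95)
lower_999 = bootstrap_lower_bound(toy_slopes, 0.999)

# A positive correlation counts as significant when the bound stays above zero.
significant_95 = lower_95 > 0
```

For an anti-correlation, the same idea applies with an upper bound that must stay below zero.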

Figure 4. A, The behavioural performance. Error bars indicate SEM. B, The correlation between the behavioural performance and neural responses, within participants over 6 listening conditions (2 distractor types × 3 SNRs). The within-participant slopes were bootstrapped, yielding a mean slope (line styles vary by response; correlated if slope > 0 and anti-correlated if slope < 0) and 95% (dark grey) and 99.9% (light grey) one-tailed confidence intervals (grey background; significant if the shaded area does not overlap zero, shown by the black dash-dotted line). Light grey lines plot the correlation in each participant. The neural responses were averaged over the top 10 channels (also see Fig. 1B and Methods).

Figure 5. The association between neural responses and behavioural performance in the left (A-C) and right (D-F) hemispheres. A/D, The top 5 channels selected from the left (A) and right (D) hemispheres were used in the analyses. B/E, Neural responses in correct and incorrect trials at the target-word rate (T-word), target-syllabic rate (T-syllable) and distractor-syllabic rate (Distractor). The neural response at the target-word rate was significantly greater in correct trials than in incorrect trials in the right top 5 channels (E; paired t-test: *, p < 0.05; FDR-corrected). Error bars indicate SEM. Each dot indicates the neural response in one participant. C/F, The correlation between response differences at the target-word rate (T-word), at the target-syllabic rate (T-syllable) and at the distractor-syllabic rate (Distractor). Each dot indicates the response difference between correct and incorrect trials in one participant. The response difference at the target-word rate was significantly correlated with the response difference at the distractor-syllabic rate in the right top 5 channels (F; Spearman’s correlation; rho = -0.53; p = 0.03). Dashed lines indicate 95% confidence intervals around the linear fit line.
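The Spearman correlation reported for Figure 5F is a Pearson correlation computed on ranks. A minimal sketch follows, with tied values given their average rank. The response differences are made up to be perfectly anti-monotonic and do not reproduce the reported rho of -0.53. The function names are hypothetical.

```python
import math
from statistics import mean

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Made-up correct-minus-incorrect response differences, one per participant.
word_diff = [0.1, 0.4, 0.2, 0.8, 0.5]
distractor_diff = [0.9, 0.3, 0.6, -0.2, 0.1]

rho = spearman_rho(word_diff, distractor_diff)
```

A negative rho, as in the caption, means that participants with a larger word-rate difference tend to show a smaller (more negative) distractor-rate difference.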

