Binaural summation of amplitude modulation involves weak interaural suppression

The brain combines sounds from the two ears, but what is the algorithm used to achieve this summation of signals? Here we combine psychophysical amplitude modulation discrimination and steady-state electroencephalography (EEG) data to investigate the architecture of binaural combination for amplitude-modulated tones. Discrimination thresholds followed a ‘dipper’ shaped function of pedestal modulation depth, and were consistently lower for binaural than monaural presentation of modulated tones. The EEG responses were greater for binaural than monaural presentation of modulated tones, and when a masker was presented to one ear, it produced only weak suppression of the response to a signal presented to the other ear. Both data sets were well-fit by a computational model originally derived for visual signal combination, but with suppression between the two channels (ears) being much weaker than in binocular vision. We suggest that the distinct ecological constraints on vision and hearing can explain this difference, if it is assumed that the brain avoids over-representing sensory signals originating from a single object. These findings position our understanding of binaural summation in a broader context of work on sensory signal combination in the brain, and delineate the similarities and differences between vision and hearing.


Introduction
observations, detailed investigation and modelling of the binaural processing of amplitude-modulated tones is lacking.

Computational predictions for both psychophysical and electrophysiological results can be obtained from parallel work that considers the combination of visual signals across the left and right eyes. In several previous studies, a single model of binocular combination has been shown to successfully account for the pattern of results from psychophysical contrast discrimination and matching tasks [33,34], as well as steady-state EEG experiments [35]. The model, shown schematically in Figure 1a, takes contrast signals (sinusoidal modulations of luminance) from the left and right eyes, which mutually inhibit each other before being summed as follows:

resp = CL^p / (Z + CL^q + w*CR^q) + CR^p / (Z + CR^q + w*CL^q) (1)

where resp is the model response, CL and CR are the contrast signals in the left and right eyes respectively, w is the weight of interocular suppression, Z is a constant governing the gain of the model, and p and q are exponents with the typical constraint that p > q. In all experiments in which the two signals have the same visual properties [33-35], the weight of interocular suppression has a value around w = 1 (and is assumed to be effectively instantaneous, though in reality it is likely subject to some delay).

Whereas vision studies typically modulate luminance relative to a mean background level (i.e. contrast), in hearing studies the amplitude modulation of a carrier waveform can be used to achieve the same effect. We can therefore test empirically whether binaural signal combination is governed by the same basic algorithm (in the tradition associated with David Marr [36]) as binocular signal combination by replacing the C terms in equation 1 with modulation depths for AM stimuli.
The response of the model for different combinations of inputs is shown in Figure 1b, with predictions being invariant to the sensory modality (hearing or vision). In the monaural/monocular ("mon") condition (blue), signals are presented to one channel only. In the binaural/binocular ("bin") condition (red), equal signals are presented to the two channels. In the dichotic/dichoptic ("dich") condition (green), a signal is presented to one channel, with a fixed high-amplitude 'masker' presented to the other channel throughout. For w = 1, the mon and bin conditions produce similar outputs, despite a doubling of the input (two channels vs one). This occurs because the strong suppression between channels offsets the gain in the input signal. This pattern of responses is consistent with the amplitudes recorded from steady-state visual evoked potential experiments testing binocular combination in humans [35].
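The behaviour described above can be sketched in a few lines of Python. This is a minimal illustration of the model in equation 1, not the authors' fitting code; the exponent and gain values (p, q, Z) are illustrative assumptions, while w = 1 and w = 0.02 correspond to the strong and weak suppression regimes discussed in the text.

```python
def model_resp(cl, cr, w=1.0, p=2.8, q=2.2, z=5.0):
    """Two-channel gain-control model (equation 1).

    cl, cr : signal levels in the two channels (contrast or AM depth, in %)
    w      : weight of suppression between channels
    z      : gain constant; p and q are exponents with p > q
    (parameter values here are illustrative, not fitted values)
    """
    return (cl**p / (z + cl**q + w * cr**q)
            + cr**p / (z + cr**q + w * cl**q))

# With strong suppression (w = 1), stimulating both channels barely
# changes the output relative to one channel alone:
mon = model_resp(32.0, 0.0)         # one channel stimulated
both = model_resp(32.0, 32.0)       # both channels, w = 1: ~equal to mon

# With weak suppression the two-channel response is nearly doubled:
weak = model_resp(32.0, 32.0, w=0.02)
```

This makes the text's point concrete: the ratio both/mon is close to 1 for w = 1, but close to 2 for w = 0.02.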

The model response can also be used to predict the results of psychophysical increment detection experiments in which thresholds are measured for discriminating changes in the level of a 'pedestal' stimulus (e.g. a stimulus of fixed intensity). In these experiments, thresholds are defined as the horizontal translation required to produce a unit increase vertically along the functions in Figure 1b. In other words, psychophysical performance measures the gradient of the contrast response function. These predictions are shown in Figure 1c and have a characteristic 'dipper' shape, in which thresholds first decrease (facilitation), before increasing (masking). The mon and bin functions converge at higher pedestal levels, and the dich function shows strong threshold elevation owing to the suppression between the two channels (when w = 1).

The amplitude-modulated stimuli were defined as:

w(t) = 0.5 * (1 + m*cos(2π*fm*t + π)) * sin(2π*fc*t) (2)

where fm is the modulation frequency in Hz, fc is the carrier frequency in Hz, t is time in seconds, and m is the modulation depth, with a value from 0 to 1 (though hereafter expressed as a percentage, 100*m). We chose not to compensate for overall stimulus power (as is often done for AM stimuli, e.g. [37]) for three reasons (see also [29]). First, such compensation mostly affects AM detection thresholds at much higher modulation frequencies than we used here [e.g. see Figure A1 of 38]. Second, compensation makes implicit assumptions about the cues used by the participant in the experiment, and we prefer to make any such cues explicit through computational modelling. Third, we confirmed in a control experiment that compensation had no systematic effect on thresholds (see Supplementary Figure S1).

In the psychophysical discrimination experiment, participants heard two amplitude-modulated stimuli presented sequentially using a two-alternative-forced-choice (2AFC) design.
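Equation 2 can be implemented directly. The sketch below synthesises one such stimulus; the sample rate and the 1 kHz carrier frequency are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def am_tone(m, fm=40.0, fc=1000.0, duration=0.5, fs=44100):
    """Amplitude-modulated tone following equation 2.

    m  : modulation depth (0-1)
    fm : modulation frequency (Hz); fc : carrier frequency (Hz, assumed)
    fs : sample rate (Hz, assumed)
    """
    t = np.arange(int(duration * fs)) / fs
    # Envelope oscillates between 0.5*(1-m) and 0.5*(1+m); the pi phase
    # offset means the envelope starts at its minimum.
    envelope = 0.5 * (1 + m * np.cos(2 * np.pi * fm * t + np.pi))
    return envelope * np.sin(2 * np.pi * fc * t)

wave = am_tone(m=0.5)  # 500 ms tone at 50% modulation depth
```

Setting m = 0 yields the unmodulated carrier used in the monaural conditions.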
The stimulus duration was 500 ms, with a 400 ms interstimulus interval (ISI) and a minimum inter-trial interval of 500 ms. One interval contained the standard stimulus, consisting of the pedestal modulation depth only. The other interval contained the signal stimulus, which comprised the pedestal modulation depth with an additional target increment.

The presentation order of the standard and signal intervals was randomised, and participants were instructed to indicate the interval which they believed contained the target increment using a two-button mouse. A coloured square displayed on the computer screen indicated accuracy (green for correct, red for incorrect). The size of the target increment was determined by a pair of 3-down-1-up staircases, with a step size of 3 dB (where dB units are defined as 20*log10(100*m)), which terminated after the lesser of 70 trials or 12 reversals. The percentage of correct trials at each target modulation depth was used to fit a cumulative log-Gaussian psychometric function (using Probit analysis) to the data pooled across repetitions. We used this fit to estimate the target modulation that yielded a performance level of 75% correct, which was defined as the threshold. Each participant completed three repetitions of the experiment, producing an average of 223 trials per condition (and an average of 7133 trials in total per participant). This took around 5 hours in total per participant, and was completed across multiple days in blocks lasting around 10 minutes each.

Four binaural arrangements of target and pedestal were tested, at 8 pedestal modulation depths (100*m = 0, 1, 2, 4, 8, 16, 32 & 64). The arrangements are illustrated schematically in Figure 2a, and were interleaved within a block at a single pedestal level, so that on each trial participants were not aware of the condition being tested.
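The 3 dB staircase steps act multiplicatively on modulation depth under the dB convention defined above. A minimal sketch of that conversion (for illustration only):

```python
import math

def depth_to_db(m):
    """Convert modulation depth m (0-1) to dB units: 20*log10(100*m)."""
    return 20 * math.log10(100 * m)

def db_to_depth(db):
    """Inverse conversion: dB units back to modulation depth (0-1)."""
    return 10 ** (db / 20) / 100

# One 3 dB downward staircase step from an 8% modulation depth:
m = 0.08
m_next = db_to_depth(depth_to_db(m) - 3)  # ~0.0566, i.e. ~5.7%
```

Under this convention m = 1 (100% modulation) corresponds to 40 dB, and each 3 dB step scales the depth by a factor of about 0.708.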
Note that in all conditions the carrier was presented to both ears, whether or not it was modulated by the pedestal and/or target. This avoids confounding the ears presented with the modulator with those presented with the carrier. In the monaural condition, the pedestal and target modulations were presented to one ear, with the other ear receiving only the unmodulated carrier. The modulated stimulus was assigned randomly to an ear on each trial. In the binaural condition, the pedestal and target modulations were presented to both ears (in phase). Comparison of the binaural and monaural conditions reveals the advantage of stimulating both ears with an AM stimulus, rather than only one. In the dichotic condition, the pedestal modulation was presented to one ear and the target modulation to the other ear. This allows us to measure masking effects across the ears. Finally, in the half-binaural condition, the pedestal modulation was played to both ears, but the target modulation to only one ear. When compared with the binaural condition, this arrangement keeps the number of ears receiving the pedestal fixed, and changes only the number of ears receiving the target modulation. It therefore does not confound the effects of pedestal and target stimulation across the ears, and offers a more appropriate comparison than does the monaural condition. Note that for pedestal modulation depths of m = 0, and in the dichotic condition, the target increment was relative to the unmodulated carrier. Because the m = 0 detection condition was identical across the monaural, dichotic and half-binaural conditions, we pooled data across these conditions to obtain a more reliable estimate of threshold. In all conditions, the modulation frequency for the pedestal and the target was 40 Hz.
EEG procedure

In the EEG experiment, participants heard 11-s sequences of amplitude-modulated stimuli interspersed with silent periods of 3 seconds. There were five signal modulation depths (m = 6.25, 12.5, 25, 50 & 100%) and six binaural conditions, as illustrated in Figure 2b. In the first three conditions, a single modulation frequency (40 Hz, F1) was used. In the monaural condition, the modulated 'signal' tone was presented to one ear, and the unmodulated carrier was presented to the other ear. In the binaural condition, the signal modulation was presented to both ears. In the dichotic condition, the signal modulation was presented to one ear, and a modulated masker with a modulation depth of m = 50% was presented to the other ear. These three conditions permit estimation of summation and gain control properties, as the use of the same modulation frequency in both ears means that signals to the left and right ears will sum.

The remaining three conditions involved modulation at a second modulation frequency (35 Hz, F2), in order to isolate suppressive processes between channels. In the cross-monaural condition, F2 was presented to one ear as the signal, and the unmodulated carrier was presented to the other ear (F1 was not presented to either ear). This provides a comparison with the 40 Hz monaural condition, and also a baseline with which to compare the other cross-frequency conditions. In the cross-binaural condition, F1 was presented to one ear and F2 was presented to the other ear, but the modulation depth of F1 and F2 was the same. This allows measurement of suppressive interactions between the ears without the complicating factor of signal summation at the same modulator frequency tag. In the cross-dichotic condition, F1 was presented to one ear, and F2 (m = 50%) was presented to the other ear. Again, we expect this condition to reveal suppressive interactions between the ears, as the F2 mask should suppress the F1 target, and reduce the amplitude of the response measured at 40 Hz.

The order of conditions was randomised, and each condition was repeated ten times, counterbalancing the presentation of stimuli to the left and right ears as required. Trials were split across 5 blocks, each lasting 14 minutes, with rest breaks between blocks. EEG data for each trial at each electrode were then analysed offline.

Discrimination results are consistent with weak interaural suppression

The results of the AM depth discrimination experiment are shown in Figure 3. Thresholds for binaurally presented modulations (red squares in Figure 3a) followed a 'dipper' shape [39], with thresholds decreasing from an average of around 6% at detection threshold to around 2% on a pedestal of 8% (a facilitation effect). At higher pedestal modulations, thresholds increased to more than 16%, indicating a masking effect.
Thresholds for the monaural modulation (blue circles in Figure 3a) followed a similar pattern, but were shifted vertically by an average factor of 1.90 across all pedestal levels. The monaural and binaural dipper handles remained apart, and were approximately parallel, at higher pedestal modulation depths. At detection threshold (pedestal m = 0), the summation between binaural and monaural modulation corresponds to the vertical offset between the leftmost points in Figure 3a.

Dichotic presentation (pedestal modulation in one ear and target modulation in the other) elevated thresholds very slightly, by a factor of 1.19 at the highest pedestal modulation depths (green diamonds in Figure 3a), compared to baseline (0% pedestal modulation). This masking effect was substantially weaker than is typically observed for dichoptic pedestal masking in vision (see Figure 1a), which can elevate thresholds by around a factor of 30 [34]. The thresholds for the half-binaural condition (orange triangles in Figure 3a-d), where the pedestal was presented to both ears but the target only to one ear, were not appreciably different from those for the monaural condition, with thresholds greater than in the binaural condition by a factor of 1.94 on average.

These results can be converted to Weber fractions by dividing the threshold increments by the pedestal modulation depths, for pedestals >0%. These values are shown for the average data in Figure 3b. At lower pedestal modulation depths (<8%), Weber fractions decreased with increasing pedestal level. At pedestal modulations above 8%, the binaural Weber fractions (red squares) plateaued at around 0.25, whereas the monaural and half-binaural Weber fractions (blue circles and orange triangles) plateaued around 0.5. The dichotic Weber fractions (green diamonds) continued to decrease throughout.
Thus, non-Weber behaviour occurred over the lower range of pedestal modulation depths, but more traditional Weber-like behaviour was evident at higher pedestal levels. The exception is the dichotic condition, where non-Weber behaviour was evident throughout.

Overall, this pattern of results is consistent with a weak level of interaural suppression between the left and right ears. This accounts for the lack of convergence of the monaural and binaural dipper functions at high pedestal levels, and the relatively minimal threshold elevation in the dichotic masking condition, as we will show in greater detail through computational modelling below. Our second experiment sought to measure modulation response functions directly using steady-state EEG, to test whether this weak suppression is also evident in cortical responses.

SNRs are plotted as a function of modulation depth in Figure 5. For a single modulation frequency (40 Hz), responses increased monotonically with increasing modulation depth, with SNRs >2 evident for modulation depths above 12.5%. Binaural presentation (red squares in Figure 5a) led to SNRs of around 7 at the highest modulation depth, whereas monaural modulation produced weaker signals of SNR ~5 (blue circles in Figure 5a). Assuming a baseline of SNR = 1 in the absence of any signal (where activity at the signal frequency will equal the noise level in the adjacent frequency bins), this represents a binaural increase in response of a factor of 1.5. The finding that the monaural and binaural functions do not converge at high modulation depths suggests that interaural suppression is too weak to fully normalise the response to two inputs compared with one.
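The Weber-fraction conversion described above is a simple ratio; the sketch below uses example values consistent with the reported binaural plateau (a 16% threshold increment on a 64% pedestal gives 0.25), purely for illustration.

```python
def weber_fraction(threshold_increment, pedestal):
    """Weber fraction: threshold increment divided by pedestal modulation
    depth (both in % modulation). Only defined for pedestals > 0."""
    if pedestal <= 0:
        raise ValueError("Weber fractions require a pedestal > 0")
    return threshold_increment / pedestal

# Example consistent with the binaural plateau reported above:
wf = weber_fraction(16.0, 64.0)  # -> 0.25
```

Constant Weber fractions at high pedestals (Weber's law) correspond to the parallel, log-log-linear dipper handles described in the text.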

In the dichotic condition (green diamonds in Figure 5a), a masker with a fixed 50% modulation depth presented to one ear produced an SNR of 4 when the unmodulated carrier was presented to the other ear (see the left-most point). As the dichotic signal modulation increased, responses increased to match the binaural condition at higher signal modulations (red squares and green diamonds converge in Figure 5a).

When the carrier presented to one ear was modulated at a different frequency (35 Hz), several differences were apparent across the three conditions. Monaural modulation at 35 Hz (the cross-mon condition) evoked no measurable response at 40 Hz, as expected (orange circles in Figure 5b). At the modulation frequency of 35 Hz, this condition produced a monotonically increasing function peaking around SNR = 3.5 (orange circles in Figure 5c). Binaural modulation with different modulation frequencies in each ear led to weaker responses (SNRs of 4 at 40 Hz and 3 at 35 Hz; purple triangles in Figure 5b,c) than for binaural modulation at the same frequency (SNR = 7, red squares in Figure 5a). A 35 Hz AM masker with a fixed 50% modulation depth presented to one ear produced little change in the response to a signal in the other ear, which was amplitude-modulated at a modulation frequency of 40 Hz (grey inverted triangles in Figure 5b), though increasing the signal modulation depth slightly reduced the neural response to the 35-Hz AM masker (grey inverted triangles in Figure 5c). This weak dichotic masking effect is further evidence of weak interaural suppression. We next consider model arrangements that are able to explain these results.

A single model of signal combination predicts psychophysics and EEG results

To further understand our results, we fit the model described by equation 1 to both data sets.
To fit the psychophysical data, we calculated the target modulation depth that was necessary to increase the model response by a fixed value, which was a fifth free parameter in the model (the other four free parameters being p, q, Z and w; note that all parameters were constrained to be positive, q was constrained to always be greater than 2 to ensure that the nonlinearity was strong enough to produce a dip, and we ensured that p > q). With five free parameters, the data were described extremely well (see Figure 6a). However, the value of the interaural suppression parameter was much less than 1 (w = 0.02, see Table 1). This weak interaural suppression changes the behaviour of the canonical model shown in Figure 1c in two important ways, both of which are consistent with our empirical results. First, the degree of threshold elevation in the dichotic condition is much weaker, as is clear in the data (green diamonds in Figures 3a & 6a). Second, the thresholds in the monaural condition are consistently higher than those in the binaural condition, even at high pedestal levels (compare blue circles and red squares in Figures 3a & 6a).

To illustrate how the model behaves with stronger interaural suppression, we increased the weight to a value of w = 1, but left the other parameters fixed at the values from the previous fit. This manipulation (shown in Figure 6b) reversed the changes caused by the weaker suppression: masking became stronger in the dichotic condition, and the monaural and binaural dipper functions converged at the higher pedestal levels. These changes provided a poorer description of human discrimination performance, with the RMSE increasing from 1.2 dB to 5.5 dB. Finally, we held suppression constant (at w = 1), but permitted the other four parameters to vary in the fit.
This somewhat improved the fit (see Figure 6c), but retained the qualitative shortcomings associated with strong interaural suppression, and only slightly improved the RMSE (from 5.5 dB to 4.3 dB).

Table 1: Parameters for the model fits shown in Figure 6, with parameter constraints as described in the text.
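The threshold-prediction step described above (find the increment that raises the model response by a fixed amount) can be sketched numerically with a simple bisection search. The weight w = 0.02 follows the fitted value reported in the text; the remaining parameter values (p, q, z, sigma) are illustrative placeholders, not the fitted values from Table 1.

```python
def model_resp(cl, cr, w=0.02, p=2.8, q=2.2, z=5.0):
    """Two-channel gain-control model response (equation 1).
    w = 0.02 follows the reported psychophysical fit; the other
    parameter values are illustrative placeholders."""
    return (cl**p / (z + cl**q + w * cr**q)
            + cr**p / (z + cr**q + w * cl**q))

def predicted_threshold(pedestal, sigma=0.33, hi=200.0):
    """Target increment (in % modulation) that raises the binaural model
    response by sigma, found by bisection on the monotonic response."""
    base = model_resp(pedestal, pedestal)
    lo_dm, hi_dm = 0.0, hi
    for _ in range(100):
        mid = 0.5 * (lo_dm + hi_dm)
        gain = model_resp(pedestal + mid, pedestal + mid) - base
        if gain < sigma:
            lo_dm = mid
        else:
            hi_dm = mid
    return 0.5 * (lo_dm + hi_dm)

# The dipper shape emerges: a small pedestal lowers the threshold
# (facilitation) while a large pedestal raises it (masking).
t_zero = predicted_threshold(0.0)
t_small = predicted_threshold(2.0)   # below t_zero
t_large = predicted_threshold(64.0)  # above t_zero
```

With the accelerating nonlinearity (p > 1) this reproduces facilitation at small pedestals, and the compressive high-level response (p - q < 1 here) reproduces masking at large pedestals.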

To fit the EEG data, we converted the model response to an SNR by adding a noise parameter (s) to the model response, and then scaling by the noise parameter (i.e. (resp + s)/s). Because maximum SNRs varied slightly across the two modulation frequencies (40 and 35 Hz, see Figure 5), we permitted this noise parameter to take a different value at each frequency (s40 and s35). Model predictions were generated for the conditions described in Figure 2b. The model captures the increased response to binaural modulations compared with monaural modulations (blue circles vs red squares in Figure 6d), the relatively modest suppression in the cross-bin (purple triangles) and cross-dichotic (grey triangles) conditions at 40 Hz relative to the monaural condition, and the gentle decline in SNR in the cross-dichotic condition at the masker frequency (black triangles in Figure 6d). Most parameters took on comparable values to those for the dipper function fits described above (see Table 1). Of particular note, the weight of interaural suppression remained weak (w = 0.14).

We again explored the effect of increasing the weight of suppression (to w = 1) whilst keeping the other parameters unchanged. This resulted in a reduction of amplitudes in the binaural and cross-binaural conditions, which worsened the fit (to an RMSE of 0.96). Permitting all other parameters (apart from w) to vary freely improved the fit (to RMSE = 0.51), but there were still numerous shortcomings. In particular, the monaural and binaural response functions were more similar than in the data, and the reduction in SNR in the cross-binaural and cross-dichotic conditions was more extensive than found empirically.
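The SNR conversion is a one-line transform of the model response. The sketch below uses the fitted EEG weight w = 0.14 from the text, but the other parameter values (including the noise parameter) are illustrative assumptions.

```python
def model_resp(cl, cr, w=0.14, p=2.8, q=2.2, z=5.0):
    """Two-channel gain-control model (equation 1); w = 0.14 follows the
    EEG fit, other parameter values are illustrative placeholders."""
    return (cl**p / (z + cl**q + w * cr**q)
            + cr**p / (z + cr**q + w * cl**q))

def resp_to_snr(resp, s):
    """EEG SNR prediction: (resp + s) / s, giving a baseline SNR of 1
    when no signal is present."""
    return (resp + s) / s

s40 = 5.0  # illustrative noise parameter at 40 Hz
snr_mon = resp_to_snr(model_resp(100.0, 0.0), s40)
snr_bin = resp_to_snr(model_resp(100.0, 100.0), s40)
# With weak suppression, the binaural SNR clearly exceeds the monaural SNR
```

Note that the baseline of SNR = 1 in silence falls out of the formula directly, matching the assumption used when interpreting the data in Figure 5.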
Our modelling of the data from two experimental paradigms therefore supports the empirical finding that interaural suppression is relatively weak (by around an order of magnitude) compared with the analogous phenomenon in vision (interocular suppression).

Suppression between the ears has been measured previously with steady-state magnetoencephalography (MEG) using amplitude-modulated stimuli with frequencies that are the same [42] or different [23,24] in the left and right ears. When the same frequency is used in both ears, suppression can be assessed by comparing binaural responses to the linear sum of two monaural responses. When the measured binaural response is weaker than this prediction, this is taken as evidence of suppression between the ears (though we note that nonlinear transduction might produce similar effects). Tiihonen et al.
[42] used 500 ms click trains at 40 Hz, and found evidence for strong suppression of the initial evoked N100 amplitudes, but weaker suppression of the 40 Hz response (especially relative to ipsilateral stimuli). If suppression decreased even further for longer presentations (as used here), this might explain why suppression appears so weak in our study. Alternatively, the Tiihonen study used laterally placed MEG sensors to record signals from auditory cortex, whereas we used EEG with a central region of interest, which might also account for the differences. Two other studies [23,24] used different frequency tags in the two ears in conditions analogous to our cross-binaural and cross-dichotic conditions. For tag frequencies around 20 Hz, there were varying amounts of suppression, between 36% and 72% of the monaural response, depending on whether signals were measured from the left or right hemisphere, and whether they were for ipsilateral or contralateral presentations [23]. A second study [24] used frequencies around 40 Hz, and again found a range of suppression strengths depending on laterality and hemisphere. The weakest suppressive effects were comparable to those measured here using steady-state EEG (see Figure 5b,c). It is possible that different stages of processing might involve different amounts of suppression, which would require the use of techniques with better spatial precision to localise responses of interest to specific brain regions.

Another widely studied phenomenon that might involve suppression between the ears is the binaural masking level difference (BMLD).

The model shares features with previous binaural models

Previous models of binaural processing [50-52] have some architectural similarities to the model shown in Figure 1a. For example, binaural inhibition is a common feature [52], often occurring across multiple timescales [50].
However, these models are typically designed with a focus on explaining perception across a range of frequencies (and for inputs of arbitrary frequency content), rather than attempting to understand performance on specific tasks (i.e. AM depth discrimination) or the precise mapping between stimulus and cortical response (i.e. the amplitude response functions measured using steady-state EEG). At threshold, one model [51] predicts minimal levels of binaural summation (~1 dB), in line with probabilistic combination of inputs but below that found experimentally. These models would therefore likely require modification (i.e. the inclusion of physiological summation and early nonlinearities) to explain the data here, though it is possible that such modifications could be successful, given the other similarities between the models.

Several previous neural models of binaural processing have focussed on excitatory and inhibitory processes of neurons in subcortical auditory structures such as the lateral superior olive. These models (reviewed in [45]) are concerned with lateralised processing, in which interaural interactions are purely inhibitory, and so do not typically feature excitatory summation. However, models of inferior colliculus neurons do typically involve binaural summation, and have the same basic structure as the architecture shown in Figure 1a. In general, these models are designed to explain responses to diotically asynchronous stimuli (where stimuli reach the two ears at different times), and so typically feature asymmetric delays across the excitatory and inhibitory inputs from the two ears [e.g. 53].
Since a time delay is not a critical component of the divisive suppression on the denominator of equation 1, and because a mechanism with broad temporal tuning is equivalent to the envelope of many mechanisms with different delays, the architecture proposed here can be considered a generalised case of such models.

Ecological constraints on vision and hearing

This study reveals an important and striking difference between hearing and vision: suppression between the ears is far weaker than suppression between the eyes. Why should this be so? In the visual domain, the brain attempts to construct a unitary percept of the visual environment from two overlapping inputs, termed binocular single vision. For weak signals (at detection threshold) it is beneficial to sum the two inputs to improve the signal-to-noise ratio. But above threshold, there is no advantage for a visual object to appear more intense when viewed with two eyes compared with one. The strong interocular suppression prevents this from occurring by normalising the signals from the left and right eyes to achieve 'ocularity invariance': the constancy of perception through one or both eyes [33]. The guiding principle here may be that the brain is reducing redundancy in the sensory representation by avoiding multiple representations of a single object.

In the human auditory system the ears are placed laterally, maximising the disparity between the signals received (and minimising overlap). This incurs benefits when determining the location of lateralised sound sources, though reporting the location of pure tone sources at the midline (i.e. directly in front or behind) is very poor [2].
Hearing a sound through both ears at once therefore does not necessarily provide information that it comes from a single object, and so the principle of invariance should not be applied (and interaural suppression should be weak). However, other cues that are consistent with a single auditory object (for example, interaural time and level differences consistent with a common location) should result in strong suppression to reduce redundant representations, and cues that signals come from multiple auditory objects should release that suppression. This is the essence of the BMLD effects discussed above: suppression is strongest when target and masker have the same phase offsets (consistent with a common source), and weakest when their phase offsets are different. The distinct constraints placed on the visual and auditory systems therefore result in different requirements, which are implemented in a common architecture by changing the weight of suppression between channels.

A combination of psychophysical and electrophysiological experiments and computational modelling has converged on an architecture for the binaural combination of amplitude-modulated tones. This architecture is identical to the way that visual signals are combined across the eyes, with the exception that the weight of suppression between the ears is weaker than that between the eyes. This is likely because the ecological constraints governing suppression of multiple sources aim to avoid signals from a common source being over-represented. Such a high level of consistency across sensory modalities is unusual, and illustrates how the brain can adapt generic neural circuits to meet the demands of a specific situation.