On the Importance of Different Cough Phases for COVID-19 Detection

Cough is an important symptom of numerous respiratory diseases, including COVID-19. While different cough phases (i.e., inhalation, compression, and expulsion) have been shown to be related to different pathological origins, existing cough-based COVID-19 detection systems rely on the entire cough recording, thus such phase-related characteristics are overlooked. In this study, our aim is two-fold. First, we have annotated over 1,250 cough recordings from two publicly-available cough sound databases, thus providing the research community with fine-grained cough phase labels. Next, we extract a number of temporal and acoustic features from each cough phase and test their usefulness and complementarity for COVID-19 detection. Experiments show the importance of cough phase segmentation, not only for improved COVID-19 detection, but also for the development of models that are interpretable and can better generalize across datasets.


INTRODUCTION
Cough is an important symptom of over 100 diseases [1] and has been shown to be a major symptom of the recent coronavirus disease 2019 (COVID-19) [2]. Pathological abnormalities in the respiratory system (e.g., phlegm, inflammation in lung bifurcations) can be reflected in changes in the characteristics of coughs [3]. A cough sound can generally be divided into three phases: (1) inhalation (inspiration), where the glottis remains wide open to bring air into the lungs; (2) compression, where a forced expiratory effort is made against the closed glottis; and (3) expulsion, where the glottis opens again at the moment the transient explosive sound is generated [4]. Clinical studies have shown that cough phase patterns vary across respiratory diseases [5] and that abnormalities in different cough phases can be linked to different pathological origins [1]. For example, the length of the compression phase can be indicative of the location of secretions in the lung airways [5], while the first explosive sound reflects the condition of the tracheal bifurcations [1]. Given that coughs can provide rich detail about the respiratory system and can be easily recorded via a microphone, cough-based COVID-19 prediction has emerged as a promising diagnostic tool.
In fact, several speech- and cough-based detection challenges have been held since the outbreak of COVID-19, including the INTERSPEECH 2021 Computational Paralinguistics Challenge (ComParE) [6] and the Diagnosis of COVID-19 using Acoustics (DiCOVA) Challenge series [7], [8]. The majority of existing studies relied on the mel-spectrogram, mel-frequency cepstral coefficients (MFCC), and/or other classical features computed with the openSMILE toolbox [9], [10]. These features, in turn, were computed across the entire cough recording. As for classifiers, deep convolutional and recurrent neural networks have shown high accuracy on both the ComParE and DiCOVA datasets [11]-[13]. Notwithstanding, it has been shown that such models can suffer from overfitting and provide limited generalizability across datasets [11], [14]. In fact, classical machine learning algorithms, such as the support vector machine (SVM), have been shown to outperform deep neural networks [6]. As can be seen, there is still ample room for improvement.
In this paper, we propose to segment cough recordings into three different phases (i.e., inhalation, compression, and expulsion) and to extract features from each phase separately to explore their usefulness for COVID-19 detection. Moreover, existing systems have relied on features that were optimized for speech applications, such as the MFCCs. However, clinical assessment of coughs typically emphasizes other temporal properties, such as the intensity, severity, and frequency of coughs [15]. As such, we propose a new set of temporal features and test their complementarity to classical acoustic ones. When tested on a dataset of 1,259 cough recordings, experimental results show that segmenting coughs into different phases not only improves COVID-19 detection accuracy, but also fosters greater interpretability and generalizability of the models. The cough annotation files, along with the proposed cough features, are made available at https://github.com/MuSAELab/COVID_Cough_Phases.

Cough sound datasets
Two cough sound datasets are employed in this study, namely the ComParE [6] and DiCOVA2 [8] datasets. In both datasets, participants were asked to produce forced coughs several times in a quiet environment. The cough sounds were collected remotely via smartphones or computers. Cough recordings were re-sampled to 16 kHz in ComParE and 44.1 kHz in DiCOVA2. Other details about the cough sound collection procedures can be found in [6] and [8], respectively. We manually checked the quality of each audio file and found that 30 COVID-positive recordings in ComParE had lower sampling rates (< 12 kHz). Including these files could lead to overly-optimistic COVID-19 prediction results, thus they were removed from our analyses, as suggested by [6].

Annotation procedure
Cough recordings from the two datasets were aggregated and randomly assigned to three annotators without class labels or meta-data information. Two annotators each labelled half of the samples, while the third annotator labelled all samples. Inconsistent annotations were discussed among the annotators to reach a final decision. Since recordings were collected in uncontrolled environments, cough signals could be mixed with other unwanted artifacts, such as background noise. Meanwhile, it was observed that other articulatory sounds could appear along with coughs, such as the sound of a gag reflex or a throat clearing. Hence, annotators were asked to assign one of the following six labels to each audio event: (1) inhalation, (2) compression, (3) expulsion (cough), (4) noise, (5) silence, and (6) other (which includes all types of articulatory sounds other than cough sounds).
The detailed annotation procedure can be summarized as follows. First, all audio files were imported into the PRAAT software [16] for visualization of the corresponding waveform and spectrogram. Annotators started by identifying the onset of the cough (expulsion) phases, since the start of a bursting sound can be more easily located visually. As the majority of the recordings were collected indoors, such explosive sounds could produce reverberant tails, which made it challenging to accurately locate the offset of the cough phases. As such, the offset was determined empirically as the point where the amplitude decreased to 5% of the maximal expulsion-phase amplitude, and the reverberation tail was labelled as noise.
The inhalation phases, in turn, were more difficult to locate due to their low amplitude, and could often be fully masked by background noise. Hence, only the audible inhalation phases were labelled. Once the cough and inhalation phases were determined, the segment between the end of an inhalation and the start of a cough was labelled as a compression phase. Articulatory sounds were annotated using a similar procedure. Other unwanted segments were labelled as either noise or silence, depending on their loudness. An example of a cough recording annotation can be found in Fig. 1.
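The 5% amplitude rule used to place expulsion offsets can be sketched as follows. This is an illustrative reimplementation only: the annotators applied the rule visually in PRAAT, so the function name and interface here are assumptions.

```python
import numpy as np

def expulsion_offset(x, onset, threshold=0.05):
    """Return the sample index where an expulsion phase starting at `onset`
    ends, i.e., the first post-peak sample whose absolute amplitude drops
    below `threshold` (5%) of the phase's maximum amplitude."""
    seg = np.abs(np.asarray(x, dtype=float)[onset:])
    peak = int(np.argmax(seg))
    below = np.flatnonzero(seg[peak:] < threshold * seg[peak])
    if below.size == 0:
        return len(x)  # amplitude never decays below the threshold
    return onset + peak + int(below[0])

# Synthetic expulsion envelope: exponential decay, 100-sample time constant.
# exp(-t/100) first falls below 0.05 at t = ceil(100 * ln(20)) = 300.
x = np.exp(-np.arange(1000) / 100.0)
print(expulsion_offset(x, 0))  # -> 300
```

In practice the reverberant tail between this offset and the true silence floor is what gets labelled as noise.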

Statistics and noise levels
An overview of the statistics of the annotations is shown in Table 1. The ComParE dataset was fully annotated, comprising 695 samples (after exclusion of files with low sampling rates, as detailed in Section 2.1). The DiCOVA2 dataset was partially annotated, comprising 564 samples. In total, 1,259 cough recordings were annotated, resulting in 2,421 inhalation phases, 6,170 expulsion phases (coughs), 2,421 compression phases, and 135 articulatory sounds.
As mentioned previously, these datasets were collected "in the wild" and thus have varying levels of background noise. To better characterize the data, we computed an approximate cough-to-noise ratio (CNR) as an indicator of the noise level (see Table 1 for details). The CNR is computed as:

CNR_dB = 10 log10(P_cough) - 10 log10(P_noise),

where P_cough and P_noise represent the average power of the segmented cough and noise segments, respectively, for each file. As can be seen from Table 1, for the 564 recordings from DiCOVA2, the CNR values of COVID-positive recordings were found to be substantially lower than those of COVID-negative recordings. Moreover, among the COVID-positive recordings from DiCOVA2, 22.4% were found to have CNR values around 0 dB. In these cases, the inhalation and compression phases were often indistinguishable from noise. This resulted in a much lower number of compression and inhalation phase labels for the DiCOVA2 set relative to ComParE.
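The CNR computation can be sketched with NumPy as follows (a minimal sketch; extraction of the cough and noise segments from the annotations is assumed to have been done already):

```python
import numpy as np

def cnr_db(cough_segments, noise_segments):
    """Approximate cough-to-noise ratio (CNR) in dB for one recording.
    Each argument is a list of 1-D arrays holding the samples of the
    annotated cough or noise segments of that file."""
    p_cough = np.mean(np.concatenate(cough_segments) ** 2)  # average power
    p_noise = np.mean(np.concatenate(noise_segments) ** 2)
    return 10.0 * np.log10(p_cough) - 10.0 * np.log10(p_noise)

# Toy check: a "cough" with 10x the noise power should give CNR near 10 dB.
rng = np.random.default_rng(0)
noise = [rng.normal(0.0, 1.0, 16000)]
cough = [rng.normal(0.0, np.sqrt(10.0), 16000)]
print(round(cnr_db(cough, noise), 1))
```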
A closer investigation showed that common noise sources included microphone glitches, as well as magnetic interference from other nearby devices. Removing such noise sources after recording is difficult, thus care needs to be taken during recording in future data collection efforts.

Cough processing pipeline
The extracted features can be divided into two categories: (1) traditional acoustic features (over 6,000 features) computed using the openSMILE toolkit [10] and (2) hand-crafted cough temporal features (henceforth referred to as "temporal features"). The acoustic features are computed across three different segments: (1) the entire cough recording, as in the ComParE baseline system [6]; (2) only during cough phases; and (3) only during inhalation phases. The temporal features, in turn, capture cough properties not available within openSMILE. A total of 11 features are computed, including: the frequency of coughs, inhalations, and other articulatory sounds; the average and standard deviation of the duration of the cough, inhalation, and compression phases; and the ratio of inhalation duration and compression duration to cough duration. A description of these features can be found in Table 2. These temporal features are engineered based on clinical cough measures and pathological insights. For example, the duration of the compression phase has been linked to the presence and location of secretions in the airways [5]. Inhalation and cough features, in turn, can be indicative of inflammation in different respiratory components [17], [18]. Lastly, combinations of these feature sets are explored to examine their complementarity; the five different combinations tested are depicted in Fig. 2. To test the impact of the different feature sets, and not necessarily of the classifier, each feature set is used to train both an SVM with a linear kernel and a random forest (RF) classifier. The best performance achieved and the corresponding classifier are reported herein.
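Given phase annotations, the 11 temporal features of Table 2 can be sketched as below. This is an illustrative reading of the feature definitions; the label names, the (label, start, end) tuple format, and the normalization of counts by recording length are assumptions, not the released implementation.

```python
import numpy as np

def temporal_features(segments, total_dur):
    """Compute 11 cough temporal features from phase annotations.

    `segments`: list of (label, start_s, end_s) tuples with labels in
    {"inhalation", "compression", "cough", "other"}.
    `total_dur`: recording length in seconds."""
    dur = {k: [e - s for (lab, s, e) in segments if lab == k]
           for k in ("cough", "inhalation", "compression", "other")}
    ave = lambda v: float(np.mean(v)) if v else 0.0
    std = lambda v: float(np.std(v)) if v else 0.0
    f = {
        # Frequencies: phase counts per second of recording.
        "freq_cou": len(dur["cough"]) / total_dur,
        "freq_inh": len(dur["inhalation"]) / total_dur,
        "freq_art": len(dur["other"]) / total_dur,
        # Average and standard deviation of phase durations.
        "dur_cou_ave": ave(dur["cough"]), "dur_cou_std": std(dur["cough"]),
        "dur_inh_ave": ave(dur["inhalation"]), "dur_inh_std": std(dur["inhalation"]),
        "dur_com_ave": ave(dur["compression"]), "dur_com_std": std(dur["compression"]),
    }
    # Duration ratios relative to the cough (expulsion) phase.
    f["ratio_inh_cou"] = f["dur_inh_ave"] / f["dur_cou_ave"] if f["dur_cou_ave"] else 0.0
    f["ratio_com_cou"] = f["dur_com_ave"] / f["dur_cou_ave"] if f["dur_cou_ave"] else 0.0
    return f

# One annotated cough: 0.2 s inhalation, 0.1 s compression, 0.4 s expulsion.
segs = [("inhalation", 0.0, 0.2), ("compression", 0.2, 0.3), ("cough", 0.3, 0.7)]
print(temporal_features(segs, total_dur=1.0))
```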

System evaluation
Systems are evaluated under two types of tasks. Task 1: Within-dataset. In this scenario, systems are trained and tested within the same dataset. Since a pre-defined train-validation-test split is provided by ComParE, we use the same data partition and conduct a 1000x bootstrap on the test set. With DiCOVA2, we employ 10 different speaker-independent data partitions (67%/33% train/test) and perform 5-fold cross-validation on the training set for hyper-parameter tuning. Task 2: Cross-dataset. Here, systems are trained and tested on two different datasets. Two tests are conducted: (T2.1) models are trained on ComParE and tested on DiCOVA2, and (T2.2) vice versa. Motivated by the findings in [14], principal component analysis (PCA) is used to map the combined high-dimensional feature set to a total of 300 features. This is performed whenever the acoustic feature set is present (i.e., all conditions shown in Fig. 2 except condition 4). For both tasks, the average area under the receiver operating characteristic curve (AUC-ROC) is used as the evaluation metric, along with 95% confidence intervals (CI) as a measure of performance variability.
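The bootstrap of the test-set AUC-ROC with 95% CIs can be sketched with scikit-learn as follows (a sketch of the evaluation protocol only; the scaling step, toy data, and pipeline details are assumptions, not the authors' exact setup):

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

def bootstrap_auc(model, X_test, y_test, n_boot=1000, seed=0):
    """Bootstrap the AUC-ROC on a fixed test set; return the mean score
    and a 95% percentile confidence interval."""
    rng = np.random.default_rng(seed)
    scores = model.decision_function(X_test)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_test), len(y_test))
        if len(np.unique(y_test[idx])) < 2:  # AUC needs both classes
            continue
        aucs.append(roc_auc_score(y_test[idx], scores[idx]))
    lo, hi = np.percentile(aucs, [2.5, 97.5])
    return float(np.mean(aucs)), (float(lo), float(hi))

# Usage on toy separable data (a stand-in for the PCA-reduced features):
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (100, 5)), rng.normal(2, 1, (100, 5))])
y = np.array([0] * 100 + [1] * 100)
model = make_pipeline(StandardScaler(), LinearSVC()).fit(X, y)
mean_auc, ci = bootstrap_auc(model, X, y, n_boot=200)
```

For the full pipeline, a `PCA(n_components=300)` step would sit between the scaler and the SVM whenever the acoustic features are present.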

Interpretation of cough temporal features
To evaluate group-level differences in the temporal features between COVID-positive and COVID-negative coughs, a Welch's t-test [19] was performed to examine statistical significance. Among the 11 temporal features, eight were found to be significantly different (p < 0.05), of which four showed highly significant differences (p < 0.01).
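The per-feature significance test can be sketched with SciPy's Welch's t-test (the feature values below are synthetic stand-ins; the real analysis used the extracted temporal features):

```python
import numpy as np
from scipy import stats

def feature_significance(pos_vals, neg_vals, alpha=0.05):
    """Welch's (unequal-variance) t-test between COVID-positive and
    COVID-negative values of one temporal feature."""
    t, p = stats.ttest_ind(pos_vals, neg_vals, equal_var=False)
    return float(t), float(p), bool(p < alpha)

# Synthetic example: compression durations (s), longer for positives.
rng = np.random.default_rng(2)
pos = rng.normal(0.30, 0.05, 80)
neg = rng.normal(0.20, 0.05, 80)
t, p, significant = feature_significance(pos, neg)
```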
Of these, the frequencies of the cough/expulsion and inhalation phases were found to be significantly lower for COVID-positive coughs, suggesting that COVID-19 patients made fewer (forced) coughs than healthy individuals over the same time duration. Studies have shown that voluntary coughs rely more on the larynx [1], [5] compared to involuntary coughs, and could be associated with airway clearance ability [20]. Considering that subjects were asked to make forced coughs in both datasets, this finding indicates that COVID-19 patients might have decreased control of the lower airways, possibly caused by inflammation and decreased neuromotor control of the larynx muscles [21].
In turn, the inhalation and compression phases of COVID-positive coughs were shown to be longer and more variable. These two phases occur at the preparation stage of a cough and have been reported to be longer when secretions are present in the smaller airways [22], [23]. Similar patterns have also been found in chronic bronchitis and emphysema [5], of which mucus and shortness of breath are two major symptoms. Interestingly, no significant difference was found in cough duration, which has been shown to increase by 50-100% when inflammation exists in the vocal cords [1]. Notwithstanding, the obtained findings suggest that the proposed temporal feature set can be a good candidate to capture interpretable, pathological origins of COVID-19 coughs.

System performance
Next, we report the performance achieved by the different feature set combinations (Table 3). As can be seen, for Task 1 (within-dataset), the top-performing feature set on ComParE is the fusion of cough and inhalation acoustic features with temporal features (condition 9), achieving an average AUC-ROC of 0.679. For DiCOVA2, in turn, the fusion of baseline acoustic features (computed across the entire recording) and temporal features (condition 5) achieved the highest AUC-ROC of 0.710, with the temporal features providing only a slight increase. The results also show that the importance of the cough phases for COVID-19 detection varies across datasets, suggesting poor generalizability. For example, the inhalation features outperformed all other single feature sets on ComParE, while achieving only chance-level performance on DiCOVA2. A possible explanation is that COVID-positive recordings in DiCOVA2 are noisier (see Table 1), thus leading to degraded inhalation phase segments. Moreover, while the proposed temporal features alone do not achieve state-of-the-art performance, their fusion with acoustic ones consistently improved accuracy, suggesting their complementarity.
As highlighted in [14], existing speech-based COVID-19 detection systems have poor generalizability across datasets. Results for Task 2 suggest this is also the case for coughs, where cross-dataset AUC-ROC values were substantially lower than those achieved in Task 1. For sub-task 2.1, the fusion of cough acoustic and temporal features (condition 6) showed the highest accuracy, with an average AUC-ROC of 0.622. For sub-task 2.2, on the other hand, the 11-dimensional proposed temporal feature set alone achieved the best overall accuracy. The feature sets based on inhalation phase features were the most affected, dropping to below-chance levels. This is likely due to the effect that environmental noise can have on such low-amplitude segments.

CONCLUSIONS
Cough phase-related characteristics have been overlooked in COVID-19 diagnostics. In this study, we released a cough phase annotation dataset based on 1,259 cough recordings. Based on these fine-grained annotations, new cough temporal features were proposed and fused with conventional acoustic features computed separately for the different phases. Within- and cross-dataset experiments have shown the importance of the different cough phases for COVID-19 detection, the complementarity of the cough temporal features to acoustic ones, as well as improved generalizability and interpretability.

ACKNOWLEDGEMENT AND DISCLAIMER
The authors thank the organizers of the DiCOVA2 and ComParE challenges for collecting and sharing the data. The aforementioned organizations do not bear any responsibility for the analysis and results presented in this paper; all results and interpretations represent only the views of the authors. We also thank Zack and Hong for the annotation work, and INRS, NSERC, and CIHR for the funding.

Fig. 1. Example cough annotation excerpt. Blue vertical lines represent the onset and offset of each phase.

Table 1. Statistics of cough annotations. Pos: COVID-positive. Neg: COVID-negative. Inh: Inhalation. Com: Compression. Cou: Cough. Art: Other articulatory sound. CNR (ave ± std): Cough-to-noise ratio in dB.

Table 2. Description of cough temporal features. Dur_cou_ave, Dur_cou_std: Ave and std of cough duration. Dur_inh_ave, Dur_inh_std: Ave and std of inhalation duration. Dur_com_ave, Dur_com_std: Ave and std of compression duration. Ratio_inh&cou: Ratio of Dur_inh_ave to Dur_cou_ave. Ratio_com&cou: Ratio of Dur_com_ave to Dur_cou_ave.

Table 3. Performance comparison for the different feature set combinations. Average AUC-ROC scores are reported with 95% CIs. Bold values indicate the best performance for a given task. Dim: Dimension.