Interspeech 2016 - Experiment results for Sheffield Wargame Corpora (SWC1, SWC2, SWC3)

2016-06-15T08:59:52Z (GMT) by Yulan Liu, Thomas Hain, Madina Hasan
<p>The files in this dataset are the results generated for the Interspeech 2016 paper "<b>The Sheffield Wargames Corpus - Day Two and Day Three</b>" (<a href="https://doi.org/10.21437/Interspeech.2016-98">DOI: 10.21437/Interspeech.2016-98</a>). The paper describes a corpus of natural English speech recorded in a natural environment with multiple media and multiple microphones, reports baseline speech recognition performance based on standalone training and adaptation, and releases a Kaldi recipe for standalone training.</p>
<p><br></p>
<p>The files in the zip file are of three types:</p>
<p>- .ctm, which contains the output of the automatic speech recognition system; its columns hold segment information and the recognized transcript.</p>
<p>- .ctm.filt.sys, which contains the scoring of the automatic speech recognition output, including the overall word error rate and the number of insertions, deletions and substitutions for the whole system.</p>
<p>- .ctm.filt.lur, which provides a more detailed breakdown of the word error rate across multiple genres.</p>
<p><br></p>
<p>The three file types are provided for every result reported in Table 4 and Table 5 of the paper.</p>
<p><br></p>
<p>The file names use the following abbreviations (also explained in the paper):</p>
<p><br></p>
<p>"ihm" refers to "Individual Headset Microphone".</p>
<p>"sdm" refers to "Single Distant Microphone".</p>
<p>"mdm8" refers to "Multiple Distant Microphone - 8 channels".</p>
<p>"LDA" refers to "Linear Discriminant Analysis".</p>
<p>"MLLT" refers to "Maximum Likelihood Linear Transform".</p>
<p>"SAT" refers to "Speaker Adaptive Training".</p>
<p>"MMI" refers to "Maximum Mutual Information".</p>
<p>"DNN" refers to "Deep Neural Network".</p>
<p>"sMBR" refers to "state-level Minimum Bayes Risk".</p>
<p>"fMLLR" refers to "feature-level Maximum Likelihood Linear Regression".</p>
<p>"o4" refers to "a maximum of 4 overlapping speakers in scoring".</p>
<p><br></p>
<p><br></p>
<p>All three file types are standard plain-text outputs familiar to the automatic speech recognition community and can be opened with any text editor.</p>
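<p>As a rough illustration of how the .ctm files might be read programmatically, the sketch below assumes they follow the common NIST CTM column layout (recording ID, channel, start time, duration, word, optional confidence) and that lines starting with ";;" are comments. That layout is an assumption and is not confirmed by this description. The names <code>CtmWord</code>, <code>parse_ctm</code> and <code>word_error_rate</code> are hypothetical helpers introduced here, and the sample line in the usage note is invented for illustration, not taken from the dataset.</p>

```python
from typing import List, NamedTuple, Optional


class CtmWord(NamedTuple):
    """One recognized word, assuming the usual NIST CTM column order."""
    recording: str
    channel: str
    start: float                  # start time in seconds
    duration: float               # word duration in seconds
    word: str
    confidence: Optional[float]   # None when the column is absent


def parse_ctm(text: str) -> List[CtmWord]:
    """Parse CTM-formatted text into a list of CtmWord entries."""
    words = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines and ';;' comment lines (assumed convention).
        if not line or line.startswith(";;"):
            continue
        fields = line.split()
        if len(fields) < 5:
            raise ValueError(f"line {line_no}: expected at least 5 columns")
        confidence = float(fields[5]) if len(fields) > 5 else None
        words.append(CtmWord(fields[0], fields[1], float(fields[2]),
                             float(fields[3]), fields[4], confidence))
    return words


def word_error_rate(subs: int, dels: int, ins: int, ref_words: int) -> float:
    """Standard WER in percent: (S + D + I) / N * 100."""
    if ref_words <= 0:
        raise ValueError("ref_words must be positive")
    return 100.0 * (subs + dels + ins) / ref_words
```

<p>For example, <code>parse_ctm("demo_rec A 0.50 0.30 hello 0.98")</code> would yield a single entry whose <code>word</code> is "hello". The <code>word_error_rate</code> helper only restates the standard definition, so the substitution, deletion and insertion counts in a .ctm.filt.sys file can be checked against the reported overall figure.</p>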