10.15131/shef.data.3581427.v1
Salil Deena
Salil
Deena
Madina Hasan
Madina
Hasan
Mortaza Doulaty Bashkand
Mortaza
Doulaty Bashkand
Oscar Saz Torralba
Oscar
Saz Torralba
Thomas Hain
Thomas
Hain
Experiments results for IEEE/ACM Transaction on Audio, Speech and Language Processing Journal Paper: "Recurrent Neural Network Language Model Adaptation for Multi-Genre Broadcast Speech Recognition and Alignment"
The University of Sheffield
2018
RNNLM
Language Model Adaptation
Automatic Speech Recognition
Lightly Supervised Alignment
Multi-Genre Broadcast Challenge
Natural Language Processing
Artificial Intelligence and Image Processing not elsewhere classified
2018-12-20 16:17:36
Dataset
https://orda.shef.ac.uk/articles/dataset/Experiments_results_for_IEEE_ACM_Transaction_on_Audio_Speech_and_Language_Processing_Journal_Paper_Recurrent_Neural_Network_Language_Model_Adaptation_for_Multi-Genre_Broadcast_Speech_Recognition_and_Alignment_/3581427
The files in the dataset correspond to results that have been generated for the IEEE/ACM Transactions on Audio, Speech and Language Processing paper: "Recurrent Neural Network Language Model Adaptation for Multi-Genre Broadcast Speech Recognition and Alignment", DOI: 10.1109/TASLP.2018.2888814. The paper deals with language model adaptation for the MGB Challenge 2015 transcription and alignment tasks.<br>
<br>
The files in the zip file are of three types:<br>
- .ctm, which correspond to the output of the automatic speech recognition system; the columns include segment information as well as the recognised transcripts.<br>
- .ctm.filt.sys, which correspond to the scoring of the automatic speech recognition system and include the overall word error rate as well as the numbers of insertions, deletions and substitutions for the overall system.<br>
- .ctm.filt.lur, which provide a more detailed decomposition of the word error rate across the genres.<br>
<br>
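The .ctm files follow the standard NIST CTM layout, so they can be read with a few lines of code. The sketch below parses CTM word records, assuming the usual columns (recording id, channel, start time, duration, word, and an optional confidence score); the sample lines are hypothetical, not taken from this dataset.<br>
<br>

```python
def parse_ctm_lines(lines):
    """Yield one dict per word token in a NIST-style CTM transcript.

    Assumed layout per line (standard CTM):
    <recording-id> <channel> <start> <duration> <word> [<confidence>]
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;"):  # skip blanks and comment lines
            continue
        fields = line.split()
        rec = {
            "recording": fields[0],
            "channel": fields[1],
            "start": float(fields[2]),
            "duration": float(fields[3]),
            "word": fields[4],
        }
        if len(fields) > 5:  # optional confidence column
            rec["confidence"] = float(fields[5])
        yield rec


# Hypothetical sample lines for illustration only.
sample = [
    ";; hypothetical example",
    "prog_001 1 12.34 0.41 broadcast 0.97",
    "prog_001 1 12.75 0.28 speech",
]
words = list(parse_ctm_lines(sample))
```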
The three file types are repeated for all the results described in Tables 4, 5 and 6 of the paper (27 entries in total).<br>
<br>
The following is a description of the naming convention of the files:<br>
<br>
4gram.amlm.baseline refers to the 4-gram LM baseline on LM1 and LM2 text<br>
rnnlm refers to Recurrent Neural Network Language Model.<br>
amrnnlm prefix refers to acoustic model text RNNLM.<br>
amlmrnnlm prefix refers to acoustic model + language model text RNNLM.<br>
.baseline.lattice.rescore suffix refers to baseline results generated with lattice rescoring.<br>
.nbest.baseline.rescore suffix refers to baseline results generated with n-best rescoring.<br>
.noadaptation refers to RNNLM results with no adaptation.<br>
.genre.finetune refers to genre fine-tuning of the RNNLMs.<br>
.genre.adaptationlayer refers to genre LHN adaptation layer fine-tuning of the RNNLMs.<br>
.ldafeat.hiddenlayer refers to text-based Latent Dirichlet Allocation (LDA) features at the hidden layer.<br>
.acousticldafeat.hiddenlayer refers to acoustic LDA features at the hidden layer.<br>
.acoustictextldafeat.hiddenlayer refers to acoustic and text LDA features at the hidden layer.<br>
.genrefeat.hiddenlayer refers to Genre 1-hot auxiliary codes at the hidden layer.<br>
.genrefeat.adaptationlayer refers to Genre 1-hot auxiliary codes at the adaptation layer.<br>
.2layer.ldafeat.hiddenlayer refers to a 2-layer RNNLM with text LDA features at the hidden layer and no feat. at adaptation layer.<br>
.2layer.ldafeat.hiddenlayer.genrefinetune refers to a 2-layer RNNLM with text LDA features at the hidden layer, no feat. at adaptation layer and genre fine-tuning.<br>
.kcomponent refers to K-Component Adaptive Topic fine-tuning using LDA posteriors.<br>
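<br>
Since the filenames are dot-separated tokens drawn from the convention above, they can be decoded mechanically. The sketch below maps a few of the tokens to short descriptions (the glossary is partial, and the example filename is hypothetical); any unlisted token is passed through unchanged.<br>
<br>

```python
# Partial glossary of filename tokens, taken from the naming convention above.
GLOSSARY = {
    "rnnlm": "recurrent neural network LM",
    "amrnnlm": "acoustic-model-text RNNLM",
    "amlmrnnlm": "acoustic + language model text RNNLM",
    "noadaptation": "no adaptation",
    "finetune": "fine-tuning",
    "adaptationlayer": "LHN adaptation layer",
    "ldafeat": "text LDA features",
    "acousticldafeat": "acoustic LDA features",
    "genrefeat": "genre 1-hot auxiliary codes",
    "hiddenlayer": "at the hidden layer",
    "kcomponent": "K-Component adaptive topic fine-tuning",
}


def describe_filename(name):
    """Strip the file-type extension, then map each token via the glossary."""
    for ext in (".ctm.filt.sys", ".ctm.filt.lur", ".ctm"):
        if name.endswith(ext):
            name = name[: -len(ext)]
            break
    return [GLOSSARY.get(tok, tok) for tok in name.split(".")]


# Hypothetical example filename:
parts = describe_filename("amrnnlm.genre.finetune.ctm.filt.sys")
```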
<br>
All three file types are standard outputs that are recognised by the automatic speech recognition community and can be opened using any text editor.