ABnet_Poster.pdf (1.25 MB)

Phonetics Embedding Learning with Side Information

dataset
posted on 2014-12-23, 18:58, authored by Gabriel Synnaeve, Thomas Schatz, Emmanuel Dupoux

We show that it is possible to learn an efficient acoustic model using only a small amount of easily available word-level similarity annotations. In contrast to the detailed phonetic labeling required by classical speech recognition technologies, the only information our method requires is pairs of speech excerpts known to be similar (same word) and pairs of speech excerpts known to be different (different words). An acoustic model is obtained by training shallow and deep neural networks with an architecture and a cost function well adapted to the nature of the provided information. The resulting model is evaluated on an ABX minimal-pair discrimination task and is shown to perform much better (11.8% ABX error rate) than raw speech features (19.6%), and not far from fully supervised baselines (best neural network: 9.2%; HMM-GMM: 11%).
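The sketch below illustrates the pairwise ("side information") training idea described in the abstract: a shared encoder embeds frame-level features from two speech excerpts, and a cosine-similarity cost pulls same-word pairs together while pushing different-word pairs apart. The network sizes, the margin, the exact loss form, and how frame pairs are obtained (e.g. aligning frames within matching word tokens) are illustrative assumptions here, not the poster's exact configuration.

```python
# Minimal sketch of pair-based embedding training (assumed configuration, not
# the authors' exact ABnet setup), written with PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SiameseEncoder(nn.Module):
    """Shallow feed-forward encoder shared by both branches of a pair."""

    def __init__(self, n_in: int = 40, n_hidden: int = 500, n_emb: int = 100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_in, n_hidden), nn.Sigmoid(),
            nn.Linear(n_hidden, n_emb),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def pair_loss(emb_a: torch.Tensor, emb_b: torch.Tensor,
              same: torch.Tensor, margin: float = 0.5) -> torch.Tensor:
    """Cosine-similarity loss on embedded frame pairs.

    Same-word pairs are pulled toward cosine similarity 1; different-word
    pairs are pushed below a margin. This is one plausible instantiation of
    a cost function adapted to same/different side information.
    """
    cos = F.cosine_similarity(emb_a, emb_b, dim=-1)
    loss_same = (1.0 - cos) ** 2
    loss_diff = torch.clamp(cos - margin, min=0.0) ** 2
    return torch.where(same.bool(), loss_same, loss_diff).mean()


if __name__ == "__main__":
    encoder = SiameseEncoder()
    optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)

    # Dummy batch: 40-dim frame features for both sides of each pair, plus a
    # same/different label per pair (1 = same word, 0 = different words).
    frames_a = torch.randn(32, 40)
    frames_b = torch.randn(32, 40)
    same = torch.randint(0, 2, (32,))

    loss = pair_loss(encoder(frames_a), encoder(frames_b), same)
    loss.backward()
    optimizer.step()
    print(f"pair loss: {loss.item():.4f}")
```

After training, the learned frame embeddings would be evaluated on the ABX minimal-pair discrimination task mentioned above, by checking whether a test token X is closer to the example sharing its category (A) than to the contrasting one (B) under a distance computed on the embedded frames.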

