
Distributed Learning of Multilingual DNN Feature Extractors using GPUs

Journal contribution, posted on 2014-09-01, authored by Yajie Miao, Hao Zhang, Florian Metze

Multilingual deep neural networks (DNNs) can act as deep feature extractors and have been applied successfully to cross-language acoustic modeling. Learning these feature extractors becomes an expensive task because of the enlarged multilingual training data and the sequential nature of stochastic gradient descent (SGD). This paper investigates strategies to accelerate the learning process over multiple GPU cards. We propose the DistModel and DistLang frameworks, which distribute feature extractor learning by models and by languages respectively. The time-synchronous DistModel has the nice property of tolerating infrequent model averaging. With 3 GPUs, DistModel achieves a 2.6× speed-up and causes no loss on word error rates. When using DistLang, we observe better acceleration but worse recognition performance. Further evaluations are conducted to scale DistModel to more languages and GPU cards.
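The paper itself is not reproduced here, but the core idea behind a time-synchronous DistModel-style scheme — each GPU trains its own copy of the model on a shard of the data with local SGD and the copies are averaged at fixed intervals — can be illustrated with a minimal sketch. Everything in the sketch (the toy linear model, the number of workers, the sync interval, the learning rate) is an illustrative assumption, not the authors' setup or code.

```python
# Minimal sketch (not the authors' implementation): periodic model averaging
# across K workers, the basic pattern behind a DistModel-style data-parallel
# scheme. Model, data, and hyperparameters are toy choices for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data, sharded across "GPUs" (workers).
K = 3                                      # number of workers / GPU cards
X = rng.normal(size=(3000, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.01 * rng.normal(size=3000)
shards = np.array_split(np.arange(len(X)), K)

# Each worker keeps its own copy of the parameters.
weights = [np.zeros(10) for _ in range(K)]
lr, batch, sync_interval = 0.01, 32, 50    # average every 50 minibatches

for step in range(1, 501):
    for k in range(K):
        idx = rng.choice(shards[k], size=batch, replace=False)
        grad = X[idx].T @ (X[idx] @ weights[k] - y[idx]) / batch
        weights[k] -= lr * grad            # local SGD step on worker k
    if step % sync_interval == 0:
        avg = np.mean(weights, axis=0)     # time-synchronous model averaging
        weights = [avg.copy() for _ in range(K)]

print("error of averaged model:", np.linalg.norm(np.mean(weights, axis=0) - w_true))
```

The sync interval controls the trade-off the abstract alludes to: infrequent averaging reduces communication cost across GPUs, and the time-synchronous variant is reported to tolerate such infrequent averaging without hurting word error rates.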

