Posted on 2021-08-19, 20:41. Authored by Pouriya Amini Digehsara, Christoph Wagner, Petr Schaffer, Michael Bärhold, Simon Stone, Dirk Plettemeier, Peter Birkholz
Silent speech recognition (SSR) is an active area
of research with applications ranging from speech restoration to speech
enhancement. Radar-based SSR has been proposed and investigated as a
non-invasive method to infer vocal tract states and articulatory movements from
measured changes in scattering parameters. One
of the challenges in developing a radar-based SSR system is to determine the
optimal set of features from these measurements. In this study, we therefore
investigated the following problems: (a) the selection of the features that
play the most significant role in classification; (b) the contribution of
each reflection and transmission spectrum, and the most important
frequencies; (c) the performance of the classifiers when using fewer
features; and (d) the robustness of the classifiers against different noise
levels. The data used in this study
consisted of 230 samples of 25 German phonemes (15 vowels, each in 10 contexts,
and 10 consonants, each in 8 contexts) produced by two German native speakers.
Using the full feature set, a Linear Discriminant Analysis (LDA) classifier
achieved up to 94 % classification accuracy for speaker 1 and 84 % for speaker
2. When only the most important features, as identified by a decision tree,
were used, the classification accuracy deteriorated slightly in most
conditions, but in one case it improved from 73.5 % to 81 %.
Regarding the robustness against noise, the accuracy of the LDA classifier
dropped sharply with increasing noise levels, while the accuracy of the
support vector machine (SVM) classifier decreased less steeply.
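
The abstract does not state how the noise was generated or how the noise levels were specified. As a rough illustration of what a noise-robustness test involves, the following minimal sketch assumes additive zero-mean Gaussian noise scaled to a target signal-to-noise ratio (SNR) in dB. The function names, the synthetic sine-wave stand-in for a measured spectrum, and the chosen SNR values are all hypothetical and are not taken from the study.

```python
import math
import random


def signal_power(x):
    """Mean squared value of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def add_noise_at_snr(x, snr_db, rng):
    """Return a copy of x with zero-mean Gaussian noise at the given SNR (dB).

    Assumption: noise is additive and white; the study may have used a
    different noise model.
    """
    noise_power = signal_power(x) / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in x]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB, computed from the clean and the noisy signal."""
    noise = [n - c for c, n in zip(clean, noisy)]
    return 10 * math.log10(signal_power(clean) / signal_power(noise))


rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical stand-in for one measured spectrum (not real radar data).
clean = [math.sin(2 * math.pi * k / 50) for k in range(5000)]

for snr_db in (30, 20, 10, 0):
    noisy = add_noise_at_snr(clean, snr_db, rng)
    print(snr_db, round(measured_snr_db(clean, noisy), 2))
```

In a robustness experiment of this kind, each classifier would be evaluated on the noisy copies at every noise level, and the resulting accuracies compared across levels.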