A Multimodal Speech Dataset for Age-Adaptive Recognition: Speech Samples from Elderly and Young Users
This dataset was created to support a study aimed at improving Automatic Speech Recognition (ASR) performance for elderly users through multimodal acoustic feature analysis and deep learning. It contains an age-balanced, standardized set of speech samples from elderly and young speakers. Each sample has been preprocessed and annotated, with key acoustic features extracted, including pitch, speech rate, clarity, pauses, and loudness.
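The dataset description does not specify how the acoustic features were computed, so the following is only a minimal numpy sketch of how three of them (pitch, loudness, and pause ratio) could be estimated from a raw waveform; speech rate and clarity typically require transcripts or spectral measures and are omitted. The function name, frame sizes, and silence threshold are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np

def extract_features(y, sr, frame_len=1024, hop=512, silence_db=-40.0):
    """Rough stand-ins for three of the dataset's features: pitch,
    loudness, and pause ratio. Parameters are illustrative assumptions."""
    # Slice the signal into overlapping frames.
    n_frames = 1 + (len(y) - frame_len) // hop
    frames = np.stack([y[i * hop : i * hop + frame_len] for i in range(n_frames)])

    # Loudness: per-frame RMS energy, averaged over the clip.
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    loudness = float(np.mean(rms))

    # Pauses: fraction of frames whose energy falls below a dB threshold.
    db = 20 * np.log10(rms + 1e-10)
    pause_ratio = float(np.mean(db < silence_db))

    # Pitch: autocorrelation peak of the loudest frame, searched in a
    # 50-400 Hz band that covers typical speaking pitch.
    frame = frames[np.argmax(rms)]
    ac = np.correlate(frame, frame, mode="full")[frame_len - 1:]
    lo, hi = sr // 400, sr // 50
    lag = lo + int(np.argmax(ac[lo:hi]))
    pitch_hz = sr / lag

    return {"pitch_hz": pitch_hz, "loudness": loudness, "pause_ratio": pause_ratio}

# Synthetic check: a 200 Hz tone followed by an equal stretch of silence.
sr = 16000
t = np.linspace(0, 0.5, sr // 2, endpoint=False)
y = np.concatenate([0.5 * np.sin(2 * np.pi * 200 * t), np.zeros(sr // 2)])
feats = extract_features(y, sr)
```

On this synthetic clip the pitch estimate recovers the 200 Hz tone and roughly half the frames are flagged as pauses, which is the sanity check one would want before trusting such features on real recordings.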
The dataset was used to train and evaluate a Convolutional Neural Network (CNN)-based speech classification model. Experimental results showed that the model trained on multimodal feature inputs significantly outperformed a single-feature baseline (e.g., MFCC alone) in classification accuracy, precision, recall, and F1 score.
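The CNN architecture is not detailed in the description, so the sketch below only illustrates the general shape of such a classifier: a 2-D feature map (e.g., MFCC rows stacked with extra feature contours) passed through a convolution, ReLU, global average pooling, and a two-class softmax head. All layer sizes, the input layout, and the parameter names are hypothetical; a real model would be trained with a framework such as PyTorch rather than this numpy forward pass.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, w, b):
    """Valid 2-D convolution: x is (H, W), w is (out_c, kh, kw), b is (out_c,)."""
    out_c, kh, kw = w.shape
    H, W = x.shape
    out = np.empty((out_c, H - kh + 1, W - kw + 1))
    for c in range(out_c):
        for i in range(H - kh + 1):
            for j in range(W - kw + 1):
                out[c, i, j] = np.sum(x[i:i + kh, j:j + kw] * w[c]) + b[c]
    return out

def forward(feat_map, params):
    """Conv -> ReLU -> global average pool -> linear -> softmax (2 classes)."""
    h = np.maximum(conv2d(feat_map, params["w1"], params["b1"]), 0.0)
    pooled = h.mean(axis=(1, 2))                  # one value per channel
    logits = params["w2"] @ pooled + params["b2"]  # elderly vs. young scores
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Hypothetical input: 13 MFCC rows stacked with 5 extra feature contours
# (pitch, loudness, pauses, etc.) over 100 frames -> an 18 x 100 map.
params = {
    "w1": 0.1 * rng.standard_normal((8, 3, 3)),
    "b1": np.zeros(8),
    "w2": 0.1 * rng.standard_normal((2, 8)),
    "b2": np.zeros(2),
}
probs = forward(rng.standard_normal((18, 100)), params)
```

The softmax output is a two-element probability vector over the elderly/young classes; stacking the non-MFCC contours as additional input rows is one simple way to realize the "multimodal feature inputs" the description contrasts with an MFCC-only model.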
This dataset supports the design of elderly-friendly speech interaction systems and serves as a valuable resource for research on speech differences across age groups.