
Data augmentation using prosody and false starts to recognize non-native children's speech

  • king
  • 20210507


Document pages: 5 pages

Abstract: This paper describes AaltoASR's speech recognition system for the INTERSPEECH 2020 shared task on Automatic Speech Recognition (ASR) for non-native children's speech. The task is to recognize non-native speech from children of various age groups given a limited amount of speech. Moreover, because the speech is spontaneous, it contains false starts that are transcribed as partial words, which leads to unseen partial words in the test transcriptions. To cope with these two challenges, we investigate a data augmentation-based approach. First, we apply prosody-based data augmentation to supplement the audio data. Second, we simulate false starts by introducing partial-word noise into the language modeling corpora, creating new words. Acoustic models trained on prosody-based augmented data outperform the models using the baseline recipe or SpecAugment-based augmentation. The partial-word noise also helps to improve the baseline language model. Our ASR system, a combination of these schemes, placed third in the evaluation period and achieved a word error rate of 18.71%. In the post-evaluation period, we observe that increasing the amount of prosody-based augmented data leads to better performance. Furthermore, removing low-confidence-score words from hypotheses can lead to further gains. These two improvements lower the ASR error rate to 17.99%.
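The two text-side ideas in the abstract, partial-word noise in the language modeling corpus and removal of low-confidence words from hypotheses, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the trailing hyphen used to mark a partial word, the choice to insert the truncated copy directly before the full word, and the `prob`, `min_len`, and `threshold` values are all assumptions made for illustration.

```python
import random


def add_partial_word_noise(sentence, prob=0.1, min_len=4, rng=None, marker="-"):
    """Insert a truncated copy of some words before them to mimic false starts.

    Assumptions (not from the paper): partial words are marked with a trailing
    hyphen, and the false start is placed immediately before the full word.
    """
    rng = rng or random.Random()
    out = []
    for word in sentence.split():
        # Only words of at least min_len characters are candidates for truncation.
        if len(word) >= min_len and rng.random() < prob:
            cut = rng.randint(1, len(word) - 1)  # keep at least 1 char, drop at least 1
            out.append(word[:cut] + marker)
        out.append(word)
    return " ".join(out)


def drop_low_confidence(hypothesis, threshold=0.5):
    """Keep only the words whose confidence score is at or above the threshold.

    `hypothesis` is a list of (word, confidence) pairs; this format is hypothetical.
    """
    return [word for word, conf in hypothesis if conf >= threshold]


if __name__ == "__main__":
    rng = random.Random(0)
    print(add_partial_word_noise("the children speak english", prob=0.5, rng=rng))
    print(drop_low_confidence([("i", 0.9), ("li-", 0.2), ("like", 0.8)]))
```

Applying the noise function to every sentence of a language modeling corpus would introduce new partial-word tokens into the vocabulary, which is the effect the abstract describes. The confidence threshold would need to be tuned on development data.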
