
Context-aware Goodness of Pronunciation for Computer-Assisted Pronunciation Training

  • king
  • 20210506


Document pages: 5 pages

Abstract: Mispronunciation detection is an essential component of Computer-Assisted Pronunciation Training (CAPT) systems. State-of-the-art mispronunciation detection models use Deep Neural Networks (DNN) for acoustic modeling and a Goodness of Pronunciation (GOP) based algorithm for pronunciation scoring. However, GOP based scoring models have two major limitations: (i) they depend on forced alignment, which splits the speech into phonetic segments and scores each segment independently, neglecting the transitions between phonemes within the segment; (ii) they focus only on phonetic segments, and so fail to consider context effects across phonemes (such as liaison, omission, incomplete plosive sound, etc.). In this work, we propose the Context-aware Goodness of Pronunciation (CaGOP) scoring model. In particular, two factors, namely the transition factor and the duration factor, are injected into CaGOP scoring. The transition factor identifies the transitions between phonemes and applies them to weight the frame-wise GOP. Moreover, a self-attention based phonetic duration model is proposed to introduce the duration factor into the scoring model. The proposed scoring model significantly outperforms baselines, achieving 20% and 12% relative improvement over the GOP model on phoneme-level and sentence-level mispronunciation detection, respectively.
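
To make the idea of weighting the frame-wise GOP concrete, here is a minimal sketch, not the paper's implementation. It assumes the common posterior-based form of GOP (the average log posterior of the canonical phoneme over the frames of its aligned segment) and then replaces the plain average with a weighted one. The function names, the toy posteriors, and the weight values are all hypothetical; the abstract does not say how CaGOP derives the transition factor or in which direction it weights frames, so down-weighting the boundary frames here is purely an illustrative assumption.

```python
import math

def frame_gop(posteriors, phone):
    """Per-frame log posterior of the canonical phone.

    `posteriors` is a list of dicts mapping phone label -> posterior
    probability for one frame (hypothetical toy format).
    """
    return [math.log(frame[phone]) for frame in posteriors]

def segment_gop(posteriors, phone, weights=None):
    """Weighted average of frame-wise GOP over one aligned segment.

    With weights=None every frame counts equally, which gives the plain
    segment-level average. Passing weights stands in for a transition
    factor (an assumption, not the paper's formula).
    """
    scores = frame_gop(posteriors, phone)
    if weights is None:
        weights = [1.0] * len(scores)
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total

# Toy segment aligned to /k/: the first and last frames sit at phoneme
# transitions, so the acoustic model is less certain there.
segment = [
    {"k": 0.5, "g": 0.5},
    {"k": 0.9, "g": 0.1},
    {"k": 0.9, "g": 0.1},
    {"k": 0.5, "g": 0.5},
]

plain = segment_gop(segment, "k")
# Hypothetical transition weights that trust the boundary frames less.
weighted = segment_gop(segment, "k", weights=[0.2, 1.0, 1.0, 0.2])
print(plain, weighted)
```

In this toy case the weighted score is higher than the plain average, because the uncertain boundary frames contribute less. That illustrates why treating all frames of a forced-aligned segment equally can penalize a correctly pronounced phoneme for its transitions.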
