
Ultra2Speech -- A Deep Learning Framework for Formant Frequency Estimation and Tracking from Ultrasound Tongue Images

Document pages: 10

Abstract: Every year, thousands of individuals require surgical removal of the larynx due to critical disease and therefore need an alternative form of communication to articulate speech sounds after the loss of their voice box. This work addresses the articulatory-to-acoustic mapping problem based on ultrasound (US) tongue images, toward the development of a silent-speech interface (SSI) that can assist such individuals in their daily interactions. Our approach automatically extracts tongue-movement information by selecting an optimal feature set from US images and mapping these features to the acoustic space. We use a novel deep learning architecture, which we call Ultrasound2Formant (U2F) Net, to map US tongue images from a probe placed beneath the subject's chin to formants. It uses hybrid spatio-temporal 3D convolutions followed by feature shuffling for the estimation and tracking of vowel formants from US images. The formant values are then used to synthesize continuous, time-varying vowel trajectories via the Klatt synthesizer. Our best model achieves an R-squared (R^2) measure of 99.96 on the regression task. Our network lays the foundation for an SSI, as it successfully tracks the tongue contour automatically as an internal representation, without any explicit annotation.
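The regression quality above is reported as an R-squared (coefficient of determination) score. As a reminder of what that metric measures, here is a minimal, self-contained sketch in plain Python; the formant values below are hypothetical illustration data, not from the paper:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    SS_res is the sum of squared residuals between targets and
    predictions; SS_tot is the total sum of squares around the mean.
    A perfect fit gives 1.0 (i.e., 100 on the paper's 0-100 scale).
    """
    mean_y = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot

# Hypothetical first-formant (F1) targets and predictions, in Hz.
f1_true = [730.0, 520.0, 660.0, 300.0, 570.0]
f1_pred = [728.0, 523.0, 658.0, 303.0, 569.0]
print(round(r_squared(f1_true, f1_pred), 4))  # close to 1.0 for a good fit
```

A score of 99.96 (i.e., R^2 ≈ 0.9996) thus means the model's formant predictions explain nearly all of the variance in the target formant trajectories.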
