
An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation

  • king
  • 20210507


Document pages: 29 pages

Abstract: Speech enhancement and speech separation are two related tasks, whose purpose is to extract either one or more target speech signals, respectively, from a mixture of sounds generated by several sources. Traditionally, these tasks have been tackled using signal processing and machine learning techniques applied to the available acoustic signals. Since the visual aspect of speech is essentially unaffected by the acoustic environment, visual information from the target speakers, such as lip movements and facial expressions, has also been used for speech enhancement and speech separation systems. In order to efficiently fuse acoustic and visual information, researchers have exploited the flexibility of data-driven approaches, specifically deep learning, achieving strong performance. The ceaseless proposal of a large number of techniques to extract features and fuse multimodal information has highlighted the need for an overview that comprehensively describes and discusses audio-visual speech enhancement and separation based on deep learning. In this paper, we provide a systematic survey of this research topic, focusing on the main elements that characterise the systems in the literature: acoustic features; visual features; deep learning methods; fusion techniques; training targets and objective functions. In addition, we review deep-learning-based methods for speech reconstruction from silent videos and audio-visual sound source separation for non-speech signals, since these methods can be more or less directly applied to audio-visual speech enhancement and separation. Finally, we survey commonly employed audio-visual speech datasets, given their central role in the development of data-driven approaches, and evaluation methods, because they are generally used to compare different systems and determine their performance.
