Content based singing voice source separation via strong conditioning using aligned phonemes


Document pages: 9 pages

Abstract: Informed source separation has recently gained renewed interest with the introduction of neural networks and the availability of large multitrack datasets containing both the mixture and the separated sources. These approaches use prior information about the target source to improve separation. Historically, Music Information Retrieval researchers have focused primarily on score-informed source separation, but more recent approaches explore lyrics-informed source separation. However, because of the lack of multitrack datasets with time-aligned lyrics, models use weak conditioning with non-aligned lyrics. In this paper, we present a multimodal multitrack dataset with lyrics aligned in time at the word level with phonetic information as well as explore strong conditioning using the aligned phonemes. Our model follows a U-Net architecture and takes as input both the magnitude spectrogram of a musical mixture and a matrix with aligned phonetic information. The phoneme matrix is embedded to obtain the parameters that control Feature-wise Linear Modulation (FiLM) layers. These layers condition the U-Net feature maps to adapt the separation process to the presence of different phonemes via affine transformations. We show that phoneme conditioning can be successfully applied to improve singing voice source separation.
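
The FiLM conditioning described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`film`, `phoneme_film_params`), the single linear embedding, the `1.0 +` offset on the scale, the tensor shapes, and the assumption that the phoneme matrix shares the time resolution of the feature map are all hypothetical choices made for illustration. It only shows how a time-aligned phoneme matrix could yield per-channel scale and shift parameters that modulate a feature map through an affine transformation.

```python
import numpy as np


def film(features, gamma, beta):
    """Feature-wise affine modulation: gamma * features + beta.

    features: (channels, freq, time) feature map
    gamma, beta: (channels, time), one scale and shift per channel and frame
    """
    # Broadcast the parameters over the frequency axis.
    return gamma[:, None, :] * features + beta[:, None, :]


def phoneme_film_params(phonemes, w_gamma, w_beta):
    """Hypothetical linear embedding from phoneme activations to FiLM parameters.

    phonemes: (n_phonemes, time) aligned phoneme matrix
    w_gamma, w_beta: (channels, n_phonemes) embedding weights
    """
    # Assumption: the scale is offset by 1 so zero weights leave features unchanged.
    gamma = 1.0 + w_gamma @ phonemes
    beta = w_beta @ phonemes
    return gamma, beta


rng = np.random.default_rng(0)
n_phonemes, channels, freq, time = 40, 8, 16, 10

# Toy aligned phoneme matrix: exactly one active phoneme per time frame.
phonemes = np.zeros((n_phonemes, time))
phonemes[rng.integers(0, n_phonemes, size=time), np.arange(time)] = 1.0

# Toy stand-ins for a U-Net feature map and for learned embedding weights.
features = rng.standard_normal((channels, freq, time))
w_gamma = 0.1 * rng.standard_normal((channels, n_phonemes))
w_beta = 0.1 * rng.standard_normal((channels, n_phonemes))

gamma, beta = phoneme_film_params(phonemes, w_gamma, w_beta)
modulated = film(features, gamma, beta)
```

Because each frame has its own `gamma` and `beta`, the modulation changes whenever the active phoneme changes, which is the sense in which aligned phonemes allow strong conditioning. With non-aligned lyrics a single parameter set would have to serve the whole excerpt.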
