Face-to-Music Translation Using a Distance-Preserving Generative Adversarial Network with an Auxiliary Discriminator

Document pages: 15 pages

Abstract: Learning a mapping between two unrelated domains, such as image and audio, without any supervision is a challenging task. In this work, we propose a distance-preserving generative adversarial model to translate images of human faces into an audio domain. The audio domain is defined by a collection of musical note sounds from 10 different instrument families (NSynth \cite{nsynth2017}) and a distance metric in which the instrument family class information is incorporated together with a mel-frequency cepstral coefficients (MFCCs) feature. To enforce distance preservation, a loss term that penalizes the difference between pairwise distances of the faces and pairwise distances of the translated audio samples is used. Further, we discover that the distance preservation constraint in the generative adversarial model leads to reduced diversity in the translated audio samples, and propose the use of an auxiliary discriminator to enhance the diversity of the translations while still using the distance preservation constraint. We also provide a visual demonstration of the results and a numerical analysis of the fidelity of the translations. A video demo of our proposed model's learned translation is available at this https URL.
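
The distance-preservation term described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the loss, so everything here is an assumption for illustration: the function names `pairwise_distances` and `distance_preservation_loss` are hypothetical, plain Euclidean distance stands in for both the face distance and the audio metric (the paper's audio metric combines instrument family class information with MFCCs), and the penalty is taken as the mean absolute difference over all sample pairs in a batch.

```python
import numpy as np


def pairwise_distances(x):
    """Euclidean distance between every pair of rows in x (shape: n x d)."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def distance_preservation_loss(face_feats, audio_feats):
    """Mean absolute gap between face-pair and audio-pair distances.

    Assumption: both inputs are feature matrices with one row per sample,
    where row i of audio_feats describes the translation of face i.
    """
    d_face = pairwise_distances(face_feats)
    d_audio = pairwise_distances(audio_feats)
    # Use each unordered pair once (strict upper triangle).
    iu = np.triu_indices(len(face_feats), k=1)
    return float(np.mean(np.abs(d_face[iu] - d_audio[iu])))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    faces = rng.normal(size=(4, 8))   # stand-in face features
    audio = rng.normal(size=(4, 13))  # stand-in audio features
    print(distance_preservation_loss(faces, audio))
```

Under these assumptions the loss is zero whenever the translated samples reproduce the pairwise distances of the input faces, and it grows as the two distance structures diverge. In a training setup it would be added to the adversarial objective rather than used on its own.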