Embodied Self-supervised Learning by Coordinated Sampling and Training

Document pages: 10 pages

Abstract: Self-supervised learning can significantly improve the performance of downstream tasks; however, the dimensions of learned representations normally lack explicit physical meanings. In this work, we propose a novel self-supervised approach to solve inverse problems by employing the corresponding physical forward process so that the learned representations can have explicit physical meanings. The proposed approach works in an analysis-by-synthesis manner to learn an inference network by iteratively sampling and training. At the sampling step, given observed data, the inference network is used to approximate the intractable posterior, from which we sample input parameters and feed them to a physical process to generate data in the observational space; at the training step, the same network is optimized with the sampled paired data. We prove the feasibility of the proposed method by tackling the acoustic-to-articulatory inversion problem to infer articulatory information from speech. Given an articulatory synthesizer, an inference model can be trained completely from scratch with random initialization. Our experiments demonstrate that the proposed method can converge steadily and the network learns to control the articulatory synthesizer to speak like a human. We also demonstrate that trained models can generalize well to unseen speakers or even new languages, and performance can be further improved through self-adaptation.
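
The alternation between a sampling step and a training step described in the abstract can be illustrated with a minimal toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the physical forward process is replaced by a hypothetical linear map `synthesize`, the inference network is a single weight matrix `W`, the approximate posterior is a Gaussian with fixed standard deviation `sigma`, and a closed-form least-squares fit stands in for gradient-based optimization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the physical forward process (e.g. a synthesizer).
# Assumption: a fixed, invertible linear map from parameters z to observations x.
A = np.array([[1.0, 0.5], [-0.3, 2.0]])

def synthesize(z):
    """Map parameters z of shape (n, 2) to observations of shape (n, 2)."""
    return z @ A.T

# Only observations are available to the learner; z_true is never used in training.
z_true = rng.normal(size=(200, 2))
x_obs = synthesize(z_true)

# Toy "inference network": the posterior mean is x @ W, randomly initialized.
W = rng.normal(scale=0.1, size=(2, 2))
sigma = 0.5  # assumed fixed posterior standard deviation

def reconstruction_error(weights):
    """Mean squared error between observations and their re-synthesis."""
    return float(np.mean((synthesize(x_obs @ weights) - x_obs) ** 2))

initial_error = reconstruction_error(W)

for _ in range(5):
    # Sampling step: draw parameters from the approximate posterior given the
    # observed data, then run them through the forward process.
    z_sampled = x_obs @ W + sigma * rng.normal(size=x_obs.shape)
    x_synth = synthesize(z_sampled)

    # Training step: refit the same model on the sampled (x_synth, z_sampled)
    # pairs. Least squares replaces gradient descent in this sketch.
    W, *_ = np.linalg.lstsq(x_synth, z_sampled, rcond=None)

final_error = reconstruction_error(W)
print(initial_error, final_error)
```

Because the toy forward process is linear and invertible, the fitted `W` recovers its inverse almost immediately. A real synthesizer is nonlinear, so the loop would need many iterations and a trainable neural network in place of `W`.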