
Multi-path RNN for hierarchical modeling of long sequential data and its application to speaker stream separation

Document pages: 5 pages

Abstract: Recently, source separation performance has been greatly improved by time-domain audio source separation based on the dual-path recurrent neural network (DPRNN). DPRNN is a simple but effective model for long sequential data. While DPRNN is quite efficient at modeling sequential data of the length of an utterance, i.e., about 5 to 10 seconds, it is harder to apply to longer sequences such as whole conversations consisting of multiple utterances. This is simply because, in such a case, the number of time steps consumed by its internal module, called the inter-chunk RNN, becomes extremely large. To mitigate this problem, this paper proposes a multi-path RNN (MPRNN), a generalized version of DPRNN, that models the input data in a hierarchical manner. In the MPRNN framework, the input data is represented at several (>3) time resolutions, each of which is modeled by a specific RNN sub-module. For example, the RNN sub-module that deals with the finest resolution may model temporal relationships only within a phoneme, while the RNN sub-module handling the coarsest resolution may capture only the relationships between utterances, such as speaker information. We perform experiments using simulated dialogue-like mixtures and show that MPRNN has greater model capacity and outperforms the current state-of-the-art DPRNN framework, especially in online processing scenarios.
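
To illustrate why the inter-chunk RNN becomes the bottleneck on long inputs, and how adding a further level of chunking shortens every path, the following is a minimal sketch. It is not the authors' implementation. It assumes non-overlapping chunks with the last chunk padded, whereas the actual models may segment differently. The function name `path_lengths`, the frame count, and the chunk sizes are hypothetical values chosen only for illustration.

```python
import math


def path_lengths(num_frames, chunk_sizes):
    """Return the number of time steps each RNN path would process.

    chunk_sizes lists the chunk size at each level, finest first.
    Each level groups `size` units of the level below it (assumption:
    non-overlapping chunks, last chunk padded). The final entry is the
    length seen by the coarsest path, which runs over the remaining units.
    """
    lengths = []
    remaining = num_frames
    for size in chunk_sizes:
        lengths.append(size)                     # steps inside one chunk
        remaining = math.ceil(remaining / size)  # units left for next level
    lengths.append(remaining)                    # coarsest path
    return lengths


# Hypothetical long recording of 1,000,000 frames.
dual = path_lengths(1_000_000, [100])        # one level of chunking: two paths
multi = path_lengths(1_000_000, [100, 100])  # two levels of chunking: three paths

print(dual)   # [100, 10000]
print(multi)  # [100, 100, 100]
```

With a single level of chunking, the coarse path still has to step through 10,000 chunks in this example. Adding one more level reduces every path to 100 steps, which is the kind of hierarchical shortening the abstract describes.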
