
Data balancing for boosting performance of low-frequency classes in Spoken Language Understanding

  • 20210506


Document pages: 5 pages

Abstract: Although data imbalance is increasingly common in real-world Spoken Language Understanding (SLU) applications, it has not been studied extensively in the literature. To the best of our knowledge, this paper presents the first systematic study on handling data imbalance for SLU. In particular, we discuss the application of existing data balancing techniques to SLU and propose a multi-task SLU model for intent classification and slot filling. To avoid over-fitting, our model leverages data balancing methods indirectly, via an auxiliary task that makes use of a class-balanced batch generator and (possibly) synthetic data. Our results on a real-world dataset indicate that i) our proposed model can significantly boost performance on low-frequency intents while avoiding a potential performance decrease on the head intents, ii) synthetic data are beneficial for bootstrapping new intents when realistic data are not available, but iii) once a certain amount of realistic data becomes available, using synthetic data in the auxiliary task only yields better performance than adding them to the primary task training data, and iv) in a joint training scenario, balancing the intent distribution individually improves not only intent classification but also slot filling performance.
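
The abstract mentions a class-balanced batch generator but does not describe how it works. The following is a minimal sketch of one common way to build such a generator, not the authors' implementation: each slot in a batch first picks an intent uniformly at random, then picks an utterance of that intent. The function name `class_balanced_batches`, its parameters, and the toy intents `PlayMusic` and `SetAlarm` are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def class_balanced_batches(examples, batch_size, num_batches, seed=0):
    """Yield batches in which every intent is equally likely per slot.

    `examples` is a list of (utterance, intent) pairs. This is an
    illustrative assumption about the data format, not the paper's.
    """
    rng = random.Random(seed)

    # Group the examples by intent label.
    by_intent = defaultdict(list)
    for utterance, intent in examples:
        by_intent[intent].append((utterance, intent))
    intents = sorted(by_intent)

    for _ in range(num_batches):
        batch = []
        for _ in range(batch_size):
            # Sample the intent uniformly, then an example of that intent,
            # so rare intents are oversampled relative to their frequency.
            intent = rng.choice(intents)
            batch.append(rng.choice(by_intent[intent]))
        yield batch


# Toy imbalanced data: 95 head-intent examples, 5 tail-intent examples.
toy = ([("play some jazz", "PlayMusic")] * 95
       + [("set an alarm for six", "SetAlarm")] * 5)
batches = list(class_balanced_batches(toy, batch_size=32, num_batches=50))
tail_share = (sum(intent == "SetAlarm" for b in batches for _, intent in b)
              / sum(len(b) for b in batches))
```

In this toy setup the tail intent makes up 5% of the data but roughly half of the sampled items. Under the abstract's description, batches of this kind would feed only the auxiliary task, while the primary task keeps the original distribution.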
