
Why Did the x-Vector System Miss a Target Speaker? Impact of Acoustic Mismatch Upon Target Score on VoxCeleb Data

  • king
  • 20210506


Document pages: 5 pages

Abstract: Modern automatic speaker verification (ASV) relies heavily on machine learning implemented through deep neural networks. It can be difficult to interpret the output of these black boxes. In line with interpretative machine learning, we model the dependency of ASV detection score upon acoustic mismatch of the enrollment and test utterances. We aim to identify mismatch factors that explain target speaker misses (false rejections). We use distance in the first- and second-order statistics of selected acoustic features as the predictors in a linear mixed effects model, while a standard Kaldi x-vector system forms our ASV black-box. Our results on the VoxCeleb data reveal the most prominent mismatch factor to be in F0 mean, followed by mismatches associated with formant frequencies. Our findings indicate that x-vector systems lack robustness to intra-speaker variations.
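
As a rough illustration of the kind of predictor the abstract describes, the sketch below computes the distance between enrollment and test utterances in the first-order (mean) and second-order (standard deviation) statistics of one acoustic feature. The abstract does not state the exact distance measure, so the absolute difference used here is an assumption, and the function name `mismatch_predictors` and the F0 values are hypothetical.

```python
from statistics import mean, pstdev


def mismatch_predictors(enroll_frames, test_frames):
    """Return mismatch predictors for one acoustic feature (e.g. F0).

    Assumption: distance is the absolute difference of the per-utterance
    mean and standard deviation; the paper may use a different measure.
    """
    return {
        "mean_dist": abs(mean(enroll_frames) - mean(test_frames)),
        "std_dist": abs(pstdev(enroll_frames) - pstdev(test_frames)),
    }


# Hypothetical per-frame F0 values (Hz) for one target trial.
enroll_f0 = [110.0, 120.0, 130.0]
test_f0 = [150.0, 160.0, 170.0]

predictors = mismatch_predictors(enroll_f0, test_f0)
print(predictors)
```

In a study of this kind, such predictors would be computed for every target trial and then regressed against the ASV detection score, for example with a linear mixed effects model as named in the abstract.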
