
Variable frame rate-based data augmentation to handle speaking-style variability for automatic speaker verification

  • king
  • 20210506


Document pages: 5 pages

Abstract: The effects of speaking-style variability on automatic speaker verification were investigated using the UCLA Speaker Variability database, which comprises multiple speaking styles per speaker. An x-vector PLDA (probabilistic linear discriminant analysis) system was trained with the SRE and Switchboard databases with standard augmentation techniques and evaluated with utterances from the UCLA database. The equal error rate (EER) was low when enrollment and test utterances were of the same style (e.g., 0.98% and 0.57% for read and conversational speech, respectively), but it increased substantially when styles were mismatched between enrollment and test utterances. For instance, when enrolled with conversation utterances, the EER increased to 3.03%, 2.96%, and 22.12% when tested on read, narrative, and pet-directed speech, respectively. To reduce the effect of style mismatch, we propose an entropy-based variable frame rate technique to artificially generate style-normalized representations for PLDA adaptation. The proposed system significantly improved performance. In the aforementioned conditions, the EERs improved to 2.69% (conversation -- read), 2.27% (conversation -- narrative), and 18.75% (pet-directed -- read). Overall, the proposed technique performed comparably to multi-style PLDA adaptation without the need for training data in different speaking styles per speaker.
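
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how an equal error rate can be estimated from verification trial scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the threshold sweep over observed scores, and the toy scores are all assumptions made for illustration. The EER is the operating point at which the false acceptance rate (different-speaker trials accepted) equals the false rejection rate (same-speaker trials rejected).

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER (as a fraction, not a percentage).

    target_scores: scores of same-speaker trials.
    nontarget_scores: scores of different-speaker trials.
    Higher scores are assumed to mean "more likely the same speaker".
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)

    # Use every observed score as a candidate decision threshold.
    thresholds = np.sort(np.concatenate([target, nontarget]))

    # False rejection rate: same-speaker trials scoring below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance rate: different-speaker trials at or above the threshold.
    far = np.array([(nontarget >= t).mean() for t in thresholds])

    # Pick the threshold where the two error rates are closest and
    # report their average as the EER estimate.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


# Toy scores (hypothetical, not from the paper).
same_speaker = [0.9, 0.8, 0.35, 0.6]
different_speaker = [0.1, 0.2, 0.3, 0.7]
print(f"EER = {100 * equal_error_rate(same_speaker, different_speaker):.2f}%")
```

With real data, the same function would be applied separately to each enrollment/test style pairing (for example conversation versus read) to obtain per-condition EERs of the kind quoted in the abstract.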
