Self-Attentive Multi-Layer Aggregation with Feature Recalibration and Normalization for End-to-End Speaker Verification System

  • 20210505

Document pages: 5 pages

Abstract: One of the most important parts of an end-to-end speaker verification system is the speaker embedding generation. In our previous paper, we reported that shortcut connection-based multi-layer aggregation improves the representational power of the speaker embedding. However, the number of model parameters is relatively large, and unspecified variations increase in the multi-layer aggregation. Therefore, we propose a self-attentive multi-layer aggregation with feature recalibration and normalization for an end-to-end speaker verification system. To reduce the number of model parameters, a ResNet with scaled channel width and layer depth is used as the baseline. To control the variability in training, a self-attention mechanism is applied to perform the multi-layer aggregation, together with dropout regularization and batch normalization. Then, a feature recalibration layer, built from fully-connected layers and nonlinear activation functions, is applied to the aggregated feature. Deep length normalization is also applied to the recalibrated feature in the end-to-end training process. Experimental results on the VoxCeleb1 evaluation dataset showed that the performance of the proposed methods was comparable to that of state-of-the-art models (equal error rates of 4.95% and 2.86% when training on the VoxCeleb1 and VoxCeleb2 datasets, respectively).
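
The three stages named in the abstract (attentive aggregation, feature recalibration, length normalization) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: it assumes every layer contributes a pooled feature of the same dimension, uses a single hypothetical query vector for the attention scores, models recalibration as a two-layer sigmoid gate, and omits dropout and batch normalization. All function names, dimensions, and random weights are illustrative assumptions.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scalars."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def attentive_aggregate(layer_feats, query):
    """Score each layer's pooled feature against a query (assumed form),
    turn the scores into softmax weights, and return the weighted sum."""
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in layer_feats]
    weights = softmax(scores)
    dim = len(layer_feats[0])
    agg = [sum(w * feat[i] for w, feat in zip(weights, layer_feats))
           for i in range(dim)]
    return agg, weights


def recalibrate(feature, w1, w2):
    """Gate each dimension with FC -> ReLU -> FC -> sigmoid (assumed form)."""
    hidden = [max(0.0, h) for h in matvec(w1, feature)]
    gates = [1.0 / (1.0 + math.exp(-g)) for g in matvec(w2, hidden)]
    return [x * g for x, g in zip(feature, gates)]


def length_normalize(feature, scale=1.0, eps=1e-12):
    """L2-normalize the feature, then multiply by a fixed scale."""
    norm = math.sqrt(sum(x * x for x in feature))
    return [scale * x / (norm + eps) for x in feature]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for trained fully-connected weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Toy inputs: 3 layers, each with an 8-dimensional pooled feature.
rng = random.Random(0)
dim, hidden_dim, num_layers = 8, 4, 3
layer_feats = [[rng.uniform(-1, 1) for _ in range(dim)]
               for _ in range(num_layers)]
query = [rng.uniform(-1, 1) for _ in range(dim)]
w1 = random_matrix(hidden_dim, dim, rng)
w2 = random_matrix(dim, hidden_dim, rng)

aggregated, weights = attentive_aggregate(layer_feats, query)
embedding = length_normalize(recalibrate(aggregated, w1, w2), scale=10.0)
```

In this sketch the attention weights sum to one across layers, and the final embedding has an L2 norm equal to the chosen scale, so only its direction carries speaker information.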
