FastLR: Non-Autoregressive Lipreading Model with Integrate-and-Fire



Document pages: 9 pages

Abstract: Lipreading is an impressive technique, and there has been a definite improvement in accuracy in recent years. However, existing lipreading methods are mainly built on autoregressive (AR) models, which generate target tokens one by one and suffer from high inference latency. To break through this constraint, we propose FastLR, a non-autoregressive (NAR) lipreading model that generates all target tokens simultaneously. NAR lipreading is a challenging task with several difficulties: 1) the discrepancy in sequence length between source and target makes it difficult to estimate the length of the output sequence; 2) the conditionally independent behavior of NAR generation lacks correlation across time, which leads to a poor approximation of the target distribution; 3) the feature representation ability of the encoder can be weak due to the lack of an effective alignment mechanism; and 4) the removal of the AR language model exacerbates the inherent ambiguity problem of lipreading. Thus, in this paper, we introduce three methods to reduce the gap between FastLR and the AR model: 1) to address challenges 1 and 2, we leverage an integrate-and-fire (I&F) module to model the correspondence between source video frames and the output text sequence; 2) to tackle challenge 3, we add an auxiliary connectionist temporal classification (CTC) decoder on top of the encoder and optimize it with an extra CTC loss, and we also add an auxiliary autoregressive decoder to help the encoder's feature extraction; 3) to overcome challenge 4, we propose a novel Noisy Parallel Decoding (NPD) for I&F and bring Byte-Pair Encoding (BPE) into lipreading. Our experiments show that FastLR achieves a speedup of up to 10.97× compared with the state-of-the-art lipreading model, with a slight absolute WER increase of 1.5% and 5.5% on the GRID and LRS2 lipreading datasets respectively, which demonstrates the effectiveness of our proposed method.
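To illustrate how an integrate-and-fire step can map a long sequence of video frames onto a shorter token sequence and estimate the output length at the same time, here is a minimal sketch in plain Python. It is not the authors' implementation: the function name `integrate_and_fire`, the threshold of 1.0, and the assumption that each frame carries a weight in [0, 1] (which a learned predictor would normally produce) are all hypothetical choices made for illustration.

```python
def integrate_and_fire(features, weights, threshold=1.0):
    """Illustrative integrate-and-fire over frame features.

    features: list of frame vectors (lists of floats), one per video frame.
    weights:  list of per-frame weights, assumed to lie in [0, 1].
    Returns one aggregated vector per "fire"; the number of returned
    vectors serves as the estimated output length.
    """
    dim = len(features[0])
    acc_weight = 0.0          # weight accumulated since the last fire
    acc_vector = [0.0] * dim  # weighted sum of features since the last fire
    fired = []

    for feat, w in zip(features, weights):
        if acc_weight + w < threshold:
            # Keep integrating: threshold not reached yet.
            acc_weight += w
            acc_vector = [a + w * f for a, f in zip(acc_vector, feat)]
        else:
            # Fire: use only the part of w needed to reach the threshold.
            used = threshold - acc_weight
            fired.append([a + used * f for a, f in zip(acc_vector, feat)])
            # The remainder of w starts the next accumulation.
            rest = w - used
            acc_weight = rest
            acc_vector = [rest * f for f in feat]

    return fired


# Three frames whose weights sum to 2.0 yield two output vectors.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frame_weights = [0.5, 0.75, 0.75]
tokens = integrate_and_fire(frames, frame_weights)
print(len(tokens), tokens)
```

In this sketch the number of fired vectors tracks the sum of the frame weights, which is why such a module can address the length-estimation problem, and each fired vector is built from neighbouring frames, which gives the output positions a monotonic alignment to the video.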
