
The Notary in the Haystack -- Countering Class Imbalance in Document Processing with CNNs

  • 20210506


Document pages: 16 pages

Abstract: Notarial instruments are a category of documents. A notarial instrument can be distinguished from other documents by its notary sign, a prominent symbol in the certificate, which also makes it possible to identify the document's issuer. Naturally, notarial instruments are underrepresented relative to other documents. This makes classification difficult because class imbalance in training data worsens the performance of Convolutional Neural Networks. In this work, we evaluate different countermeasures for this problem. They are applied to a binary classification and a segmentation task on a collection of medieval documents. In classification, notarial instruments are distinguished from other documents, while the notary sign is separated from the certificate in the segmentation task. We evaluate different techniques, such as data augmentation, under- and oversampling, as well as regularizing with focal loss. The combination of random minority oversampling and data augmentation leads to the best performance. In segmentation, we evaluate three loss functions and their combinations, where only class-weighted dice loss was able to segment the notary sign sufficiently.
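
To make the two best-performing countermeasures named in the abstract more concrete, here is a minimal NumPy sketch of random minority oversampling and of one common formulation of a class-weighted dice loss. This is an illustration under stated assumptions, not the authors' implementation: the function names, the array shapes, the weighted-mean form of the loss, and the example class weights are all hypothetical, since the abstract does not give these details.

```python
import numpy as np


def random_minority_oversample(labels, rng):
    """Return shuffled sample indices in which both classes of a binary
    label array appear equally often. The minority class is resampled
    with replacement (assumes both classes occur at least once)."""
    labels = np.asarray(labels)
    idx_pos = np.flatnonzero(labels == 1)
    idx_neg = np.flatnonzero(labels == 0)
    if len(idx_pos) < len(idx_neg):
        minority, majority = idx_pos, idx_neg
    else:
        minority, majority = idx_neg, idx_pos
    # Draw as many extra minority samples as needed to match the majority.
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    balanced = np.concatenate([majority, minority, extra])
    rng.shuffle(balanced)
    return balanced


def class_weighted_dice_loss(probs, onehot, class_weights, eps=1e-6):
    """One possible class-weighted dice loss (assumed formulation).

    probs, onehot: arrays of shape (num_classes, H, W) holding predicted
    class probabilities and one-hot ground truth.
    class_weights: one non-negative weight per class; a larger weight on
    the rare class makes its overlap count more.
    """
    probs = np.asarray(probs, dtype=float)
    onehot = np.asarray(onehot, dtype=float)
    weights = np.asarray(class_weights, dtype=float)
    intersection = (probs * onehot).sum(axis=(1, 2))
    denom = probs.sum(axis=(1, 2)) + onehot.sum(axis=(1, 2))
    dice_per_class = (2.0 * intersection + eps) / (denom + eps)
    # Weighted mean of per-class dice scores, turned into a loss.
    return float(1.0 - (weights * dice_per_class).sum() / weights.sum())


# Hypothetical usage: 95 ordinary documents, 5 notarial instruments.
labels = np.array([0] * 95 + [1] * 5)
indices = random_minority_oversample(labels, np.random.default_rng(0))

# Hypothetical 2-class mask (background vs. notary sign) on a 4x4 page crop.
truth = np.zeros((2, 4, 4))
truth[1, 1:3, 1:3] = 1.0      # small notary-sign region
truth[0] = 1.0 - truth[1]     # everything else is background
loss_perfect = class_weighted_dice_loss(truth, truth, class_weights=[1.0, 10.0])
```

In this sketch the oversampled index array would be used to draw training batches, typically together with data augmentation so that repeated minority samples are not pixel-identical. The class weights in the dice loss are free parameters here; how the paper chose them is not stated in the abstract.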
