HANA: A HAndwritten NAme Database for Offline Handwritten Text Recognition

  • 20210424


Document pages: 9 pages

Abstract: Methods for linking individuals across historical data sets, typically in combination with AI-based transcription models, are developing rapidly. Probably the single most important identifier for linking is personal names. However, personal names are prone to enumeration and transcription errors, and although modern linking methods are designed to handle such challenges, these sources of error are critical and should be minimized. For this purpose, improved transcription methods and large-scale databases are crucial components. This paper describes and provides documentation for HANA, a newly constructed large-scale database which consists of more than 1.1 million images of handwritten word-groups. The database is a collection of personal names, containing more than 105 thousand unique names with a total of more than 3.3 million examples. In addition, we present benchmark results for deep learning models that can automatically transcribe the personal names from the scanned documents. Focusing mainly on personal names, due to their vital role in linking, we hope to foster more sophisticated, accurate, and robust models for handwritten text recognition by making more challenging large-scale databases publicly available. This paper describes the data source, the collection process, and the image-processing procedures and methods involved in extracting the handwritten personal names, and handwritten text in general, from the forms.
