‘Automatic Collation for Diversifying Corpora: Commonly Copied Texts as Distant Supervision for Handwritten Text Recognition’

“Handwritten text recognition (HTR) has enabled many researchers to gather textual evidence from the human record. … To build generalized models for Arabic-script manuscripts, perhaps one of the largest textual traditions in the pre-modern world, we need an approach that can improve its accuracy on unseen manuscripts and hands without linear growth in the amount of manually annotated data. We propose Automatic Collation for Diversifying Corpora (ACDC), taking advantage of the existence of multiple manuscripts of popular texts.”

Find the paper and full list of authors in the Computational Humanities Research Conference 2023 proceedings.

View on Site: ‘Automatic Collation for Diversifying Corpora: Commonly Copied Texts as Distant Supervision for Handwritten Text Recognition’