HDIB1M - Handwritten document image binarization 1 million dataset

Source
arXiv
Date Issued
2021-01-01
Abstract
Handwritten document image binarization is a challenging task due to the high diversity in the content, page style, and condition of the documents. Traditional thresholding methods fail to generalize to such challenging scenarios, whereas deep learning-based methods generalize well but require a large amount of training data. Current datasets for handwritten document image binarization are limited in size and fail to represent several challenging real-world scenarios. To solve this problem, we propose HDIB1M, a handwritten document image binarization dataset of 1M images. We also present a novel method used to generate this dataset. To show the effectiveness of our dataset, we train a deep learning model, UNetED, on it and evaluate its performance on other publicly available datasets. The dataset and the code will be made available to the community.
URI
http://arxiv.org/abs/2101.11674
https://d8.irins.org/handle/IITG2025/19806
Subjects
Computer Science
Computer Vision
Pattern Recognition