ASTRID - Análisis y Transcripción Semántica para Imágenes de Documentos Manuscritos (Semantic Analysis and Transcription for Handwritten Document Images)

Ministerio de Ciencia, Innovación y Universidades

Advances in methods for automatically extracting and understanding the content of digitized handwritten documents will remain an important need for our society. This project addresses three challenging computational problems in the automatic processing of handwritten text in document images: (1) layout extraction from unstructured documents, (2) continuous handwritten text recognition under unrestricted conditions, and (3) offline verification of human signatures using advanced deep neural models. The proposed solutions to these problems will be adapted to several applications of socio-economic interest, in particular the analysis and transcription of historical documents and demographic prediction problems based on handwriting (for example, recognizing the gender or handedness of a person). In this project, we will emphasize applications oriented towards the Spanish language, creating several annotated datasets for the problems considered, which will be made available to the research community.

In general, when addressing these problems and applications, we will make use of technologies based on the Deep Learning paradigm along with other advanced Machine Learning techniques that have recently achieved very competitive results in complex Computer Vision problems.

This project is proposed as a natural continuation of our previous project, "Algorithms and techniques for the challenges of extracting semantic content from scanned document images" (TIN2014-57458-R), which was submitted to the 2014 call of the Retos Program. The results already achieved and published by the applicant research group on some of the problems addressed in this proposal will help us achieve the proposed objectives of this research. The project is also oriented towards industrial applications, and it is supported by several private companies with which we have already developed projects related to some of the problems raised in this research.