Paper-Based Health Records: A Case Study on the Digitization of Handwritten Clinical Records

Carchiolo V.; Malgeri M.
2025-01-01

Abstract

This paper presents a case study on applying handwriting recognition to digitize historical clinical records that contain significant handwritten content. The primary objective is to assess the feasibility of using commercial OCR technologies, in particular Microsoft Azure's handwriting recognition API, to process health documents. The study aims to determine whether these tools can support the extraction of meaningful clinical information, not only by recognizing individual characters but also by leveraging the structural layout of documents, such as forms, to infer semantic content. Our methodology combines an empirical evaluation of OCR output on real-world patient records with a qualitative analysis of common recognition errors. In addition, we review relevant approaches from the literature, highlighting recent advances in deep learning for document understanding. The findings indicate that general-purpose OCR systems are currently insufficient for reliable clinical data extraction in such contexts, primarily because of the complexity and variability of handwritten medical records. However, the results also suggest that structural cues present in form-based documents could be harnessed, through tailored AI-based techniques, to significantly improve recognition and downstream information retrieval.
Publication year: 2025
Keywords: Application; Health Management; OCR

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/717679