Paper-Based Health Records: A Case Study on the Digitization of Handwritten Clinical Records

Carchiolo V.; Malgeri M.; Spadaro Sapari L.

2025-01-01

Abstract

This paper presents a case study on applying handwriting recognition to digitize historical clinical records that contain significant handwritten content. The primary objective is to assess the feasibility of using commercial OCR technologies, in particular Microsoft Azure's handwriting recognition API, for processing health documents. The study aims to determine whether these tools can support the extraction of meaningful clinical information, not only by recognizing individual characters but also by leveraging the structural layout of documents, such as forms, to infer semantic content. Our methodology combines an empirical evaluation of OCR output on real-world patient records with a qualitative analysis of common recognition errors. In addition, we review relevant approaches from the literature, highlighting recent advances in deep learning for document understanding. The findings indicate that general-purpose OCR systems are currently insufficient for reliable clinical data extraction in such contexts, primarily because of the complexity and variability of handwritten medical records. However, the results also suggest that structural cues present in form-based documents could be harnessed, through tailored AI-based techniques, to significantly improve recognition and downstream information retrieval.
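As an illustration of what an empirical evaluation of OCR output can look like, the sketch below scores a recognized string against a manual transcription using character error rate (CER). This is an assumption made for illustration: the abstract does not state which metric the study used, and the function names and the sample strings are hypothetical, not taken from the paper's data.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty hypothesis takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalized by the length of the reference transcription."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: a manual transcription and a plausible OCR misreading.
reference = "amoxicillin 500 mg"
ocr_output = "amoxicilin 5OO mg"  # one dropped letter, two zeros read as letter O
cer = character_error_rate(reference, ocr_output)
```

A score like this only captures character-level accuracy. It says nothing about whether a value was attached to the correct form field, which is the structural aspect the abstract identifies as a possible source of improvement.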

Year: 2025
ISBN: 9789897587726
Keywords: Application; Health Management; OCR
Appears in types: 4.1 Contribution in conference proceedings

Files in this item:

File	Size	Format
Paper_Based_Health_Records__A_Case_Study_on_the_Digitization_of_Handwritten_Clinical_Records (1).pdf (open access; license: Creative Commons)	1.31 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/717679
