The Development of a Medical Dataset in Italian Sign Language (LIS): Theoretical Considerations and Practical Applications

Gaia Caligiore
Primo
2026-01-01

Abstract

Automatic translation to and from Italian Sign Language (LIS) requires computational models, such as avatars, capable of accurately reproducing both the manual and non-manual articulators of signed discourse. This, in turn, demands machine-processable and linguistically robust data collections, built through the segmentation, transcription and systematic categorization of signs, so as to capture their internal structure and relational dynamics. Such a framework should reflect the multilinear organization of LIS, which poses several challenges: the visual-gestural and simultaneous nature of LIS, the absence of a standardized written form and the scarcity of available resources. These challenges arise at multiple levels, including the development of LIS resources itself, the identification of suitable tools for capturing signed data and the lack of a standardized coding system for signed languages. These aspects were addressed in the development of the MultiMedaLIS Dataset (MULTImodal MEDicAl LIS Dataset), a preliminary dataset in the medical domain collected using multimodal capturing tools. Annotation, performed in ELAN, followed the principles of simplicity and readability, employing multilayered labelling in both Italian and English together with a dedicated annotation system for signed languages. As a result, the Dataset is accessible to both signers and non-signers and currently serves as a resource for linguistic analyses and for training automatic sign recognition algorithms.
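To make the idea of multilayered, time-aligned labelling more concrete, the following is a minimal sketch of how parallel annotation tiers can be represented and queried, in the spirit of an ELAN-style layout. It is not the Dataset's actual schema: the tier names (`gloss_IT`, `gloss_EN`, `non_manual`), the labels and the timings are hypothetical, chosen only to show how simultaneous manual and non-manual information can be read off at a single point in time.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    start_ms: int  # onset in milliseconds (inclusive)
    end_ms: int    # offset in milliseconds (exclusive)
    value: str     # label assigned on this tier


@dataclass
class Tier:
    tier_id: str
    annotations: list[Annotation] = field(default_factory=list)


def labels_at(tiers: list[Tier], t_ms: int) -> dict[str, str]:
    """Return, for each tier, the label active at time t_ms (tiers with no active label are omitted)."""
    active: dict[str, str] = {}
    for tier in tiers:
        for ann in tier.annotations:
            if ann.start_ms <= t_ms < ann.end_ms:
                active[tier.tier_id] = ann.value
                break
    return active


# Hypothetical tiers and labels, for illustration only (not taken from the Dataset).
tiers = [
    Tier("gloss_IT", [Annotation(0, 600, "DOLORE"), Annotation(600, 1100, "TESTA")]),
    Tier("gloss_EN", [Annotation(0, 600, "PAIN"), Annotation(600, 1100, "HEAD")]),
    Tier("non_manual", [Annotation(0, 1100, "furrowed brows")]),
]

print(labels_at(tiers, 700))
# → {'gloss_IT': 'TESTA', 'gloss_EN': 'HEAD', 'non_manual': 'furrowed brows'}
```

Keeping the Italian and English glosses on separate tiers with identical time boundaries is one simple way to let signers and non-signers read the same segmentation, while a non-manual tier can span several manual signs.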

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/704609