The Development of a Medical Dataset in Italian Sign Language (LIS): Theoretical Considerations and Practical Applications
Gaia Caligiore
2026-01-01
Abstract
Automatic translation to and from Italian Sign Language (LIS) requires computational models, such as avatars, capable of accurately reproducing both the manual and non-manual articulators of signed discourse. This, in turn, demands machine-processable and linguistically robust data collections, built through the segmentation, transcription and systematic categorization of signs, that capture their internal structure and relational dynamics. Such a framework should reflect the multilinear organization of LIS, which poses several challenges: the visual-gestural and simultaneous nature of the language, the absence of a standardized written form and the scarcity of available resources. These challenges arise at multiple levels, from the development of LIS resources themselves to the identification of suitable tools for capturing signed data and the lack of a standardized coding system for signed languages. They were addressed in the development of the MultiMedaLIS Dataset (MULTImodal MEDicAl LIS Dataset), a preliminary dataset in the medical domain collected using multimodal capture tools. Annotation, performed in ELAN, followed the principles of simplicity and readability. It employed multilayered labelling in both Italian and English, together with a dedicated annotation system for signed languages. As a result, the Dataset is accessible to both signers and non-signers and currently serves as a resource for linguistic analyses as well as for training algorithms for automatic sign recognition.


