An Etymological Dataset for Nouns and Verbs in the Gallo-Italic Variety Spoken in Nicosia and Sperlinga

Domenico Cantone;Cristiano Longo;Salvatore Menza;Marianna Nicolosi Asmundo;Daniele Francesco Santamaria
2026-01-01

Abstract

This paper presents a FAIR-compliant etymological dataset of noun and verb lemmas from the Gallo-Italic variety spoken in the Sicilian towns of Nicosia and Sperlinga. Based on Trovato and Menza (2020) and various lexicographic materials, the dataset captures borrowing relations from Sicilian, modeled using the OntoLex-lemon framework and its etymological extension. The linguistic phenomena involved in these borrowings, the so-called GalloSicilian features, are formalized in the Linguistic Phenomena Ontology (LiPh) as regular relations, which enables the automated generation of candidate derivations. Derivations verified by lexicographers are encoded using LiPh to reflect how individual features contribute to lexical transformations. The dataset is enriched with detailed metadata and with geographic and linguistic contextualization, and it is aligned with current standards in lexical resource publication, supporting future reuse and integration in the Linguistic Linked Open Data cloud.
Keywords: Semantic Web, OWL, FAIR, Historical Linguistics, Language Contact Theory, Gallo-Italic languages, Sicilian

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/708469