Towards Grammatical Tagging for the Legal Language of Cybersecurity

IRIS

Legal language can be understood as the language typically used by those engaged in the legal profession and, as such, it may come both in spoken or written form. Recent legislation on cybersecurity obviously uses legal language in writing, thus inheriting all its interpretative complications due to the typical abundance of cases and sub-cases as well as to the general richness in detail. This paper faces the challenge of the essential interpretation of the legal language of cybersecurity, namely of the extraction of the essential Parts of Speech (POS) from the legal documents concerning cybersecurity. The challenge is overcome by our methodology for POS tagging of legal language. It leverages state-of-the-art open-source tools for Natural Language Processing (NLP) as well as manual analysis to validate the outcomes of the tools. As a result, the methodology is automated and, arguably, general for any legal language following minor tailoring of the preprocessing step. It is demonstrated over the most relevant EU legislation on cybersecurity, namely on the NIS 2 directive, producing the first, albeit essential, structured interpretation of such a relevant document. Moreover, our findings indicate that tools such as SpaCy and ClausIE reach their limits over the legal language of the NIS 2.

Towards Grammatical Tagging for the Legal Language of Cybersecurity

Castiglione G.;Bella G.;Santamaria D. F.

2023-01-01

Abstract

Legal language can be understood as the language typically used by those engaged in the legal profession and, as such, it may come both in spoken or written form. Recent legislation on cybersecurity obviously uses legal language in writing, thus inheriting all its interpretative complications due to the typical abundance of cases and sub-cases as well as to the general richness in detail. This paper faces the challenge of the essential interpretation of the legal language of cybersecurity, namely of the extraction of the essential Parts of Speech (POS) from the legal documents concerning cybersecurity. The challenge is overcome by our methodology for POS tagging of legal language. It leverages state-of-the-art open-source tools for Natural Language Processing (NLP) as well as manual analysis to validate the outcomes of the tools. As a result, the methodology is automated and, arguably, general for any legal language following minor tailoring of the preprocessing step. It is demonstrated over the most relevant EU legislation on cybersecurity, namely on the NIS 2 directive, producing the first, albeit essential, structured interpretation of such a relevant document. Moreover, our findings indicate that tools such as SpaCy and ClausIE reach their limits over the legal language of the NIS 2.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2023
			
	Codice ISBN
	
				9798400707728
			
	Parole chiave
	
				Act
Data Protection
NLP
POS tagging
Privacy
Pronouncement
			
	Appare nelle tipologie:
	
				4.1 Contributo in Atti di convegno

File in questo prodotto:

File	Dimensione	Formato
3600160.3605069.pdf accesso aperto Tipologia: Versione Editoriale (PDF) Licenza: PUBBLICO - Pubblico con Copyright Dimensione 544.38 kB Formato Adobe PDF Visualizza/Apri	544.38 kB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.11769/572674

Citazioni

ND

6

4

social impact