Information Extraction and Social Network Analysis of Criminal Sentences

De Felice, D.; Di Silvestro, L.; Gallo, G.; Giura, G.; Pennisi, C.; Zarba, C.; Giuffrida, G.
2012-01-01

Abstract

This paper reports on a study of criminal sentences concerning organized crime activities in Sicily, pronounced from 2000 through 2006. For this case study, we split the analysis of the textual corpus into three main stages. In the first stage, we collected the criminal sentences from the various courthouses. Since there is not yet a unified digital archive of criminal sentences in Sicily, all sentences had to be collected in paper form. The paper sentences were then scanned into PDF files and converted into TXT files by means of OCR technology. In the second stage, the text files were parsed to extract the names of the actors involved in the facts and the relationships between them. Each actor was labelled unambiguously with one of the following roles: judge, member of the court, prosecutor, defendant, lawyer. Names that could not be labelled were purged from the database. In the third stage, we modelled the information obtained in the previous stage as a social network. The network was analyzed using the JUNG Java library; in particular, it was inspected in order to detect central nodes and sub-communities.
Year: 2012
ISBN: 978-92-837-0165-1
Keywords: criminal sentences; analysis of the textual corpus; social network analysis

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/77820