Information Extraction and Social Network Analysis of Criminal Sentences
De Felice, D.; Di Silvestro, L.; Gallo, G.; Giura, G.; Pennisi, C.; Zarba, C.; Giuffrida, G.
2012-01-01
Abstract
This paper reports on research analyzing criminal sentences on organized-crime activities in Sicily handed down from 2000 through 2006. For this case study, we split the analysis of the textual corpus into three main stages. In the first stage, we collected the criminal sentences from the various courthouses. Since Sicily does not yet have a unified digital archive of criminal sentences, all sentences had to be collected on paper. The paper sentences were therefore scanned into PDF files and then converted into TXT files by means of OCR. In the second stage, the text files were parsed to extract the names of the actors involved in the facts and the relationships between them. Each actor was assigned exactly one of the following roles: judge, member of the court, prosecutor, defendant, lawyer. Unlabelled names were purged from the database. In the third stage, we modelled the information obtained in the previous stage as a social network, with actors as nodes and their relationships as edges. The network was analyzed with the JUNG Java library, in particular to detect central nodes and sub-communities.
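The second and third stages can be illustrated with a small, dependency-free Java sketch. It deliberately does not reproduce the JUNG API. The class `ActorNetwork`, the `Role` enum and all method names are hypothetical. Relationships involving an unlabelled name are dropped, mirroring the purge step. Because the abstract does not say which centrality measure or community-detection algorithm was used, the sketch substitutes two simple stand-ins: degree centrality and connected components.

```java
import java.util.*;

/** Hypothetical sketch: labelled actors become vertices, relationships undirected edges. Not the JUNG API. */
public class ActorNetwork {
    /** The five roles named in the abstract; this encoding is an assumption. */
    public enum Role { JUDGE, COURT_MEMBER, PROSECUTOR, DEFENDANT, LAWYER }

    private final Map<String, Role> roles = new HashMap<>();
    // Adjacency sets; TreeMap/TreeSet keep all output in alphabetical order.
    private final Map<String, Set<String>> adj = new TreeMap<>();

    public void label(String name, Role role) { roles.put(name, role); }

    // Adds an undirected edge. Pairs involving an unlabelled name, and self-loops,
    // are dropped (mirroring the purge of unlabelled names). Returns true if kept.
    public boolean addRelation(String a, String b) {
        if (!roles.containsKey(a) || !roles.containsKey(b) || a.equals(b)) {
            return false;
        }
        adj.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
        adj.computeIfAbsent(b, k -> new TreeSet<>()).add(a);
        return true;
    }

    /** Degree centrality: number of distinct neighbours of each connected actor. */
    public Map<String, Integer> degree() {
        Map<String, Integer> deg = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : adj.entrySet()) {
            deg.put(e.getKey(), e.getValue().size());
        }
        return deg;
    }

    /** Actors with maximal degree; all tied actors are returned. */
    public List<String> mostCentral() {
        int max = -1;
        List<String> best = new ArrayList<>();
        for (Map.Entry<String, Integer> e : degree().entrySet()) {
            if (e.getValue() > max) { max = e.getValue(); best.clear(); }
            if (e.getValue() == max) { best.add(e.getKey()); }
        }
        return best;
    }

    /** Connected components via breadth-first search: a coarse stand-in for sub-communities. */
    public List<Set<String>> components() {
        List<Set<String>> result = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (String start : adj.keySet()) {
            if (!seen.add(start)) {
                continue; // already assigned to an earlier component
            }
            Set<String> comp = new TreeSet<>();
            Deque<String> queue = new ArrayDeque<>();
            queue.add(start);
            while (!queue.isEmpty()) {
                String v = queue.poll();
                comp.add(v);
                for (String w : adj.get(v)) {
                    if (seen.add(w)) { queue.add(w); }
                }
            }
            result.add(comp);
        }
        return result;
    }

    public static void main(String[] args) {
        ActorNetwork net = new ActorNetwork();
        // Hypothetical actors and relations, not data from the study.
        net.label("Defendant A", Role.DEFENDANT);
        net.label("Defendant B", Role.DEFENDANT);
        net.label("Defendant C", Role.DEFENDANT);
        net.label("Lawyer X", Role.LAWYER);
        net.label("Prosecutor P", Role.PROSECUTOR);
        net.label("Judge J", Role.JUDGE);
        net.addRelation("Defendant A", "Defendant B");
        net.addRelation("Defendant A", "Defendant C");
        net.addRelation("Defendant A", "Lawyer X");
        net.addRelation("Prosecutor P", "Judge J");
        net.addRelation("Defendant A", "Unknown Z"); // dropped: Z is unlabelled
        System.out.println(net.mostCentral()); // prints [Defendant A]
        // prints [[Defendant A, Defendant B, Defendant C, Lawyer X], [Judge J, Prosecutor P]]
        System.out.println(net.components());
    }
}
```

Connected components are a much coarser notion of sub-community than the clustering algorithms a graph library such as JUNG provides. A faithful reimplementation would need to know which of those algorithms the authors actually applied.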