In this paper we describe a deep learning model based on a Convolutional Neural Network (CNN). The model was developed for the Profiling Hate Speech Spreaders (HSSs) task proposed by PAN 2021 organizers and hosted at the 2021 CLEF Conference. Our approach to the task of classifying an author as HSS or not (nHSS) takes advantage of a CNN based on a single convolutional layer. In this binary classification task, on the tests performed using a 5-fold cross validation, the proposed model reaches a maximum accuracy of 0.80 on the multilingual (i.e., English and Spanish) training set, and a minimum loss value of 0.51 on the same set. As announced by the task organizers, the trained model presented is able to reach an overall accuracy of 0.79 on the full test set. This overall accuracy is obtained averaging the accuracy achieved by the model on both languages. In particular, with regard to the Spanish test set, the organizers announced that our model achieves an accuracy of 0.85, while on the English test set the same model achieved - as announced by the organizers too - an accuracy of 0.73. Thanks to the model presented in this paper, our team won the 2021 PAN competition on profiling HSSs.
Detection of Hate Speech Spreaders using convolutional neural networks
Siino M.
Primo
;
2021-01-01
Abstract
In this paper we describe a deep learning model based on a Convolutional Neural Network (CNN). The model was developed for the Profiling Hate Speech Spreaders (HSSs) task proposed by PAN 2021 organizers and hosted at the 2021 CLEF Conference. Our approach to the task of classifying an author as HSS or not (nHSS) takes advantage of a CNN based on a single convolutional layer. In this binary classification task, on the tests performed using a 5-fold cross validation, the proposed model reaches a maximum accuracy of 0.80 on the multilingual (i.e., English and Spanish) training set, and a minimum loss value of 0.51 on the same set. As announced by the task organizers, the trained model presented is able to reach an overall accuracy of 0.79 on the full test set. This overall accuracy is obtained averaging the accuracy achieved by the model on both languages. In particular, with regard to the Spanish test set, the organizers announced that our model achieves an accuracy of 0.85, while on the English test set the same model achieved - as announced by the organizers too - an accuracy of 0.73. Thanks to the model presented in this paper, our team won the 2021 PAN competition on profiling HSSs.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.