A Classification Algorithm to Recognize Fake News Websites

Giuseppe Pernagallo; Benedetto Torrisi; Davide Bennato
2020

Abstract

'Fake news' is information that generally spreads on the web and only mimics the form of reliable news media content. The phenomenon has taken on uncontrolled proportions in recent years, raising the concern of authorities and citizens. In this paper we present a classifier able to distinguish a reliable source from a fake news website. We prepared a dataset of 200 fake news websites and 200 reliable websites from all over the world and used as predictors information potentially available on websites, such as the presence of a 'contact us' section or a secured connection. The algorithm is based on logistic regression, while further analyses were carried out using tetrachoric correlation coefficients for dichotomous variables and chi-square tests. This framework offers a concrete solution for attributing a 'reliability score' to a news website, defined as the probability that the source is reliable; based on this probability, a user can decide whether the news is worth sharing.
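The abstract describes the method only at a high level. The Python below is a minimal sketch, not the authors' code, assuming each website is encoded as a vector of 0/1 indicators. The feature names `has_contact_us` and `uses_https` are hypothetical stand-ins for the 'contact us' section and secured connection mentioned above, and the function names, learning rate and epoch count are likewise illustrative choices. The sketch fits a logistic regression by maximum likelihood with plain gradient descent, returns the fitted probability of reliability as the 'reliability score', and computes the Pearson chi-square statistic for one binary predictor. The tetrachoric correlation analysis is not covered.

```python
# Illustrative sketch only: not the authors' implementation. Feature names,
# hyperparameters and function names are hypothetical choices for this example.
import math
from typing import List, Sequence

# Hypothetical 0/1 website indicators, in column order.
FEATURES = ("has_contact_us", "uses_https")


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear_score(w: Sequence[float], x: Sequence[int]) -> float:
    """Intercept w[0] plus the weighted sum of the binary features."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic(X: Sequence[Sequence[int]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 5000) -> List[float]:
    """Maximum-likelihood logistic regression by batch gradient descent.

    X: rows of 0/1 indicators, one row per website.
    y: 1 for a reliable site, 0 for a fake news site.
    Returns weights [intercept, w_1, ..., w_k].
    """
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            # Derivative of the log-loss with respect to the linear score.
            err = sigmoid(linear_score(w, xi)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        # Step against the average gradient of the negative log-likelihood.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def reliability_score(w: Sequence[float], x: Sequence[int]) -> float:
    """Fitted probability that a website with indicators x is reliable."""
    return sigmoid(linear_score(w, x))


def chi_square_2x2(feature: Sequence[int], y: Sequence[int]) -> float:
    """Pearson chi-square statistic (no continuity correction) for the 2x2
    table of one binary feature against the reliable/fake label."""
    a = sum(1 for f, t in zip(feature, y) if f == 1 and t == 1)
    b = sum(1 for f, t in zip(feature, y) if f == 1 and t == 0)
    c = sum(1 for f, t in zip(feature, y) if f == 0 and t == 1)
    d = sum(1 for f, t in zip(feature, y) if f == 0 and t == 0)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    # A zero margin makes the statistic undefined; return 0.0 in that case.
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical toy data (not the paper's dataset): for each combination of
    # indicators, the number of reliable and fake sites.
    cells = [([1, 1], 4, 1), ([1, 0], 2, 2), ([0, 1], 2, 2), ([0, 0], 1, 4)]
    X, y = [], []
    for feats, n_reliable, n_fake in cells:
        X += [feats] * (n_reliable + n_fake)
        y += [1] * n_reliable + [0] * n_fake
    w = fit_logistic(X, y)
    print("weights:", [round(v, 3) for v in w])
    print("score(both indicators):", round(reliability_score(w, [1, 1]), 3))
    for j, name in enumerate(FEATURES):
        print(name, "chi-square:", chi_square_2x2([row[j] for row in X], y))
```

On this toy table the additive logit model reproduces the four cell proportions exactly, so the score for a site with both indicators approaches 0.8 and for a site with neither approaches 0.2. Two caveats apply. If one indicator perfectly separates reliable from fake sites, the maximum-likelihood weights diverge and gradient descent simply keeps growing them; a penalty or a bias-reduced fit would then be needed. Also, `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2x2 tables by default, so its statistic will differ from the uncorrected one computed here.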
ISBN: 978-3-030-51222-4
Subjects: cs.SI; Computer Science - Computers and Society; Statistics - Applications
Keywords: Binary data, Classification algorithm, Fake news, Logit, Misleading information, Websites

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/362954