A Classification Algorithm to Recognize Fake News Websites

Giuseppe Pernagallo (Writing – Original Draft Preparation); Benedetto Torrisi (Writing – Original Draft Preparation); Davide Bennato (Writing – Original Draft Preparation)
2021-01-01

Abstract

'Fake news' is information that generally spreads on the web and only mimics the form of reliable news media content. The phenomenon has assumed uncontrolled proportions in recent years, raising the concern of authorities and citizens. In this paper we present a classifier able to distinguish a reliable source from a fake news website. We have prepared a dataset of 200 fake news websites and 200 reliable websites from all over the world and used as predictors information potentially available on websites, such as the presence of a 'contact us' section or a secured connection. The algorithm is based on logistic regression, whereas further analyses were carried out using tetrachoric correlation coefficients for dichotomous variables and chi-square tests. This framework offers a concrete solution to attribute a 'reliability score' to news websites, defined as the probability that a source is reliable; on the basis of this probability a user can decide whether the news is worth sharing.
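As an illustration of the idea summarized in the abstract, the following is a minimal sketch of a logistic regression on binary website features that returns a 'reliability score'. It is not the authors' implementation: the two feature names (`has_contact_us`, `uses_https`), the synthetic data, the probabilities used to generate it, and the plain gradient-descent fitting routine are all assumptions made for the example. The paper's actual predictors, dataset and estimation procedure are not reproduced here.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logit(X, y, lr=0.5, epochs=1000):
    """Fit a logistic regression by batch gradient descent on the mean log-loss.

    Assumption: gradient descent is used only to keep the sketch dependency-free;
    statistical software would typically use maximum likelihood via Newton-type steps.
    """
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(k):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b


def reliability_score(features, w, b):
    """Estimated probability that a website with these features is reliable."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, features)))


# Synthetic data, NOT the paper's dataset: 200 "reliable" and 200 "fake" sites,
# each described by two hypothetical binary features
# [has_contact_us, uses_https].
random.seed(0)


def make_site(reliable):
    # Assumed feature frequencies, chosen only to make the example separable.
    p = 0.85 if reliable else 0.25
    return [1 if random.random() < p else 0 for _ in range(2)]


labels = [1] * 200 + [0] * 200
sites = [make_site(label) for label in labels]

w, b = fit_logit(sites, labels)
score_both = reliability_score([1, 1], w, b)  # site with both features
score_none = reliability_score([0, 0], w, b)  # site with neither feature

print("coefficients:", w, "intercept:", b)
print("score with both features:", score_both)
print("score with neither feature:", score_none)
```

Under these assumptions, a site showing both features receives a higher score than a site showing neither, and a user could compare the score against a threshold of their choosing before deciding whether to share.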
2021
978-3-030-51221-7
cs.SI; Computer Science - Computers and Society; Statistics - Applications
Binary data, Classification algorithm, Fake news, Logit, Misleading information, Websites
Files in this item:
A classification algorithm.pdf (Adobe PDF, 776.64 kB; archive administrators only)
Description: A classification
Type: Publisher's version (PDF)
License: Non-public - private/restricted access

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/362954
Citations
  • Scopus 3