Archivio istituzionale della ricerca dell'Università degli Studi di Palermo

Guided by a corpus linguistics approach, in this article we present a comparative evaluation of State-of-the-Art (SotA) models, with a special focus on Transformers, to address the task of Fake News Spreaders (i.e., users that share Fake News) detection. First, we explore the reference multilingual dataset for the considered task, exploiting corpus linguistics techniques, such as chi-square test, keywords and Word Sketch. Second, we perform experiments on several models for Natural Language Processing. Third, we perform a comparative evaluation using the most recent Transformer-based models (RoBERTa, DistilBERT, BERT, XLNet, ELECTRA, Longformer) and other deep and non-deep SotA models (CNN, MultiCNN, Bayes, SVM). The CNN tested outperforms all the models tested and, to the best of our knowledge, any existing approach on the same dataset. Fourth, to better understand this result, we conduct a post-hoc analysis as an attempt to investigate the behaviour of the presented best performing black-box model. This study highlights the importance of choosing a suitable classifier given the specific task. To make an educated decision, we propose the use of corpus linguistics techniques. Our results suggest that large pre-trained deep models like Transformers are not necessarily the first choice when addressing a text classification task as the one presented in this article. All the code developed to run our tests is publicly available on GitHub.

Siino, M., Di Nuovo, E., Tinnirello, I., La Cascia, M. (2022). Fake News Spreaders Detection: Sometimes Attention Is Not All You Need. INFORMATION, 13(9) [10.3390/info13090426].

Fake News Spreaders Detection: Sometimes Attention Is Not All You Need

Siino, Marco;Di Nuovo, Elisa;Tinnirello, Ilenia;La Cascia, Marco

2022-09-01

Abstract

Guided by a corpus linguistics approach, in this article we present a comparative evaluation of State-of-the-Art (SotA) models, with a special focus on Transformers, to address the task of Fake News Spreaders (i.e., users that share Fake News) detection. First, we explore the reference multilingual dataset for the considered task, exploiting corpus linguistics techniques, such as chi-square test, keywords and Word Sketch. Second, we perform experiments on several models for Natural Language Processing. Third, we perform a comparative evaluation using the most recent Transformer-based models (RoBERTa, DistilBERT, BERT, XLNet, ELECTRA, Longformer) and other deep and non-deep SotA models (CNN, MultiCNN, Bayes, SVM). The CNN tested outperforms all the models tested and, to the best of our knowledge, any existing approach on the same dataset. Fourth, to better understand this result, we conduct a post-hoc analysis as an attempt to investigate the behaviour of the presented best performing black-box model. This study highlights the importance of choosing a suitable classifier given the specific task. To make an educated decision, we propose the use of corpus linguistics techniques. Our results suggest that large pre-trained deep models like Transformers are not necessarily the first choice when addressing a text classification task as the one presented in this article. All the code developed to run our tests is publicly available on GitHub.

Scheda breve

Scheda completa

Scheda completa (DC)

	Data
	
				set-2022
			
	Titolo del periodico 
DATO PREVISTO SU LOGINMIUR
	
				INFORMATION
			
	DOI del contributo 
DATO PREVISTO SU LOGINMIUR
	
				https://dx.doi.org/10.3390/info13090426
			
	URL dell'editore (Open access ove possibile)
	
				https://www.mdpi.com/2078-2489/13/9/426
			
	Citazione
	
				Siino, M., Di Nuovo, E., Tinnirello, I., La Cascia, M. (2022). Fake News Spreaders Detection: Sometimes Attention Is Not All You Need. INFORMATION, 13(9) [10.3390/info13090426].
			
	Appare nelle tipologie:
	
				1.01 Articolo in rivista

File in questo prodotto:

File	Dimensione	Formato
information-13-00426.pdf accesso aperto Tipologia: Versione Editoriale Dimensione 2.95 MB Formato Adobe PDF Visualizza/Apri	2.95 MB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10447/568262

Citazioni

ND

34

24

social impact