XLNet with Data Augmentation to Profile Cryptocurrency Influencers

Siino M. (first author); Tinnirello I. (last author)

2023-01-01

Abstract

In this work we propose an application of XLNet to the PAN@CLEF2023 task on Profiling Cryptocurrency Influencers with Few-shot Learning. Our approach fine-tunes XLNet on an augmented version of the training set provided for the competition. Given the few-shot nature of the task, we found it useful to employ a data augmentation strategy similar to one proposed in a previous edition of a PAN task. The augmentation adds, for each sample in the training set, a back-translated version obtained through a target language. The target languages used for our two submissions were German and Italian. After fine-tuning, we used XLNet to predict the labels of the unlabeled test set, and we also evaluated the fine-tuned model on the original, non-augmented training set. We computed the F1 score for each label and report the Macro F1 across all the labels provided. Our results show that, on the original training set, our approach reaches a maximum Macro F1 of 0.6937 and a maximum accuracy of 0.6893.
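The augmentation and evaluation steps summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' code. The `back_translate` argument is a hypothetical stand-in for whatever translation system maps English text to the target language (German or Italian) and back, and the sample texts and label names are invented for illustration. The sketch pairs each training sample with a back-translated copy carrying the same label, then computes per-label F1, Macro F1, and accuracy.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[str, str]  # (text, label)


def augment_with_backtranslation(
    samples: Sequence[Sample],
    back_translate: Callable[[str], str],
) -> List[Sample]:
    """Return the original samples plus one back-translated copy of each.

    `back_translate` is a hypothetical callable (English -> target
    language -> English); the label of the copy is left unchanged.
    """
    augmented: List[Sample] = []
    for text, label in samples:
        augmented.append((text, label))
        augmented.append((back_translate(text), label))
    return augmented


def per_label_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """F1 for each label, treating that label as the positive class."""
    scores: Dict[str, float] = {}
    # Assumption: score every label seen in either the gold or predicted labels.
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of the per-label F1 scores."""
    scores = per_label_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of samples whose predicted label matches the gold label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy usage with invented data; str.lower stands in for a real translator.
toy_train = [("Sample tweet ONE", "label_a"), ("Sample tweet TWO", "label_b")]
toy_augmented = augment_with_backtranslation(toy_train, back_translate=str.lower)
print(len(toy_train), "->", len(toy_augmented), "samples")

gold = ["label_a", "label_a", "label_b", "label_b"]
pred = ["label_a", "label_b", "label_b", "label_b"]
print(per_label_f1(gold, pred), macro_f1(gold, pred), accuracy(gold, pred))
```

Because Macro F1 is an unweighted mean over labels, a rarely occurring label counts as much as a frequent one, which is why it can differ from accuracy on the same predictions.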

Year: 2023
Keywords: author profiling; cryptocurrency influencers; few-shot learning; text classification; text data augmentation; Twitter; XLNet
Appears in types: 4.1 Conference proceedings contribution

Files in this item:

File	Size	Format
paper-231.pdf (open access; type: publisher's version (PDF); license: Creative Commons)	959.19 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/607973
