XLNet with Data Augmentation to Profile Cryptocurrency Influencers

Siino M.
2023-01-01

Abstract

In this work we propose an application of XLNet to the PAN@CLEF2023 task on Profiling Cryptocurrency Influencers with Few-shot Learning. Our approach uses XLNet fine-tuned on an augmented version of the training set provided for the competition. Given the few-shot nature of the task, we found it useful to employ a data augmentation strategy similar to one proposed in a previous edition of a PAN task. Each sample in the training dataset is augmented with its backtranslated version, obtained through a target language. The target languages used for our two submissions were German and Italian. After fine-tuning, the XLNet model predicts the labels of the unlabeled test set. We also evaluated the fine-tuned model on the original, non-augmented training set: we computed the F1 score for each label and report the Macro F1 across all the labels provided. On the original training set, our approach obtains a maximum Macro F1 of 0.6937 and a maximum accuracy of 0.6893.
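
The two mechanical steps named in the abstract, augmenting each training sample with a backtranslated copy and averaging per-label F1 into a Macro F1, can be illustrated with a minimal Python sketch. This is not the author's implementation: the function names `augment_with_backtranslation` and `macro_f1` are hypothetical, and the `back_translate` argument is an assumed placeholder for whatever translation system performs the round trip through German or Italian.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[str, str]  # (text, label)


def augment_with_backtranslation(
    samples: Sequence[Sample],
    back_translate: Callable[[str], str],
) -> List[Sample]:
    """Return the original samples followed by one backtranslated copy of each.

    `back_translate` is an assumed callable (hypothetical here) that sends a
    text through a target language and back; each copy keeps its label.
    """
    augmented = list(samples)
    for text, label in samples:
        augmented.append((back_translate(text), label))
    return augmented


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of the per-label F1 scores over the labels in y_true."""
    labels = sorted(set(y_true))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the label never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)
```

In this sketch the augmented set is twice the size of the original, and evaluation would be run on the original, non-augmented samples only, as the abstract describes.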
2023
author profiling
cryptocurrency influencers
few-shot learning
text classification
text data augmentation
Twitter
xlnet

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11769/607973