
Lomonaco F., Siino M., Tesconi M. (2023). Text Enrichment with Japanese Language to Profile Cryptocurrency Influencers. In CEUR Workshop Proceedings (pp. 2708-2716). CEUR-WS.

Text Enrichment with Japanese Language to Profile Cryptocurrency Influencers

Siino M.
2023-01-01

Abstract

From a few-shot learning perspective, we propose a strategy to enrich the latent semantics of the text in the dataset provided for Profiling Cryptocurrency Influencers with Few-shot Learning, the task hosted at PAN@CLEF2023. Our approach is based on data augmentation via back-translation to and from Japanese: we translate each sample in the original training dataset to the target language (i.e., Japanese) and then translate it back to English. The original sample and its back-translated counterpart are then merged. We fine-tune two state-of-the-art Transformer models on this augmented version of the training dataset and evaluate their performance using Macro and Micro F1, in accordance with the official metric used for the task. After the fine-tuning phase, ELECTRA and XLNet obtained a Macro F1 of 0.7694 and 0.7872, respectively, on the original training set. Our best submission obtained a Macro F1 of 0.3851 on the official test set.
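The augmentation pipeline described in the abstract can be sketched as below. This is a minimal illustration, not the authors' implementation: `translate` is a hypothetical placeholder for any machine-translation backend (the paper does not specify which service was used), and the stub translator shown here returns its input unchanged, whereas a real MT model would produce a paraphrase on the round trip.

```python
def back_translate(text, translate, pivot="ja"):
    """Round-trip a sample through a pivot language and merge the result
    with the original, as described in the paper's augmentation strategy."""
    pivot_text = translate(text, src="en", dest=pivot)  # English -> Japanese
    back = translate(pivot_text, src=pivot, dest="en")  # Japanese -> English
    return text + " " + back  # merge original and back-translated sample


# Stub translator for demonstration only (identity mapping); a real MT
# backend would introduce the lexical variation that enriches the sample.
def stub_translate(text, src, dest):
    return text


augmented = back_translate("hodl to the moon", stub_translate)
```

The merged sample is then used in place of the original when fine-tuning the Transformer models, doubling the effective text length of each training instance.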
2023
Sector ING-INF/05 - Information Processing Systems
Sector INF/01 - Computer Science
Files in this record:

Text_Enrichment_with_Japanese_Language_to_Profile_Cryptocurrency_Influencers.pdf

Access: open access
Type: Published version
Size: 705.7 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/621075
Citations
  • Scopus: 0