Lomonaco F., Siino M., Tesconi M. (2023). Text Enrichment with Japanese Language to Profile Cryptocurrency Influencers. In CEUR Workshop Proceedings (pp. 2708-2716). CEUR-WS.
Text Enrichment with Japanese Language to Profile Cryptocurrency Influencers
Siino M.
First
2023-01-01
Abstract
From a few-shot learning perspective, we propose a strategy to enrich the latent semantics of the text provided in the dataset for Profiling Cryptocurrency Influencers with Few-shot Learning, the task hosted at PAN@CLEF2023. Our approach is based on data augmentation via backtranslation to and from Japanese. We translate the samples in the original training dataset into a target language (i.e., Japanese) and then translate them back into English. The original sample and its backtranslated counterpart are then merged. We then fine-tune two state-of-the-art Transformer models on this augmented version of the training dataset. We evaluate the performance of the two fine-tuned models using Macro and Micro F1, in accordance with the official metric used for the task. After the fine-tuning phase, ELECTRA and XLNet obtained a Macro F1 of 0.7694 and 0.7872, respectively, on the original training set. Our best submission obtained a Macro F1 of 0.3851 on the official test set.
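As a rough, non-authoritative illustration of the augmentation strategy summarised in the abstract (translate to Japanese, translate back to English, merge with the original sample), the minimal sketch below mirrors that pipeline. The `translate` helper is a placeholder assumption, not code from the paper; any machine-translation backend could be plugged in.

```python
# Minimal sketch of the backtranslation augmentation described in the abstract.
# Assumption: `translate` stands in for any machine-translation backend; the
# paper does not publish this code, so names and structure are illustrative.

def translate(text: str, src: str, tgt: str) -> str:
    """Placeholder MT call: returns the input unchanged.

    Replace the body with a real translation service or model call
    (English <-> Japanese) to perform actual backtranslation.
    """
    return text


def backtranslate(text: str) -> str:
    """English -> Japanese -> English round trip."""
    japanese = translate(text, src="en", tgt="ja")
    return translate(japanese, src="ja", tgt="en")


def augment(sample: str) -> str:
    """Merge the original sample with its backtranslated counterpart."""
    return f"{sample} {backtranslate(sample)}"


if __name__ == "__main__":
    training_texts = ["Buy the dip, this token is going to the moon."]
    augmented = [augment(t) for t in training_texts]
    print(augmented)
```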
| File | Size | Format | |
|---|---|---|---|
| Text_Enrichment_with_Japanese_Language_to_Profile_Cryptocurrency_Influencers.pdf (open access; Type: Publisher's version) | 705.7 kB | Adobe PDF | View/Open |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.