
Lomonaco F., Siino M., Tesconi M. (2023). Text Enrichment with Japanese Language to Profile Cryptocurrency Influencers. In CEUR Workshop Proceedings (pp. 2708-2716). CEUR-WS.

Text Enrichment with Japanese Language to Profile Cryptocurrency Influencers

Siino M.
2023-01-01

Abstract

From a few-shot learning perspective, we propose a strategy to enrich the latent semantics of the text in the dataset provided for Profiling Cryptocurrency Influencers with Few-shot Learning, the task hosted at PAN@CLEF2023. Our approach is based on data augmentation via back-translation to and from Japanese: we translate each sample in the original training dataset to the target language (i.e., Japanese) and then translate it back to English. The original sample and its back-translated counterpart are then merged. We fine-tune two state-of-the-art Transformer models on this augmented version of the training dataset and evaluate their performance using Macro and Micro F1, in accordance with the official metric used for the task. After the fine-tuning phase, ELECTRA and XLNet obtained a Macro F1 of 0.7694 and 0.7872, respectively, on the original training set. Our best submission obtained a Macro F1 of 0.3851 on the official test set.
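The augmentation pipeline described in the abstract can be sketched as below. This is a minimal illustration, not the authors' implementation: `translate` is a hypothetical placeholder for any machine-translation backend (the paper does not specify which service was used), and the stub translator shown here returns its input unchanged, whereas a real MT model would produce a paraphrase on the round trip.

```python
def back_translate(text, translate, pivot="ja"):
    """Round-trip a sample through a pivot language and merge the result
    with the original, as described in the paper's augmentation strategy."""
    pivot_text = translate(text, src="en", dest=pivot)  # English -> Japanese
    back = translate(pivot_text, src=pivot, dest="en")  # Japanese -> English
    return text + " " + back  # merge original and back-translated sample


# Stub translator for demonstration only (identity mapping); a real MT
# backend would introduce the lexical variation that enriches the sample.
def stub_translate(text, src, dest):
    return text


augmented = back_translate("hodl to the moon", stub_translate)
```

The merged sample is then used in place of the original when fine-tuning the Transformer models, doubling the effective text length of each training instance.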
2023
Sector ING-INF/05 - Information Processing Systems
Sector INF/01 - Computer Science
Files in this record:

Text_Enrichment_with_Japanese_Language_to_Profile_Cryptocurrency_Influencers.pdf

Access: open access
Type: Published version
Size: 705.7 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/621075
Citations
  • Scopus: 0