Archivio istituzionale della ricerca dell'Università degli Studi di Palermo

Italy exhibits rich linguistic diversity across its territory due to the distinct regional languages spoken in different areas. Recent advances in self-supervised learning provide new opportunities to analyze Italy's linguistic varieties using speech data alone. This includes the potential to leverage representations learned from large amounts of data to better examine nuances between closely related linguistic varieties. In this study, we focus on automatically identifying the geographic region of origin of speech samples drawn from Italy's diverse language varieties. We leverage self-supervised learning models to tackle this task and analyze differences and similarities between Italy's regional languages. In doing so, we also seek to uncover new insights into the relationships among these diverse yet closely related varieties, which may help linguists understand their interconnected evolution and regional development over time and space. To improve the discriminative ability of learned representations, we evaluate several supervised contrastive learning objectives, both as pre-training steps and additional fine-tuning objectives. Experimental evidence shows that pre-trained self-supervised models can effectively identify regions from speech recording. Additionally, incorporating contrastive objectives during fine-tuning improves classification accuracy and yields embeddings that distinctly separate regional varieties, demonstrating the value of combining self-supervised pre-training and contrastive learning for this task.

Quatra M.L., Koudounas A., Baralis E., Siniscalchi S.M. (2024). Speech Analysis of Language Varieties in Italy. In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings (pp. 15147-15159). European Language Resources Association (ELRA).

Speech Analysis of Language Varieties in Italy

Quatra M. L.;Koudounas A.;Baralis E.;Siniscalchi S. M.^{Ultimo

Formal Analysis}

2024-01-01

Abstract

Italy exhibits rich linguistic diversity across its territory due to the distinct regional languages spoken in different areas. Recent advances in self-supervised learning provide new opportunities to analyze Italy's linguistic varieties using speech data alone. This includes the potential to leverage representations learned from large amounts of data to better examine nuances between closely related linguistic varieties. In this study, we focus on automatically identifying the geographic region of origin of speech samples drawn from Italy's diverse language varieties. We leverage self-supervised learning models to tackle this task and analyze differences and similarities between Italy's regional languages. In doing so, we also seek to uncover new insights into the relationships among these diverse yet closely related varieties, which may help linguists understand their interconnected evolution and regional development over time and space. To improve the discriminative ability of learned representations, we evaluate several supervised contrastive learning objectives, both as pre-training steps and additional fine-tuning objectives. Experimental evidence shows that pre-trained self-supervised models can effectively identify regions from speech recording. Additionally, incorporating contrastive objectives during fine-tuning improves classification accuracy and yields embeddings that distinctly separate regional varieties, demonstrating the value of combining self-supervised pre-training and contrastive learning for this task.

Scheda breve

Scheda completa

Scheda completa (DC)

	Data
	
				2024
			
	ISBN della monografia 
DATO PREVISTO SU LOGINMIUR
	
				9782493814104
			
	URL dell'editore (Open access ove possibile)
	
				https://aclanthology.org/2024.lrec-main.1317/
			
	Citazione
	
				Quatra M.L.,  Koudounas A.,  Baralis E.,  Siniscalchi S.M. (2024). Speech Analysis of Language Varieties in Italy. In 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings (pp. 15147-15159). European Language Resources Association (ELRA).
			
	Appare nelle tipologie:
	
				2.07 Contributo in atti di convegno pubblicato in volume

File in questo prodotto:

File	Dimensione	Formato
2024.lrec-main.1317.pdf accesso aperto Descrizione: main document Tipologia: Versione Editoriale Dimensione 1.1 MB Formato Adobe PDF Visualizza/Apri	1.1 MB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/10447/649493

Citazioni

ND

7

ND

social impact