Institutional Research Archive of the Università degli Studi di Palermo


Boosting End-to-End Multilingual Phoneme Recognition Through Exploiting Universal Speech Attributes Constraints

Siniscalchi, Sabato Marco (second author; Methodology)

2024-01-01

Abstract

We propose a first step toward multilingual end-to-end automatic speech recognition (ASR) by integrating knowledge about speech articulators. The key idea is to leverage a rich set of fundamental units that can be defined "universally" across all spoken languages. These units, referred to as speech attributes, are manner and place of articulation. Specifically, several deterministic attribute-to-phoneme mapping matrices are constructed from a predefined universal attribute inventory; they project the knowledge-rich articulatory attribute logits into output phoneme logits. The mapping imposes knowledge-based constraints that limit inconsistency with acoustic-phonetic evidence in the integrated prediction. Combined with phoneme recognition, our phone recognizer can draw on both attribute and phoneme information. The proposed joint multilingual model is evaluated on phoneme recognition. In multilingual experiments over six languages on the benchmark datasets LibriSpeech and CommonVoice, our proposed solution outperforms conventional multilingual approaches with an average relative improvement of 6.85%, and it also performs much better than a monolingual model. Further analysis demonstrates that the proposed solution eliminates phoneme predictions that are inconsistent with attributes.
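The attribute-to-phoneme projection described in the abstract can be illustrated with a small sketch. The abstract does not say how the mapping matrices are built or how the projected logits are merged with the phoneme logits, so everything below is an assumption made for illustration: the five-phoneme toy inventory, the binary matrices, the additive combination, and the helper names `build_mapping`, `project`, `combine`, and `argmax_phoneme` are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: toy inventory and additive fusion are assumptions,
# not the method or the attribute inventory used in the paper.

MANNERS = ["stop", "nasal", "fricative"]
PLACES = ["bilabial", "alveolar"]

# Hypothetical toy inventory: phoneme -> (manner, place).
PHONEMES = {
    "p": ("stop", "bilabial"),
    "t": ("stop", "alveolar"),
    "m": ("nasal", "bilabial"),
    "n": ("nasal", "alveolar"),
    "s": ("fricative", "alveolar"),
}


def build_mapping(attributes, index):
    """Deterministic binary matrix: rows are attributes, columns are phonemes.

    Entry [i][j] is 1.0 when phoneme j carries attribute i, else 0.0.
    """
    return [
        [1.0 if PHONEMES[p][index] == a else 0.0 for p in PHONEMES]
        for a in attributes
    ]


def project(attr_logits, mapping):
    """Project attribute logits onto phonemes (vector-matrix product)."""
    n_phonemes = len(mapping[0])
    return [
        sum(attr_logits[i] * mapping[i][j] for i in range(len(mapping)))
        for j in range(n_phonemes)
    ]


def combine(phoneme_logits, manner_logits, place_logits):
    """Add the projected manner and place logits to the phoneme logits."""
    from_manner = project(manner_logits, build_mapping(MANNERS, 0))
    from_place = project(place_logits, build_mapping(PLACES, 1))
    return [a + b + c for a, b, c in zip(phoneme_logits, from_manner, from_place)]


def argmax_phoneme(logits):
    """Return the phoneme symbol with the highest logit."""
    names = list(PHONEMES)
    return names[max(range(len(logits)), key=logits.__getitem__)]


# The phoneme head alone slightly prefers "s", but the attribute heads
# indicate a stop with alveolar place, which is inconsistent with "s".
phoneme_logits = [0.0, 1.0, 0.0, 0.0, 1.2]  # order: p, t, m, n, s
manner_logits = [3.0, -1.0, -1.0]           # order: stop, nasal, fricative
place_logits = [-1.0, 2.0]                  # order: bilabial, alveolar

phoneme_only = argmax_phoneme(phoneme_logits)
joint = argmax_phoneme(combine(phoneme_logits, manner_logits, place_logits))
print(phoneme_only, joint)
```

In this toy case the phoneme logits alone select "s", while the combined logits select "t", the only phoneme in the inventory that is both a stop and alveolar. That is the kind of attribute-inconsistent prediction the abstract says the constraints remove, although the real model's fusion rule may differ from the simple sum used here.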


Date	2024
ISBN of the monograph	979-8-3503-4485-1
DOI of the contribution	https://dx.doi.org/10.1109/icassp48485.2024.10447568
Publisher URL (open access where possible)	https://ieeexplore.ieee.org/document/10447568
Citation	Yen, H., Siniscalchi, S.M., Lee, C. (2024). Boosting End-to-End Multilingual Phoneme Recognition Through Exploiting Universal Speech Attributes Constraints. In IEEE ICASSP [10.1109/icassp48485.2024.10447568].
Appears in types	2.07 Conference proceedings contribution published in a volume

Files in this record:

File	Description	Type	Access	Size	Format
Boosting_End-to-End_Multilingual_Phoneme_Recognition_Through_Exploiting_Universal_Speech_Attributes_Constraints.pdf	Main document	Publisher's version	Archive managers only	977.98 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/638754
