Institutional Research Archive of the University of Palermo


ABBIE: Attention-Based BI-Encoders for Predicting Where to Split Compound Sanskrit Words

Ali I.; Lo Presti L.; Spano' I.; La Cascia M.

2025-01-01

Abstract

Sanskrit is a highly composite language, morphologically and phonetically complex. One of the major challenges in processing Sanskrit is the splitting of compound words that are merged phonetically. Recognizing the exact location of splits in a compound word is difficult because several possible splits can be found, but only a few of them are semantically meaningful. This paper proposes a novel deep learning method that uses two bi-encoders and a multi-head attention module to predict the valid split location in Sanskrit compound words. The two bi-encoders process the input sequence in direct and reverse order, respectively. The model learns the character-level context in which the splitting occurs by exploiting the correlation between the direct and reverse dynamics of the character sequence. The results of the proposed model are compared with a state-of-the-art technique that adopts a bidirectional recurrent network to solve the same task. Experimental results show that the proposed model correctly identifies where the compound word should be split into its components in 89.27% of cases, outperforming the state-of-the-art technique. The paper also proposes a dataset developed from the repository of the Digital Corpus of Sanskrit (DCS) and the University of Hyderabad (UoH) corpus.
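The input preparation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it contains no encoders and no attention module, and only shows how one character sequence might be presented in direct and reverse order, and how many candidate split locations a word offers. The function names, the toy vocabulary, and the example word `vidyālaya` (conventionally analysed as vidyā + ālaya) are assumptions chosen for illustration and are not taken from the paper or its dataset.

```python
import unicodedata


def build_inputs(word):
    """Return (direct, reverse) integer sequences for one compound word.

    Hypothetical helper: the vocabulary is built from the word itself,
    whereas a real system would use a fixed vocabulary over the corpus.
    """
    # Normalise so that characters such as "ā" count as a single symbol.
    chars = list(unicodedata.normalize("NFC", word))
    vocab = {ch: idx + 1 for idx, ch in enumerate(sorted(set(chars)))}
    direct = [vocab[ch] for ch in chars]
    reverse = direct[::-1]  # same characters, read right to left
    return direct, reverse


def candidate_splits(word):
    """List every character boundary where a split could be placed."""
    n = len(unicodedata.normalize("NFC", word))
    return list(range(1, n))


if __name__ == "__main__":
    word = "vidyālaya"  # illustrative example, not from the paper's dataset
    direct, reverse = build_inputs(word)
    print("direct :", direct)
    print("reverse:", reverse)
    print("candidate split positions:", candidate_splits(word))
```

Even this short word offers eight candidate boundaries, which is why a model has to use the surrounding character context to decide which location is valid.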


Date: 2025
Scientific-disciplinary sector of the contribution: Sector IINF-05/A - Information Processing Systems
DOI of the contribution: https://dx.doi.org/10.5220/0013155300003890
Publisher URL (open access where possible): https://www.scitepress.org/Link.aspx?doi=10.5220/0013155300003890
Citation: Ali I., Lo Presti L., Spano' I., La Cascia M. (2025). ABBIE: Attention-Based BI-Encoders for Predicting Where to Split Compound Sanskrit Words. In International Conference on Agents and Artificial Intelligence (pp. 334-344). Science and Technology Publications, Lda [10.5220/0013155300003890].
Appears in types: 2.07 Conference proceedings contribution published in a volume

Files in this item:

File	Size	Format
ABBIE.pdf (access: archive managers only; type: publisher's version; description: the full text of the article is available at https://www.scitepress.org/Link.aspx?doi=10.5220/0013155300003890)	1.24 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/678963
