Automatic Lemmatization of Old Church Slavonic Language Using A Novel Dictionary-Based Approach

Nawaz, U.; Lo Presti, L.; Napolitano, M.; La Cascia, M.

doi:10.1007/978-3-031-70442-0_25

Old Church Slavonic (OCS) is an ancient language that poses unique challenges for natural language processing. Python libraries devised for the analysis of OCS texts are currently lacking. This research addresses that gap in the computational treatment of OCS and also produces resources that scholars in historical linguistics, cultural studies, and the humanities can use for further research in ancient language processing. The main contribution is an algorithm for lemmatizing OCS texts based on a learned dictionary. The approach can handle ancient languages without requiring prior linguistic knowledge. Using a prepared dataset of more than 330K OCS words and their corresponding lemmas, the approach combines the algorithm and the dictionary to achieve accurate lemmatization on test data.
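The abstract does not spell out the algorithm, so the following is only a minimal sketch of what a lemmatizer based on a learned dictionary could look like, not the authors' method. It assumes training data in the form of (word form, lemma) pairs, keeps the most frequent lemma for each form, and returns unseen forms unchanged. The function names, the fallback rule, and the toy word pairs are all hypothetical and are not taken from the paper or its dataset.

```python
from collections import Counter, defaultdict


def learn_dictionary(pairs):
    """Map each word form to the lemma it is most often paired with."""
    counts = defaultdict(Counter)
    for form, lemma in pairs:
        counts[form.lower()][lemma] += 1
    return {form: c.most_common(1)[0][0] for form, c in counts.items()}


def lemmatize(tokens, dictionary):
    """Look up each token; unseen forms are returned unchanged (an assumed fallback)."""
    return [dictionary.get(tok.lower(), tok) for tok in tokens]


# Toy (form, lemma) pairs for illustration only; not from the paper's dataset.
training_pairs = [
    ("слово", "слово"),
    ("словесе", "слово"),
    ("богъ", "богъ"),
    ("бога", "богъ"),
    ("бога", "богъ"),
]

dictionary = learn_dictionary(training_pairs)
result = lemmatize(["словесе", "бога", "свѣтъ"], dictionary)
print(result)
```

A pure lookup of this kind needs no grammatical rules, which matches the abstract's claim of working without prior linguistic knowledge, but it cannot generalize to forms absent from the training pairs; how the paper handles such forms is not stated here.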

Nawaz, U., Lo Presti, L., Napolitano, M., La Cascia, M. (2024). Automatic Lemmatization of Old Church Slavonic Language Using A Novel Dictionary-Based Approach. In G. Sfikas, G. Retsinas (Eds.), Document Analysis Systems: 16th IAPR International Workshop, DAS 2024, Athens, Greece, August 30–31, 2024, Proceedings (pp. 408-421) [10.1007/978-3-031-70442-0_25].


Nawaz, Usman; Lo Presti, Liliana; Napolitano, Marianna; La Cascia, Marco

2024-09-11


Date: 11 Sep 2024
ISBN of the monograph: 978-3-031-70441-3; 978-3-031-70442-0
DOI of the contribution: https://dx.doi.org/10.1007/978-3-031-70442-0_25
Appears in types: 2.07 Conference proceedings contribution published in a volume

Files in this record:

File	Size	Format
978-3-031-70442-0_25.pdf (Description: Article; Type: Publisher's version; Access: archive administrators only)	1.57 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/653954


Institutional research archive of the Università degli Studi di Palermo
