Old Church Slavonic (OCS) is an ancient language that poses distinctive challenges for natural language processing. Currently, there is a lack of Python libraries designed for the analysis of OCS texts. This research addresses that gap in the computational treatment of OCS and produces resources that scholars in historical linguistics, cultural studies, and the humanities can use for further research on ancient language processing. The main contribution is an algorithm for the lemmatization of OCS texts based on a learned dictionary. The approach can handle ancient languages without prior linguistic knowledge. Using a prepared dataset of more than 330K OCS words and their corresponding lemmas, the approach combines the algorithm and the dictionary to achieve accurate lemmatization on test data.
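The abstract does not spell out the algorithm, so the following is only a minimal sketch of what lemmatization with a learned dictionary can look like in general, not the authors' method. The function names `learn_dictionary` and `lemmatize`, the most-frequent-lemma rule, the lowercasing step, and the fallback to the surface form are all assumptions made for illustration; the word/lemma pairs are a toy sample, not taken from the 330K-word dataset.

```python
from collections import Counter, defaultdict


def learn_dictionary(pairs):
    """Build a form -> lemma mapping from (word form, lemma) training pairs.

    Assumption: when a form was seen with several lemmas, keep the most frequent one.
    """
    counts = defaultdict(Counter)
    for form, lemma in pairs:
        counts[form.lower()][lemma] += 1
    return {form: lemmas.most_common(1)[0][0] for form, lemmas in counts.items()}


def lemmatize(tokens, dictionary):
    """Look each token up in the learned dictionary.

    Assumption: unknown tokens are returned unchanged.
    """
    return [dictionary.get(token.lower(), token) for token in tokens]


# Toy training pairs (illustrative only, not from the paper's dataset).
toy_pairs = [
    ("словесе", "слово"),
    ("слово", "слово"),
    ("бога", "богъ"),
    ("богъ", "богъ"),
]

toy_dictionary = learn_dictionary(toy_pairs)
print(lemmatize(["Бога", "словесе", "миръ"], toy_dictionary))
# → ['богъ', 'слово', 'миръ']
```

A pure lookup like this needs no grammatical rules for the language, which matches the abstract's claim of working without prior linguistic knowledge, but it cannot lemmatize forms absent from the training data; how the paper handles such forms is not stated in this record.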

Nawaz, U., Lo Presti, L., Napolitano, M., La Cascia, M. (2024). Automatic Lemmatization of Old Church Slavonic Language Using A Novel Dictionary-Based Approach. In G. Sfikas, G. Retsinas (Eds.), Document Analysis Systems: 16th IAPR International Workshop, DAS 2024, Athens, Greece, August 30–31, 2024, Proceedings (pp. 408-421) [10.1007/978-3-031-70442-0_25].

Automatic Lemmatization of Old Church Slavonic Language Using A Novel Dictionary-Based Approach

Nawaz, Usman; Lo Presti, Liliana; La Cascia, Marco
2024-09-11

ISBN: 978-3-031-70441-3; 978-3-031-70442-0
Use this identifier to cite or link to this document: https://hdl.handle.net/10447/653954