Institutional Research Archive of the University of Palermo

Benchmarking BERT-based Models for Latin: A Case Study on Biblical References in Ancient Christian Literature

Caffagni, Davide; Cocchi, Federico; Mambelli, Anna; Tutrone, Fabio; Zanella, Marco; Cornia, Marcella

2025-01-01

Abstract

Transformer-based language models like BERT have revolutionized Natural Language Processing (NLP) research, but their application to historical languages remains underexplored. This paper investigates the adaptation of BERT-based embedding models for Latin, a language central to the study of the sacred texts of Christianity. Focusing on Jerome’s Vulgate, pre-Vulgate Latin translations of the Bible, and patristic commentaries such as Augustine’s De Genesi ad litteram, we address the challenges posed by Latin’s complex syntax, specialized vocabulary, and historical variations at the orthographic, morphological, and semantic levels. In particular, we propose fine-tuning existing BERT-based embedding models on annotated Latin corpora, using self-generated hard negatives to improve performance in detecting biblical references in early Christian literature in Latin. Experimental results demonstrate the ability of BERT-based models to identify citations of and allusions to the Bible(s) in ancient Christian commentaries while highlighting the complexities and challenges of this field. By integrating NLP techniques with humanistic expertise, this work provides a case study on intertextual analysis in Latin patristic works. It underscores the transformative potential of interdisciplinary approaches, advancing computational tools for sacred text studies and bridging the gap between philology and computational analysis.
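The abstract mentions fine-tuning embedding models with self-generated hard negatives. This record does not specify the training objective or the mining procedure, so the following is only a minimal sketch under a common assumption: a hard negative is a non-gold verse that the current model itself ranks close to a commentary passage, and training uses an InfoNCE-style contrastive loss. The toy vectors, the function names `mine_hard_negatives` and `info_nce_loss`, and the temperature value are all hypothetical; in practice the embeddings would come from a BERT-based encoder.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mine_hard_negatives(query: Vector, candidates: List[Vector],
                        positive_idx: int, k: int) -> List[int]:
    """Hypothetical helper: indices of the k candidates the current model ranks
    closest to the query, excluding the annotated positive ("self-generated")."""
    scored = [(cosine(query, c), i) for i, c in enumerate(candidates)
              if i != positive_idx]
    scored.sort(reverse=True)  # highest similarity first
    return [i for _, i in scored[:k]]


def info_nce_loss(query: Vector, positive: Vector,
                  negatives: List[Vector], temperature: float = 0.05) -> float:
    """Assumed InfoNCE-style loss: negative log-softmax probability of the
    positive among the positive and the negatives (scaled cosine scores)."""
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, n) / temperature for n in negatives]
    m = max(logits)  # log-sum-exp trick for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


# Hypothetical toy embeddings: one commentary passage and four candidate verses.
passage = [0.9, 0.1, 0.0]
verses = [
    [1.0, 0.0, 0.0],  # annotated biblical reference (positive)
    [0.8, 0.3, 0.0],  # similar but wrong verse: a hard negative
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

negs = mine_hard_negatives(passage, verses, positive_idx=0, k=2)
print(negs)  # → [1, 2]
print(info_nce_loss(passage, verses[0], [verses[i] for i in negs]))
```

Negatives that the model already confuses with the gold verse produce a larger loss than random ones, which is the usual motivation for hard-negative training; whether and how often the paper re-mines negatives during fine-tuning is not stated in this record.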

Date: 2025

Scientific-disciplinary sectors of the contribution:
Settore FICP-01/A - Greek and Latin Philology
Settore LATI-01/A - Latin Language and Literature
Settore FICP-01/B - Ancient Christian Literature

Publisher URL (open access where possible): https://ceur-ws.org/Vol-3937/

Citation: Caffagni, D., Cocchi, F., Mambelli, A., Tutrone, F., Zanella, M., Cornia, M., et al. (2025). Benchmarking BERT-based Models for Latin: A Case Study on Biblical References in Ancient Christian Literature. In Proceedings of the 21st Conference on Information and Research Science Connecting to Digital and Library Science. Udine.

Item type: 2.07 Conference contribution published in a proceedings volume

Files in this item:

File	Size	Format
Benchmarking BERT-based_Tutrone.pdf (open access; description: full text of the article; type: publisher's version)	1.27 MB	Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/674723
