Sabbatini, I., Righi, L., Puccetti, G., Esuli, A. (2025). Automatic Extraction of Regesta for Medieval Latin Text Summarization. ERCIM NEWS(141), 31-32.
Automatic Extraction of Regesta for Medieval Latin Text Summarization
Ilaria Sabbatini
Member of the Collaboration Group
2025-01-01
Abstract
The contribution presents the construction of the REVERINO dataset, consisting of 4,533 pairs of regesta and medieval Latin texts, created through a structured pipeline involving manual annotation, segmentation model training, OCR-based text extraction, and post-processing. The aim is to support the automatic summarization of historical documents, particularly 13th-century papal texts. The study uses the dataset to evaluate the performance of language models (GPT-4 and Llama) in generating regesta, comparing direct and translation-based approaches. The results show promising potential but also significant limitations, especially in accurately identifying key elements such as names, dates, and recipients. The project demonstrates that AI can contribute to the summarization of historical sources, but further improvements in both models and data are needed to ensure reliability and accuracy.

| File | Description | Type | Access | Size | Format |
|---|---|---|---|---|---|
| ERCIM EN141-web.pdf | Main article | Publisher's version | Open access | 3.01 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


