In this paper we introduce UniQA, a high-quality Question-Answering data set comprising more than 1k documents and nearly 14k QA pairs. UniQA was generated in a semi-automated manner from data retrieved from the website of the University of Palermo, covering information about the bachelor's and master's degree courses for the academic year 2024/2025. The data are available in both Italian and English, making the data set suitable for both QA and translation models. To assess the data, we propose a Retrieval Augmented Generation model based on Llama-3.1-instruct. UniQA can be found at https://github.com/CHILab1/UniQA.
Irene Siragusa, Roberto Pirrone (2024). UniQA: an Italian and English Question-Answering Data Set Based on Educational Documents. In G. Bonetta, C.D. Hromei, L. Siciliani, M.A. Stranisci (Eds.), CEUR Workshop Proceedings. CEUR-WS.
UniQA: an Italian and English Question-Answering Data Set Based on Educational Documents
Irene Siragusa (first author); Roberto Pirrone (second author)
2024-12-01
File | Access | Type | Size | Format
---|---|---|---|---
UniQA.pdf | Open access | Publisher's version | 1.92 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.