Institutional Research Archive of the University of Palermo

UniQA: an Italian and English Question-Answering Data Set Based on Educational Documents

Irene Siragusa (first author); Roberto Pirrone (second author)

2024-12-01

Abstract

In this paper we introduce UniQA, a high-quality Question-Answering data set comprising more than 1k documents and nearly 14k QA pairs. UniQA was generated in a semi-automated manner from data retrieved from the website of the University of Palermo, and it covers information about the bachelor's and master's degree courses for the academic year 2024/2025. The data are in both Italian and English, making the data set suitable for both QA and translation models. To assess the data, we propose a Retrieval-Augmented Generation model based on Llama-3.1-instruct. UniQA can be found at https://github.com/CHILab1/UniQA.

Date: December 2024

Scientific disciplinary sector of the contribution: Sector IINF-05/A - Sistemi di elaborazione delle informazioni (Information Processing Systems)

Publisher URL (open access where possible): https://ceur-ws.org/Vol-3877/paper16.pdf

Citation: Irene Siragusa, Roberto Pirrone (2024). UniQA: an Italian and English Question-Answering Data Set Based on Educational Documents. In G. Bonetta, C.D. Hromei, L. Siciliani, M.A. Stranisci (Eds.), CEUR Workshop Proceedings. CEUR-WS.

Appears in types: 2.07 Conference proceedings contribution published in a volume

Files in this item:

File	Size	Format
UniQA.pdf (open access; type: publisher's version)	1.92 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/678364
