This paper illustrates the implementation of Open Unipa-GPT, an open-source version of the Unipa-GPT chat-bot that leverages open-source Large Language Models (LLMs) for embeddings and text generation. The system relies on a Retrieval Augmented Generation approach, thus mitigating hallucination errors in the generation phase. A detailed comparison between different models is reported to illustrate their performance in embedding generation, retrieval, and text generation. For text generation, models were tested in a simple inference setup after a fine-tuning procedure. Experiments demonstrate that open-source LLMs can be efficiently used for embedding generation, but none of the models reaches the performance obtained by closed models, such as gpt-3.5-turbo, in generating answers. Corpora and code are available on GitHub.

Siragusa, I., Pirrone, R. (2024). Unipa-GPT: a framework to assess open-source alternatives to Chat-GPT for Italian chat-bots. In F. Dell'Orletta, A. Lenci, S. Montemagni, R. Sprugnoli (Eds.), CEUR Workshop Proceedings. CEUR-WS.

Unipa-GPT: a framework to assess open-source alternatives to Chat-GPT for Italian chat-bots

Irene Siragusa (first author); Roberto Pirrone (second author)
2024-12-01

Files in this item:
File: CLiC_it_unipa_gpt_open-v3.pdf
Access: archive managers only
Type: Publisher's version
Size: 1.61 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/703119