In this work, we present PARSAL, a retrieval pipeline that, given relevant keywords, collects matching scientific articles and converts them to a standardized format. The pipeline uses the APIs of scientific publishers to retrieve full-text articles in PDF, JSON, or XML format. A parser then converts the retrieved articles into a single common format, so that they can be inserted into a MongoDB database and accessed via a custom GUI. In addition, the papers are arranged in a Knowledge Graph, built with the LlamaIndex framework, so that users can query the collected articles and obtain a verbose answer. The code for the pipeline, the GUI, and Knowledge Graph creation and inference is available on GitHub.
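
The standardization step described above can be illustrated with a minimal sketch. It is not the PARSAL parser: the common schema, the input key and tag names (`title`, `abstract`, `full_text`, `body`), and the function names are all hypothetical, chosen only to show how JSON and XML payloads might be mapped onto one shared record layout. PDF handling is left out because it would need a third-party library.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical common schema; the actual PARSAL format is not given in the abstract.
COMMON_FIELDS = ("title", "abstract", "body", "source_format")


def from_json(raw: str) -> dict:
    """Map a JSON payload (assumed keys) onto the common schema."""
    data = json.loads(raw)
    return {
        "title": data.get("title", ""),
        "abstract": data.get("abstract", ""),
        "body": data.get("full_text", ""),
        "source_format": "json",
    }


def from_xml(raw: str) -> dict:
    """Map an XML payload (assumed child tags of the root) onto the common schema."""
    root = ET.fromstring(raw)
    return {
        "title": (root.findtext("title") or "").strip(),
        "abstract": (root.findtext("abstract") or "").strip(),
        "body": (root.findtext("body") or "").strip(),
        "source_format": "xml",
    }


# Dispatch table: one parser per supported input format.
PARSERS = {"json": from_json, "xml": from_xml}


def standardize(raw: str, fmt: str) -> dict:
    """Return a record in the common schema, or raise ValueError for unknown formats."""
    try:
        parser = PARSERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None
    return parser(raw)
```

Because every parser returns the same keys, the resulting dictionaries could be stored as documents in a single database collection, for example with pymongo's `insert_one`, regardless of the format the publisher returned.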

Contino, S., Siragusa, I., Sciortino, G., Pirrone, R. (2026). PARSAL: Pipeline for Automatic Retrieval and Structuring of Academic Literature. In N. Bena, M. Ceci, R. Esposito, R. Torlone, A. Della Bruna, C.A. Ardagna, et al. (Eds.), CEUR Workshop Proceedings. CEUR-WS.

PARSAL: Pipeline for Automatic Retrieval and Structuring of Academic Literature

Salvatore Contino (co-first author); Irene Siragusa (co-first author); Roberto Pirrone (last author)
2026-01-16

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/703118