Contino, S., Siragusa, I., Sciortino, G., Pirrone, R. (2026). PARSAL: Pipeline for Automatic Retrieval and Structuring of Academic Literature. In N. Bena, M. Ceci, R. Esposito, R. Torlone, A. Della Bruna, C.A. Ardagna, et al. (eds.), CEUR Workshop Proceedings. CEUR-WS.
PARSAL: Pipeline for Automatic Retrieval and Structuring of Academic Literature
Salvatore Contino (co-first author); Irene Siragusa (co-first author); Roberto Pirrone (last author)
2026-01-16
Abstract
In this work, we present PARSAL, a retrieval pipeline that, given one or more relevant keywords, obtains relevant scientific articles in a standardized format. The pipeline uses the APIs of scientific publishers to retrieve relevant full-text articles in PDF, JSON, or XML format. A parser then converts the retrieved articles into a single, uniform format so that they can be stored in a MongoDB database and accessed via a custom GUI. In addition, the papers are arranged in a Knowledge Graph, built with the LlamaIndex framework, which allows users to query the collected articles and obtain a detailed textual answer. The code for the pipeline, the GUI, and the Knowledge Graph creation and inference is available on GitHub.
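The abstract does not specify the uniform format the parser produces. As a rough illustration of the standardization step only, the Python sketch below maps two hypothetical input layouts (a flat JSON record and a simplified JATS-like XML record) onto one common record shape. The `Article` fields, the input key and tag names, and the `standardize` function are assumptions made for this example, not PARSAL's actual schema or code; PDF input, which needs a separate text-extraction step, is not covered.

```python
from __future__ import annotations

import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass, field


@dataclass
class Article:
    """Hypothetical unified record; field names are illustrative only."""
    title: str
    authors: list[str] = field(default_factory=list)
    abstract: str = ""
    sections: list[dict] = field(default_factory=list)
    source_format: str = ""


def _text_of(elem: ET.Element | None) -> str:
    """Concatenate all text inside an element, normalizing whitespace."""
    if elem is None:
        return ""
    return " ".join(t.strip() for t in elem.itertext() if t.strip())


def from_json(raw: str) -> Article:
    # Assumes a hypothetical JSON layout with "title", "authors", "abstract", "body".
    data = json.loads(raw)
    return Article(
        title=data.get("title", "").strip(),
        authors=[a.strip() for a in data.get("authors", [])],
        abstract=data.get("abstract", "").strip(),
        sections=[
            {"heading": s.get("heading", "").strip(), "text": s.get("text", "").strip()}
            for s in data.get("body", [])
        ],
        source_format="json",
    )


def from_xml(raw: str) -> Article:
    # Assumes a hypothetical, simplified JATS-like XML layout.
    root = ET.fromstring(raw)
    return Article(
        title=_text_of(root.find(".//article-title")),
        authors=[_text_of(n) for n in root.findall(".//contrib/name")],
        abstract=_text_of(root.find(".//abstract")),
        sections=[
            {
                "heading": _text_of(sec.find("title")),
                "text": " ".join(_text_of(p) for p in sec.findall("p")),
            }
            for sec in root.findall(".//body/sec")
        ],
        source_format="xml",
    )


# Dispatch table: one parser per supported source format.
PARSERS = {"json": from_json, "xml": from_xml}


def standardize(raw: str, fmt: str) -> dict:
    """Parse a raw article in the given format and return a plain dict."""
    try:
        parser = PARSERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None
    return asdict(parser(raw))


if __name__ == "__main__":
    xml_doc = (
        "<article><front><article-meta>"
        "<title-group><article-title>A Study</article-title></title-group>"
        "<contrib-group><contrib><name>Mario Rossi</name></contrib></contrib-group>"
        "<abstract><p>Short abstract.</p></abstract>"
        "</article-meta></front>"
        "<body><sec><title>Intro</title><p>First para.</p><p>Second.</p></sec></body>"
        "</article>"
    )
    print(standardize(xml_doc, "xml"))
```

Whatever the input format, the output is a plain, JSON-compatible dict with the same keys, which is the kind of document a MongoDB collection can store directly; how PARSAL itself writes records to the database is not described in the abstract.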