Institutional Research Archive of the University of Palermo

La Bella, S., Attanasi, M., Porreca, A., Di Ludovico, A., Maggio, M.C., Gallizzi, R., et al. (2024). Reliability of a generative artificial intelligence tool for pediatric familial Mediterranean fever: insights from a multicentre expert survey. PEDIATRIC RHEUMATOLOGY ONLINE JOURNAL, 22(1) [10.1186/s12969-024-01011-0].

Reliability of a generative artificial intelligence tool for pediatric familial Mediterranean fever: insights from a multicentre expert survey

La Bella, Saverio; Attanasi, Marina; Porreca, Annamaria; Di Ludovico, Armando; Maggio, Maria Cristina; Gallizzi, Romina; La Torre, Francesco; Rigante, Donato; Soscia, Francesca; Ardenti Morini, Francesca; Insalaco, Antonella; Natale, Marco Francesco; Chiarelli, Francesco; Simonini, Gabriele; De Benedetti, Fabrizio; Gattorno, Marco; Breda, Luciana

2024-08-23

Abstract

Background: Artificial intelligence (AI) has become a popular tool for clinical and research use in the medical field. The aim of this study was to evaluate the accuracy and reliability of a generative AI tool on pediatric familial Mediterranean fever (FMF).

Methods: Fifteen questions on pediatric FMF, each repeated three times, were submitted to the popular generative AI tool Microsoft Copilot with Chat-GPT 4.0. Nine pediatric rheumatology experts rated response accuracy with a blinded mechanism using a Likert-like scale with values from 1 to 5.

Results: Median values for overall responses at the initial assessment ranged from 2.00 to 5.00. During the second assessment, median values spanned from 2.00 to 4.00, while for the third assessment, they ranged from 3.00 to 4.00. Intra-rater variability showed poor to moderate agreement (intraclass correlation coefficient range: -0.151 to 0.534). A diminishing level of agreement among experts over time was documented, as highlighted by Krippendorff's alpha coefficient values, ranging from 0.136 (at the first response) to 0.132 (at the second response) to 0.089 (at the third response). Lastly, experts displayed varying levels of trust in AI pre- and post-survey.

Conclusions: AI has promising implications in pediatric rheumatology, including early diagnosis and management optimization, but challenges persist due to uncertain information reliability and the lack of expert validation. Our survey revealed considerable inaccuracies and incompleteness in AI-generated responses regarding FMF, with poor intra- and extra-rater reliability. Human validation remains crucial in managing AI-generated medical information.
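For readers unfamiliar with the agreement statistic reported above, the following is a minimal Python sketch of Krippendorff's alpha. It is not the authors' analysis code. The abstract does not say which distance metric was used, so the sketch assumes the interval metric (squared differences); an ordinal metric would give different values. The function name `krippendorff_alpha_interval` and the example ratings are hypothetical and are not the study's data.

```python
from itertools import permutations


def krippendorff_alpha_interval(units):
    """Krippendorff's alpha with the interval (squared-difference) metric.

    `units` is a list of lists: each inner list holds the ratings that one
    item (for example, one AI response) received from the raters.
    Assumption: interval metric; the source does not state the metric used.
    """
    # Items rated by fewer than two raters contribute no pairable values.
    units = [u for u in units if len(u) >= 2]
    values = [v for u in units for v in u]
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pairable ratings")

    # Observed disagreement: squared differences between ratings of the
    # same item, weighted by 1 / (number of ratings for that item - 1).
    d_o = sum(
        sum((a - b) ** 2 for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in units
    ) / n

    # Expected disagreement: squared differences between all pooled ratings,
    # regardless of which item they belong to.
    d_e = sum((a - b) ** 2 for a, b in permutations(values, 2)) / (n * (n - 1))

    if d_e == 0:
        raise ValueError("ratings show no variation; alpha is undefined")
    return 1.0 - d_o / d_e


# Made-up Likert ratings (1 to 5) for three items and three raters.
example_ratings = [[4, 4, 5], [2, 3, 2], [5, 4, 4]]
example_alpha = krippendorff_alpha_interval(example_ratings)
```

An alpha of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement, which is why the reported values of 0.089 to 0.136 are read as poor agreement among experts.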

Date: 23 Aug 2024
Journal title: PEDIATRIC RHEUMATOLOGY ONLINE JOURNAL
DOI of the contribution: https://dx.doi.org/10.1186/s12969-024-01011-0
Publisher URL (open access where possible): https://ped-rheum.biomedcentral.com/articles/10.1186/s12969-024-01011-0
Appears in types: 1.01 Journal article

Files in this item:

File	Size	Format
Maggio_et_al_s12969-024-01011-0.pdf (open access; description: Artificial Intelligence; type: publisher's version)	1.38 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/651935
