Predictive models based on radiomics and machine learning (ML) need large annotated datasets for training, which are often difficult to collect. We designed an operative pipeline for model training that exploits data already available to the scientific community. The aim of this work was to explore the capability of radiomic features to predict tumor histology and stage in patients with non-small cell lung cancer (NSCLC). We analyzed the radiotherapy planning thoracic CT scans of a proprietary sample of 47 subjects (L-RT) and integrated this dataset with a publicly available set of 130 patients from the MAASTRO NSCLC collection (Lung1). We implemented intra- and inter-sample cross-validation (CV) strategies to evaluate the performance of ML predictive models on small datasets. We carried out two classification tasks: histology classification (three classes) and overall stage classification (two classes: stage I and stage II). In the first task, the best performance was obtained by a Random Forest classifier once the analysis was restricted to stage I and II tumors of the merged Lung1 and L-RT dataset (AUC = 0.72 ± 0.11). For overall stage classification, the best results were obtained when training on Lung1 and testing on the L-RT dataset (AUC = 0.72 ± 0.04 for Random Forest and AUC = 0.84 ± 0.03 for a linear-kernel Support Vector Machine). Depending on the classification task and on the heterogeneity of the available dataset(s), different CV strategies have to be explored and compared to robustly assess the potential of a predictive model based on radiomics and ML.
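The two evaluation strategies named in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn and uses synthetic features as stand-ins for the radiomic tables; the split sizes (130 and 47) mirror the cohort sizes only for illustration, and the helper names (`make_models`, `intra_sample_auc`, `inter_sample_auc`), the fold count, and the hyperparameters are hypothetical choices, not the authors' pipeline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary-labelled features: NOT the study data, only placeholders.
X, y = make_classification(n_samples=177, n_features=20, n_informative=5,
                           random_state=0)
X_pub, y_pub = X[:130], y[:130]  # plays the role of the public cohort
X_loc, y_loc = X[130:], y[130:]  # plays the role of the local cohort


def make_models():
    """Return fresh, unfitted classifiers (hyperparameters are assumptions)."""
    return {
        "rf": RandomForestClassifier(n_estimators=100, random_state=0),
        "svm_linear": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    }


def intra_sample_auc(model, X, y, n_splits=5, seed=0):
    """Intra-sample strategy: stratified k-fold CV inside a single cohort."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    return float(scores.mean()), float(scores.std())


def inter_sample_auc(model, X_train, y_train, X_test, y_test):
    """Inter-sample strategy: fit on one cohort, score on the other."""
    model.fit(X_train, y_train)
    if hasattr(model, "decision_function"):
        scores = model.decision_function(X_test)
    else:
        scores = model.predict_proba(X_test)[:, 1]
    return float(roc_auc_score(y_test, scores))


results = {}
for name, model in make_models().items():
    mean_auc, std_auc = intra_sample_auc(model, X_pub, y_pub)
    external_auc = inter_sample_auc(model, X_pub, y_pub, X_loc, y_loc)
    results[name] = {"intra_mean": mean_auc, "intra_std": std_auc,
                     "inter": external_auc}
    print(f"{name}: intra-sample AUC = {mean_auc:.2f} ± {std_auc:.2f}, "
          f"inter-sample AUC = {external_auc:.2f}")
```

In this sketch the intra-sample spread is the standard deviation over folds, while the inter-sample function returns a single AUC; the abstract does not say how the reported ± values were computed, so no equivalence is implied.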

Ubaldi L., Valenti V., Borgese R.F., Collura G., Fantacci M.E., Ferrera G., et al. (2021). Strategies to develop radiomics and machine learning models for lung cancer stage and histology prediction using small data samples. PHYSICA MEDICA, 90, 13-22 [10.1016/j.ejmp.2021.08.015].

Strategies to develop radiomics and machine learning models for lung cancer stage and histology prediction using small data samples

Borgese R. F. (Member of the Collaboration Group); Ferrera G.; Abbate B. F. (Member of the Collaboration Group); Marrale M.
2021-09-01

Sep-2021
Files in this item:
File: Physca Medica 2021.pdf
Access: open access
Description: Article
Type: Publisher's version
Size: 3.81 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/522458
Citations
  • PMC 17
  • Scopus 37
  • ISI 36