Institutional research archive of the Università degli Studi di Palermo

Iovino, M., Lazic, I., Barà, C., Kugiumtzis, D., Faes, L., Pernice, R. (2026). A Principled and Data-efficient Information-theoretic Method for Feature Selection. IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 1-13 [10.1109/jbhi.2026.3679377].

A Principled and Data-efficient Information-theoretic Method for Feature Selection

Iovino, Marta; Lazic, Ivan; Barà, Chiara; Kugiumtzis, Dimitris; Faes, Luca; Pernice, Riccardo

2026-03-31

Abstract

This study introduces kCMI-FS, a feature selection (FS) method that uses Conditional Mutual Information (CMI), estimated via an adapted k-nearest neighbour (kNN) strategy, to handle mixed-type data with continuous features and discrete targets. Unlike traditional approaches based on Mutual Information, which may overlook redundancy or higher-order dependencies, kCMI-FS incorporates a significance-based forward selection process to identify informative and non-redundant features. We assess its performance on theoretical simulations, five synthetic datasets, and four biomedical benchmark datasets that highlight key FS challenges. Results demonstrate that kCMI-FS consistently recovers relevant features in structured scenarios and matches or outperforms existing methods, particularly in mixed-variable and high-dimensional conditions, although in some cases at the cost of selecting a few more redundant or irrelevant features. Furthermore, classification experiments on the biomedical datasets confirm that kCMI-FS offers strong predictive performance with reduced feature sets, enhancing model interpretability without compromising accuracy relative to existing methods. The results highlight the potential relevance of kCMI-FS in biomedical data analysis, particularly in classification problems where interpretability, feature compactness, and robustness are essential for supporting early diagnosis and clinical decision-making.
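
For orientation, the quantities named in the abstract can be written out as follows. The definition of conditional mutual information is the standard one. The greedy selection step and stopping rule are only a generic sketch of significance-based forward selection: the symbols $F$ (candidate features), $S_t$ (features selected after $t$ steps), $\hat{I}$ (a kNN-based estimate) and $\alpha$ (significance level) are introduced here for illustration, and the abstract does not state the paper's exact criterion or test.

```latex
% Standard definition of conditional mutual information between a feature X
% and the target Y given a conditioning set Z, in terms of (joint) entropies.
I(X;Y \mid Z) = H(X \mid Z) - H(X \mid Y, Z)
              = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z)

% Generic forward-selection step (illustrative assumption, not the paper's stated rule):
% pick the candidate that adds the most information about Y beyond S_t.
X^{*} = \arg\max_{X_j \in F \setminus S_t} \hat{I}(X_j ; Y \mid S_t)

% Accept the candidate only if its estimated CMI is statistically significant
% at level \alpha; otherwise stop and return S_t.
S_{t+1} =
\begin{cases}
S_t \cup \{X^{*}\} & \text{if } \hat{I}(X^{*}; Y \mid S_t) \text{ is significant at level } \alpha, \\
S_t & \text{otherwise (stop).}
\end{cases}
```

With $S_0 = \emptyset$ the first step reduces to ranking by plain mutual information $I(X_j;Y)$. Conditioning on $S_t$ in later steps is what discounts candidates that are redundant with features already selected.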

Date	31 March 2026
Journal title	IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS
DOI of the contribution	https://dx.doi.org/10.1109/jbhi.2026.3679377
Appears in types:	1.01 Journal article

Files in this record:

File	Access	Type	Size	Format
191-IovinoLazic_IEEEJBHI-2026_preprint.pdf	Archive managers only	Post-print	894.53 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/704243
