Iovino, M., Lazic, I., Barà, C., Kugiumtzis, D., Faes, L., Pernice, R. (2026). A Principled and Data-efficient Information-theoretic Method for Feature Selection. IEEE Journal of Biomedical and Health Informatics, 1-13. doi:10.1109/jbhi.2026.3679377
A Principled and Data-efficient Information-theoretic Method for Feature Selection
Iovino, Marta; Lazic, Ivan; Faes, Luca; Pernice, Riccardo
2026-03-31
Abstract
This study introduces kCMI-FS, a feature selection (FS) method that leverages Conditional Mutual Information (CMI), estimated via an adapted k-nearest neighbour (kNN) strategy, to handle mixed-type data with continuous features and discrete targets. Traditional approaches based on Mutual Information (MI) may overlook redundancy or higher-order dependencies. Unlike these approaches, kCMI-FS incorporates a significance-based forward selection process to identify informative, non-redundant features. We assess its performance on theoretical simulations, five synthetic datasets, and four biomedical benchmark datasets that highlight key FS challenges. Results demonstrate that kCMI-FS consistently recovers relevant features in structured scenarios and matches or outperforms existing methods, particularly in mixed-variable and high-dimensional conditions. In some cases, however, this comes at the cost of selecting a few additional redundant or irrelevant features. Furthermore, classification experiments on the biomedical datasets confirm that kCMI-FS achieves strong predictive performance with reduced feature sets, thus enhancing model interpretability without compromising accuracy relative to existing methods. These results highlight the potential relevance of kCMI-FS for biomedical data analysis, particularly in classification problems where interpretability, feature compactness, and robustness are essential to support early diagnosis and clinical decision-making.
File: 191-IovinoLazic_IEEEJBHI-2026_preprint.pdf
Access: archive managers only
Type: Post-print
Size: 894.53 kB
Format: Adobe PDF
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.


