Alessandro Albano, C.D.M. (2025). Investigating Comorbidities from Clinical Texts: A Propensity Score Approach. In V.G. Enrico di Bella (Eds.), Statistics for Innovation I, SIS 2025, Short Papers, Plenary, Specialized, and Solicited Sessions [10.1007/978-3-031-96736-8].
Investigating Comorbidities from Clinical Texts: A Propensity Score Approach
Alessandro Albano; Chiara Di Maria; Mariangela Sciandra; Antonella Plaia
2025-01-01
Abstract
Understanding comorbidity patterns is crucial for improving patient outcomes and optimising healthcare strategies. In this study, we propose an approach to detect comorbidities of two diseases from clinical discharge notes. To account for the complexities of textual data, we summarise the information through propensity scores, which represent the probability of receiving a certain diagnosis conditional on the extracted text. These scores are then used as covariates in a logistic regression model to explore the association between diseases. Specifically, we compare models trained on TF-IDF weighted document-term matrices and text embeddings, employing LASSO regression, XGBoost, and multilayer perceptrons (MLP). Our results, obtained by applying this method to study the association between diabetes and chronic kidney disease, demonstrate the potential of Natural Language Processing (NLP) and machine learning techniques in advancing observational healthcare research.
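The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, uses a handful of invented toy notes and labels, and covers only the TF-IDF plus LASSO variant. The function names `estimate_propensity` and `association_model`, the choice of diabetes as the diagnosis being scored, the use of out-of-fold predictions, and the very large `C` used to approximate an unpenalised fit are all assumptions made for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline


def estimate_propensity(notes, diagnosis, C=10.0, cv=3):
    """Estimate P(diagnosis = 1 | note text) with L1-penalised logistic regression on TF-IDF features."""
    text_model = make_pipeline(
        TfidfVectorizer(),
        LogisticRegression(penalty="l1", solver="liblinear", C=C, random_state=0),
    )
    # Out-of-fold probabilities: each note is scored by a model that never saw it (an assumption of this sketch).
    proba = cross_val_predict(text_model, notes, diagnosis, cv=cv, method="predict_proba")
    return proba[:, 1]


def association_model(diagnosis, propensity, outcome):
    """Logistic regression of the second disease on the first, with the propensity score as a covariate."""
    Z = np.column_stack([diagnosis, propensity])
    # A very large C makes the default L2 penalty negligible (approximately unpenalised).
    return LogisticRegression(C=1e6, max_iter=1000).fit(Z, outcome)


# Invented toy data, for illustration only.
notes = [
    "patient on insulin with elevated glucose",
    "metformin continued for high glucose",
    "insulin dose adjusted glucose monitored",
    "elevated glucose treated with metformin",
    "insulin and metformin for glucose control",
    "glucose remains high insulin started",
    "admitted with ankle fracture after fall",
    "cough and fever treated with antibiotics",
    "fracture repaired no complications",
    "fever resolved after antibiotics",
    "fall at home with wrist fracture",
    "persistent cough antibiotics prescribed",
]
diabetes = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
ckd = [1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0]

ps = estimate_propensity(notes, diabetes)
model = association_model(diabetes, ps, ckd)
# Odds ratio for diabetes, adjusted for the text-based propensity score.
adjusted_or = float(np.exp(model.coef_[0, 0]))
print(f"adjusted odds ratio: {adjusted_or:.2f}")
```

With so few toy notes the printed odds ratio carries no meaning; the sketch only shows how the text model feeds a single score into the second-stage regression. Swapping the first stage for XGBoost or an MLP, or the TF-IDF features for text embeddings, would change only `estimate_propensity`.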