Investigating Comorbidities from Clinical Texts: A Propensity Score Approach

Alessandro Albano; Chiara Di Maria; Mariangela Sciandra; Antonella Plaia
2025-01-01

Abstract

Understanding comorbidity patterns is crucial for improving patient outcomes and optimising healthcare strategies. In this study, we propose an approach to detect comorbidities of two diseases from clinical discharge notes. To account for the complexities of textual data, we summarise the information through propensity scores, which represent the probability of receiving a certain diagnosis conditional on the extracted text. These scores are then used as covariates in a logistic regression model to explore the association between diseases. Specifically, we compare models trained on TF-IDF weighted document-term matrices and text embeddings, employing LASSO regression, XGBoost, and multilayer perceptrons (MLP). Our results, obtained by applying this method to study the association between diabetes and Chronic Kidney Disease, demonstrate the potential of Natural Language Processing (NLP) and machine learning techniques in advancing observational healthcare research.
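
The two-stage idea in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the notes and labels are invented toy data, the helper names (`tfidf_matrix`, `fit_logistic`, `predict`) are hypothetical, and the outcome model (chronic kidney disease regressed on the diabetes indicator plus the propensity score) is one plausible reading of "scores used as covariates". Only the Python standard library is used. The L1 penalty is applied through a simple proximal-gradient step as a stand-in for LASSO regression, and the scores are fitted and predicted on the same notes purely for brevity.

```python
import math
from collections import Counter

# Toy notes and labels, invented for illustration (not real clinical data).
notes = [
    "insulin started for elevated glucose",
    "glucose high insulin dose adjusted",
    "metformin continued glucose monitored",
    "fracture of left wrist cast applied",
    "cough and fever chest xray clear",
    "knee pain physiotherapy advised",
    "insulin glucose creatinine elevated",
    "wrist pain cast removed",
]
diabetes = [1, 1, 1, 0, 0, 0, 1, 0]  # hypothetical exposure diagnosis
ckd = [1, 0, 1, 0, 0, 1, 1, 0]       # hypothetical outcome diagnosis


def tfidf_matrix(docs):
    """Return L2-normalised TF-IDF rows built from whitespace tokens."""
    tokens = [d.split() for d in docs]
    vocab = sorted({w for t in tokens for w in t})
    n = len(docs)
    df = Counter(w for t in tokens for w in set(t))
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in vocab}  # smoothed IDF
    rows = []
    for t in tokens:
        tf = Counter(t)
        row = [tf[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return rows


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    """Predicted probability for one feature row."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_logistic(X, y, l1=0.0, lr=0.5, epochs=2000):
    """Full-batch proximal gradient descent for (optionally L1-penalised) logistic regression."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        errs = [predict(w, b, x) - yi for x, yi in zip(X, y)]
        b -= lr * sum(errs) / n
        for j in range(p):
            grad = sum(e * x[j] for e, x in zip(errs, X)) / n
            wj = w[j] - lr * grad
            # Soft-thresholding provides the L1 (LASSO-type) shrinkage.
            w[j] = math.copysign(max(abs(wj) - lr * l1, 0.0), wj)
    return w, b


# Stage 1: propensity score = P(diabetes diagnosis | note text).
X_text = tfidf_matrix(notes)
w_ps, b_ps = fit_logistic(X_text, diabetes, l1=0.01)
propensity = [predict(w_ps, b_ps, x) for x in X_text]

# Stage 2: logistic regression of CKD on diabetes, adjusting for the score.
X_out = [[float(d), ps] for d, ps in zip(diabetes, propensity)]
w_out, b_out = fit_logistic(X_out, ckd)
odds_ratio = math.exp(w_out[0])

print("propensity scores:", [round(p, 2) for p in propensity])
print("adjusted odds ratio (diabetes vs CKD):", round(odds_ratio, 2))
```

With eight invented notes the resulting odds ratio carries no clinical meaning; the sketch only shows how the text is compressed into a single score before the association model is fitted.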
2025
Scientific sector STAT-01/A - Statistics
9783031967351
9783031967368
Albano, A., Di Maria, C., Sciandra, M., Plaia, A. (2025). Investigating Comorbidities from Clinical Texts: A Propensity Score Approach. In V.G. Enrico di Bella (eds.), Statistics for Innovation I, SIS 2025, Short Papers, Plenary, Specialized, and Solicited Sessions [10.1007/978-3-031-96736-8].
Files in this item:
Albano_et_al_SIS2025.pdf (open access)
Description: Paper
Type: Publisher's version
Size: 8.9 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/683689