Sottile, G. Penalized regression and clustering in high-dimensional data.

Penalized regression and clustering in high-dimensional data

Sottile, Gianluca

Abstract

The main goal of this thesis is to describe several statistical techniques for high-dimensional genomic data. The thesis begins with a review of the literature on penalized regression models, with particular attention to the least absolute shrinkage and selection operator (LASSO), or L1-penalty, methods. L1-penalized logistic and multinomial regression models are used for variable selection and discriminant analysis with a binary or categorical response variable. The thesis discusses and compares several methods commonly used in genetics, and introduces new strategies to select markers according to their informative content and to discriminate clusters, offering reduced marker panels for population genetic analysis. Having addressed this main objective, the thesis turns to tuning parameter selection in LASSO models and studies its consistency with high-dimensional data. The tuning parameter balances the trade-off between model fit and variance reduction in sparse models, and its value is crucial in all LASSO-type regression models. Finally, the thesis introduces a LASSO method for quantile regression coefficients modeling (QRCM), an approach that describes the coefficients of a quantile regression model as parametric functions of the order of the quantile. Compared with standard quantile regression, QRCM facilitates estimation, inference, and interpretation of the results, and generally yields a gain in efficiency. However, since each predictor has multiple associated coefficients, the total number of parameters grows quickly with the size of the model matrix, causing numerical problems and large standard errors. Using the L1 penalty in this framework keeps the set of parameters parsimonious and performs variable selection efficiently.
Keywords: Lasso regression; High-dimensional data; Genomic data; Tuning parameter selection; Quantile regression coefficients modeling; Curves clustering
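
As a rough illustration of the two penalized problems the abstract refers to, the sketch below writes them out in generic notation. All symbols ($\ell$, $\lambda$, $\theta$, $b(p)$, $\rho_p$, $q$, $k$) are assumptions chosen here for illustration; the exact formulation used in the thesis may differ.

```latex
% Generic L1-penalized (LASSO) estimator. \ell is an assumed loss,
% e.g. the negative log-likelihood of a logistic or multinomial model,
% and \lambda \ge 0 is the tuning parameter.
\hat{\beta}(\lambda) = \arg\min_{\beta} \; \ell(\beta) + \lambda \sum_{j=1}^{q} |\beta_j|

% QRCM: each quantile regression coefficient is a parametric function
% of the quantile order p, built from k known basis functions b(p).
\beta(p \mid \theta) = \theta \, b(p), \qquad
\theta \in \mathbb{R}^{q \times k}, \quad
b(p) = \bigl(b_1(p), \dots, b_k(p)\bigr)^{\top}

% One natural L1-penalized version: the integrated check loss
% plus a penalty on every entry of \theta.
\hat{\theta}(\lambda) = \arg\min_{\theta} \;
\int_0^1 \sum_{i=1}^{n} \rho_p\!\bigl(y_i - x_i^{\top} \theta \, b(p)\bigr)\, dp
\; + \; \lambda \sum_{j=1}^{q} \sum_{h=1}^{k} |\theta_{jh}|,
\qquad \rho_p(u) = u \,\bigl(p - \mathbf{1}\{u < 0\}\bigr)
```

Under this notation, a model with $q$ predictors and $k$ basis functions has $q \cdot k$ parameters, which is why the parameter count grows quickly with the model matrix; setting $\lambda = 0$ recovers the unpenalized problem, while larger values of $\lambda$ shrink more entries of $\theta$ to zero.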
Files in this item:
Tesi Sottile.pdf (open access)
Description: Doctoral thesis
Size: 2.19 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/265274