Missing data are a common problem in many scientific fields, above all in environmental research. Several methods have been proposed in the literature for handling missing data, and the choice of an appropriate method depends, among other things, on the missing-data pattern and on the missing-data mechanism. One approach to the problem is to impute the missing values to yield a complete data set. The goal of this paper is to propose a new single imputation method and to compare its performance with that of other single and multiple imputation methods known in the literature. Starting from a data set of PM10 concentrations measured at regular intervals by eight monitoring stations distributed over the metropolitan area of Palermo, Sicily, during 2003, simulated incomplete data sets were generated, and the performance of the imputation methods was compared using the correlation coefficient (ρ), the index of agreement (d), the root mean square deviation (RMSD) and the mean absolute deviation (MAD). All the performance indicators agree in ranking the proposed method as the best among those compared, regardless of the gap length and of the number of stations with missing data.
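The four performance indicators named in the abstract can be illustrated with a minimal sketch. It assumes the usual textbook definitions: Pearson's correlation for ρ, Willmott's index of agreement for d, and the plain root mean square and mean absolute differences for RMSD and MAD. The record does not reproduce the paper's own formulas, so these may differ in detail, and the function name `imputation_scores` and the sample values are hypothetical.

```python
import math


def imputation_scores(observed, imputed):
    """Score imputed values against the true values that were artificially removed.

    Assumed definitions (not taken from the paper itself):
      rho  : Pearson correlation coefficient
      d    : Willmott's index of agreement
      rmsd : root mean square deviation
      mad  : mean absolute deviation
    """
    if len(observed) != len(imputed) or len(observed) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")

    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(imputed) / n
    pairs = list(zip(observed, imputed))

    # Pearson correlation between observed and imputed values.
    cov = sum((o - mean_o) * (p - mean_p) for o, p in pairs)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in imputed)
    if var_o == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    rho = cov / math.sqrt(var_o * var_p)

    # Summed squared and absolute imputation errors.
    sq_err = sum((p - o) ** 2 for o, p in pairs)
    abs_err = sum(abs(p - o) for o, p in pairs)

    # Willmott's index of agreement: 1 means perfect agreement.
    potential = sum((abs(p - mean_o) + abs(o - mean_o)) ** 2 for o, p in pairs)
    d = 1.0 - sq_err / potential

    return {
        "rho": rho,
        "d": d,
        "rmsd": math.sqrt(sq_err / n),
        "mad": abs_err / n,
    }


# Hypothetical PM10 values: the true readings inside a simulated gap
# and the values an imputation method produced for the same time steps.
true_values = [10.0, 20.0, 30.0, 40.0]
filled_values = [12.0, 18.0, 33.0, 39.0]
print(imputation_scores(true_values, filled_values))
```

Higher ρ and d and lower RMSD and MAD indicate a better imputation, so a method that ranks first on all four indicators is preferred regardless of which one is emphasised.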

PLAIA A, BONDI' A L (2006). Single imputation method of missing values in environmental pollution data sets. ATMOSPHERIC ENVIRONMENT, 40 (38), 7316-7330 [10.1016/j.atmosenv.2006.06.040].

Single imputation method of missing values in environmental pollution data sets

PLAIA, Antonella; BONDI', Anna Lisa
2006-01-01

Use this identifier to cite or link to this document: https://hdl.handle.net/10447/29440