Nearest neighbours in least-squares data imputation algorithms with different missing patterns

Ito Wasito, Boris Mirkin

Research output: Contribution to journal, Article

25 Citations (Scopus)

Abstract

Methods for imputation of missing data in the so-called least-squares approximation approach, a non-parametric computationally efficient multidimensional technique, are experimentally compared. Contributions are made to each of the three components of the experiment setting: (a) algorithms to be compared, (b) data generation, and (c) patterns of missing data. Specifically, "global" methods for least-squares data imputation are reviewed and extensions to them are proposed based on the nearest neighbours (NN) approach. A conventional generator of mixtures of Gaussian distributions is theoretically analysed and, then, modified to scale clusters differently. Patterns of missing data are defined in terms of rows and columns according to three different mechanisms that are referred to as Random missings, Restricted random missings, and Merged database. It appears that NN-based versions almost always outperform their global counterparts. With the Random missings pattern, the winner is always the authors' two-stage method INI, which combines global and local imputation algorithms.
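To make the "global" least-squares idea and the Random missings pattern concrete, here is a minimal sketch. It is not the authors' algorithm: it assumes a common iterative low-rank SVD scheme as a stand-in for a global least-squares imputer, and it reads Random missings as cells removed uniformly at random over the whole matrix. The names `random_missing_mask` and `svd_impute` and the parameters `rank`, `n_iter`, and `tol` are hypothetical, chosen for illustration only.

```python
import numpy as np


def random_missing_mask(shape, proportion, rng):
    """Boolean mask (True = missing) with cells drawn uniformly at random.

    Assumption: this is one plausible reading of a "Random missings" pattern.
    """
    n_total = shape[0] * shape[1]
    n_missing = int(round(proportion * n_total))
    flat = np.zeros(n_total, dtype=bool)
    flat[rng.choice(n_total, size=n_missing, replace=False)] = True
    return flat.reshape(shape)


def svd_impute(X, mask, rank=1, n_iter=500, tol=1e-9):
    """Fill masked cells by repeatedly fitting a rank-`rank` SVD approximation.

    Assumption: a generic iterative scheme, not the method from the paper.
    Observed cells are never modified.
    """
    observed = ~mask
    filled = np.array(X, dtype=float)
    # Start from the mean of the observed cells in each column
    # (0.0 for a column with no observed cells).
    counts = np.maximum(observed.sum(axis=0), 1)
    col_means = np.where(observed, filled, 0.0).sum(axis=0) / counts
    filled[mask] = col_means[np.nonzero(mask)[1]]
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        change = np.max(np.abs(filled[mask] - approx[mask])) if mask.any() else 0.0
        filled[mask] = approx[mask]  # update only the missing cells
        if change < tol:
            break
    return filled


# Usage on synthetic rank-1 data (illustrative only).
rng = np.random.default_rng(0)
X_true = np.outer(np.linspace(1.0, 3.0, 30), np.linspace(1.0, 2.0, 6))
mask = random_missing_mask(X_true.shape, 0.1, rng)
X_obs = X_true.copy()
X_obs[mask] = np.nan
X_hat = svd_impute(X_obs, mask, rank=1)
print("max abs imputation error:", np.max(np.abs(X_hat - X_true)))
```

An NN-based variant would presumably fit a similar model using only the rows closest to the row being imputed; that step is not shown here.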

Original language: English
Pages (from-to): 926-949
Number of pages: 24
Journal: Computational Statistics and Data Analysis
Volume: 50
Issue number: 4
Publication status: Published - 24 Feb 2006

Keywords

  • Least squares
  • Merged database missing
  • Missing data
  • Nearest neighbours
  • Random missing
  • Restricted random missing
  • Singular value decomposition
