Knowledge4Policy
Knowledge for policy

Competence Centre on Composite Indicators and Scoreboards

Our expertise on statistical methodologies and in developing sound composite indicators provides policy-makers with the ‘big picture’ for informed policy decisions and progress monitoring.

Last updated: 01 Dec 2020

Step 3: Imputation of missing data

After a set of indicators has been assembled, missing data can be imputed, outliers treated, and transformations applied to indicators where necessary and appropriate.


Like most statistical series, composite indicators are plagued by missing values. In many cases, data are available only for a limited number of countries or only for certain data components. Missing values can make the composite indicator less reliable for the countries with limited information and can distort the relative standing of all countries in the composite.
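To illustrate how a single gap can shift relative standings, here is a minimal sketch with hypothetical scores. It assumes the composite is an unweighted average of whichever indicators a country reports; the country codes, values and function names are illustrative only.

```python
# Hypothetical scores (0-100) for three countries on three indicators.
# None marks a missing value.
scores = {
    "A": [70, 60, 50],
    "B": [80, 75, None],  # B's third indicator is missing
    "C": [72, 68, 55],
}

def available_mean(values):
    """Average only the indicators that are present for a country."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present)

def ranking(composite):
    """Country codes ordered from highest to lowest composite score."""
    return sorted(composite, key=composite.get, reverse=True)

composite = {c: available_mean(v) for c, v in scores.items()}
print(composite)           # {'A': 60.0, 'B': 77.5, 'C': 65.0}
print(ranking(composite))  # ['B', 'C', 'A']

# Hypothetical "true" value for B's gap: B moves from first to last.
scores_full = dict(scores, B=[80, 75, 20])
composite_full = {c: available_mean(v) for c, v in scores_full.items()}
print(ranking(composite_full))  # ['C', 'A', 'B']
```

The point is not the particular numbers but that B's position, and therefore everyone else's, depends on a value nobody observed.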

There are a number of approaches for dealing with missing values, all of which have flaws:

  • data deletion - omitting entire variables or countries when a substantial share of their data is missing;
  • mean substitution - substituting a variable's mean value computed from available cases to fill in missing values;
  • regression - using regressions based on other variables to estimate the missing values;
  • multiple imputation - generating several plausible values for each missing entry (e.g. from sequential regressions with random draws), analysing each completed dataset and combining (averaging) the results;
  • nearest neighbour - identifying the most similar case and using its value in place of the missing one; or
  • ignoring them - averaging only the remaining available indicators, which implicitly assigns the missing indicator the average of the others.

The idea of imputation is both seductive and dangerous. It is seductive because it can lull the user into the pleasurable state of believing that the data are complete after all, and it is dangerous because it lumps together situations where the problem is sufficiently minor that it can legitimately be handled in this way and situations where standard estimators applied to real and imputed data have substantial bias.

- Dempster, A.P. and Rubin, D.B. (1983). Introduction, pp. 3-10, in W.G. Madow, I. Olkin and D.B. Rubin (eds.), Incomplete Data in Sample Surveys, Vol. 2: Theory and Bibliography. New York: Academic Press.
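One concrete form of the bias the quotation warns about: mean substitution leaves the mean unchanged but shrinks the variance, because every filled-in value sits exactly at the mean. Treating the completed data as if they were real therefore understates dispersion (and, by the same mechanism, weakens correlations). A minimal sketch with hypothetical values:

```python
from statistics import mean, variance

available = [2.0, 4.0, 6.0, 8.0]      # hypothetical observed values
with_gaps = available + [None, None]  # two further countries are missing

# Mean substitution: every gap receives the mean of the available cases.
fill = mean(available)
filled = [fill if v is None else v for v in with_gaps]

print(mean(filled))         # 5.0, same as the available-case mean
print(variance(available))  # about 6.67
print(variance(filled))     # 4.0, smaller although no new information was added
```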


Next

Multivariate analysis can be used to study the overall structure of the dataset, assess its suitability, and guide subsequent methodological choices (e.g., weighting and aggregation).