Missing covariates in causal inference matching: Statistical imputation using machine learning and evolutionary search algorithms
Causal interpretation of relationships is complicated by the ‘fundamental problem of causal inference’, a condition in which exogenous confounds are concomitantly uncontrolled for within a single stochastic equation. Rosenbaum and Rubin (1983, 1985) introduced a principled approach to establishing exchangeability across treatment strata, evaluated with the Mahalanobis distance (Mahalanobis, 1936). However, these approaches to causal inference are biased in the presence of missing covariates (Rubin, 1971, 1974, 1976, 1987), necessitating multiple imputation to produce unbiased, but not necessarily accurate, inference. I review the existing literature on missing data and then conduct sensitivity analyses on the effects of single imputation within a Bayesian machine learning framework with exogenous post-stratification. The results indicate improvements upon traditional techniques when using a two-stage methodology: data imputation using random forests to recover unspecified missingness functions, followed by optimal covariate matching on both the propensity score and the imputed covariates. This approach avoids three common problems: biased recovery of missing parameters or treatment selection mechanisms, the discarding of incomplete observations (complete-case analysis), and unprincipled modelling of treatment assignment under empirical model constraints.
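The two-stage methodology described above can be illustrated with a minimal sketch. It is not the dissertation's implementation, and several choices are assumptions made here for illustration: scikit-learn's `IterativeImputer` wrapped around a `RandomForestRegressor` stands in for the random-forest imputer, a logistic regression stands in for the propensity model, and SciPy's `linear_sum_assignment` (an optimal assignment solver) is substituted for the evolutionary search named in the title. The function name `impute_then_match` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LogisticRegression


def impute_then_match(X, treated, seed=0):
    """Hypothetical helper: impute covariates, then match treated to controls."""
    # Stage 1: fill missing covariates with a chained random-forest imputer
    # (an assumed stand-in for the imputation procedure in the source).
    imputer = IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=50, random_state=seed),
        max_iter=5,
        random_state=seed,
    )
    X_imp = imputer.fit_transform(X)

    # Stage 2a: estimate propensity scores from the imputed covariates
    # (logistic regression is an assumption, not the source's stated model).
    model = LogisticRegression(max_iter=1000).fit(X_imp, treated)
    ps = model.predict_proba(X_imp)[:, 1]

    # Stage 2b: Mahalanobis distance on the propensity score AND the
    # imputed covariates, between every treated and every control unit.
    Z = np.column_stack([ps, X_imp])
    VI = np.linalg.pinv(np.cov(Z, rowvar=False))
    t_idx = np.flatnonzero(treated == 1)
    c_idx = np.flatnonzero(treated == 0)
    D = cdist(Z[t_idx], Z[c_idx], metric="mahalanobis", VI=VI)

    # Optimal 1:1 assignment without replacement, minimising total distance.
    # This replaces the evolutionary search with a simpler exact solver.
    rows, cols = linear_sum_assignment(D)
    pairs = list(zip(t_idx[rows], c_idx[cols]))
    return X_imp, ps, pairs


# Synthetic demonstration data (entirely made up for this sketch).
rng = np.random.default_rng(0)
n = 200
X_full = rng.normal(size=(n, 3))
treated = (rng.random(n) < 1 / (1 + np.exp(-X_full[:, 0]))).astype(int)
X = X_full.copy()
X[rng.random(X.shape) < 0.15] = np.nan  # values removed completely at random

X_imp, ps, pairs = impute_then_match(X, treated)
```

Each element of `pairs` is a `(treated_index, control_index)` tuple. The demonstration removes values completely at random, which is the easiest case; the abstract's concern is with unspecified missingness functions, so this sketch shows only the mechanics of the pipeline, not its performance under harder missingness.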
Hurley, Landon, "Missing covariates in causal inference matching: Statistical imputation using machine learning and evolutionary search algorithms" (2017). ETD Collection for Fordham University. AAI10250674.