Robustness to Missing Data: Breakdown Point Analysis
Missing data is a pervasive issue in empirical economics. For example, consider a randomized controlled trial designed to study whether a scholarship to attend school increases later earnings. If subjects who received the scholarship but experienced poor outcomes are less likely to respond to the researcher’s survey inquiring about their earnings, standard approaches that ignore nonrespondents will overestimate the effect of the scholarship on earnings. If the effect is severely overestimated, researchers may find a positive average effect among respondents when the average effect for the whole population is zero.
In “Robustness to Missing Data: Breakdown Point Analysis,” Daniel Ober-Reynolds proposes a methodology for investigating the robustness of empirical results obtained from incomplete datasets. The approach estimates the “breakdown point,” defined as the smallest difference between the distribution of respondents and the distribution of nonrespondents at which a result would fail in the whole population. If the breakdown point is larger than any plausible difference between these distributions, the result holds in the whole population. Reporting estimates of the breakdown point alongside standard results offers a simple and concise way to communicate how robust those results are. Dr. Ober-Reynolds then demonstrates the methodology by estimating the breakdown points of results from three randomized controlled trials known to suffer from attrition. Some results have small breakdown points and appear rather fragile, while others have large breakdown points and appear quite robust.
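To make the idea concrete, the sketch below works through a deliberately simplified version of a breakdown calculation. It is not the estimator from the article: it assumes the result of interest is "the whole-population mean exceeds a threshold" and measures the difference between respondents and nonrespondents by the gap in their means rather than by a difference between full distributions. The function name `breakdown_gap` and the numbers in the example are hypothetical.

```python
def breakdown_gap(mean_respondents: float,
                  response_rate: float,
                  threshold: float = 0.0) -> float:
    """Smallest gap (respondent mean minus nonrespondent mean) at which the
    whole-population mean falls to `threshold`.

    Simplifying assumption: the respondent/nonrespondent difference is
    summarized by a gap in means, not by a distance between distributions.
    """
    if not 0.0 < response_rate < 1.0:
        raise ValueError("response_rate must be strictly between 0 and 1")
    # Whole-population mean:
    #   r * m_R + (1 - r) * m_N  =  m_R - (1 - r) * (m_R - m_N)
    # Setting this equal to `threshold` and solving for the gap (m_R - m_N):
    return (mean_respondents - threshold) / (1.0 - response_rate)


# Hypothetical numbers: a mean of 0.10 among respondents, 80% response rate.
gap = breakdown_gap(mean_respondents=0.10, response_rate=0.80)
print(f"Nonrespondent mean must be lower by about {gap:.2f} to overturn the result")
```

Under these assumptions, a researcher would compare the computed gap with how far nonrespondents could plausibly differ from respondents. A gap well beyond the plausible range suggests a robust result, while a small gap suggests a fragile one.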
Missing data is a broad problem that extends beyond survey nonresponse and the evaluation of randomized controlled trials. The breakdown point methodology applies to most models commonly used by economists and requires no additional data or modeling assumptions.
- Senior Economist