OVB

Suppose we have a true model,

\[ Y = \delta D + X \beta + \gamma Z + \epsilon \]

However, we only observe \(X\), not \(Z\). We estimate the model

\[ Y = \delta D + X \beta + u \]

In this case, we have omitted variable bias (OVB).

Cinelli and Hazlett (2020) shows that

\[ |\hat bias| = \sqrt{\frac{R^2_{Y \sim Z|D,X} R^2_{D \sim Z|X}}{1-R^2_{D \sim Z|X}}} (\frac{\hat \sigma_{y.dx}}{\hat \sigma_{d.x}})\]

We can see that when we have high \(R^2\) between \(Y\) and \(Z\), bias is high. When we hvae high \(R^2\) between \(D\) and \(Z\), bias is high. In other words, the omitted variable needs to be somewhat highly correlated with both \(Y\) and \(D\) to have a high bias.

Cinalli and Hazlett paper has many stats, but we focus on three things we can report:

Partial \(R^2\) of treatment on outcome

The original purpose of this stat is to see how much variation treatment explains outcome after partialling out the covariates.

Also, this can be used for an extreme situation. Suppose the unobserved confounder \(Z\) explains all residual variance of the outcome, that is, \(R^2_{Y \sim Z|D,X} = 1\). What kind of situation we would have 0 effect?

It is shown if \(R^2_{Y \sim Z|D,X} = 1\), then \(R^2_{D \sim Z|X} \ge R^2_{Y \sim D|X}\) to make the effect go to 0. Roughly speaking, the correlation between omitted variable and outcome needs to be higher than the correlation between treatment and outcome to make the treatment effect go to 0.

Robust value

The robustness value \(RV_{q,α}\) quantifies the minimal strength of association that the confounder needs to have, both with the treatment and with the outcome, so that a confidence interval of level \(1 − \alpha\) includes a change of \(q \%\) of the current estimated value. \(q\) is often 1. That is, for the effect to go to 0, that robust value is the mininum value of the correlation between the omitted variable and the treatment and the correlation between the omitted variable and the outcome. Note this is about both the correlation between \(Z\) and \(D\) and the correlation between \(Z\) and \(Y\). They need to be both high to make the effect go to 0.

Bounds using observed covariate(s)

For the first two values, they are absolute values. You don’t know whether those are large or small, without context. They cannot be compared across studies.

Suppose there is a covariate \(x\) that you know from your domain knowledge that it is an important factor in explaining treatment or/and outcome. Then you can compare how bad the confounder needs to be compare to this covariate. This will give us some sense how much a confounder needs to be comparing to a strong covariate, in driving the effect to 0.

\[ k_D := \frac{R^2_{D \sim Z|X_{-j}}}{R^2_{D \sim X_j|X_{-j}}} \]

where \(X_{-j}\) represents all covariates other than the \(j\)th we pick as the benchmark covariate.

\[ k_Y := \frac{R^2_{Y \sim Z|D,X_{-j}}}{R^2_{Y \sim X_j|D,X_{-j}}} \]

Example

# loads package
library(sensemakr)

## See details in:

## Carlos Cinelli and Chad Hazlett (2020). Making Sense of Sensitivity: Extending Omitted Variable Bias. Journal of the Royal Statistical Society, Series B (Statistical Methodology).

# loads dataset
data("darfur")

# runs regression model
model <- lm(peacefactor ~ directlyharmed + age + farmer_dar + herder_dar +
                         pastvoted + hhsize_darfur + female + village, data = darfur)

# runs sensemakr for sensitivity analysis
sensitivity <- sensemakr(model = model, 
                         treatment = "directlyharmed",
                         benchmark_covariates = "female",
                         kd = 1:3)

# short description of results
sensitivity

## Sensitivity Analysis to Unobserved Confounding
## 
## Model Formula: peacefactor ~ directlyharmed + age + farmer_dar + herder_dar + 
##     pastvoted + hhsize_darfur + female + village
## 
## Null hypothesis: q = 1 and reduce = TRUE 
## 
## Observed Estimates of ' directlyharmed ':
##   Coef. estimate: 0.09732 
##   Standard Error: 0.02326 
##   t-value: 4.18445 
## 
## Sensitivity Statistics:
##   Partial R2 of treatment with outcome: 0.02187 
##   Robustness Value, q = 1 : 0.13878 
##   Robustness Value, q = 1 alpha = 0.05 : 0.07626 
## 
## For more information, check summary.

# long description of results
summary(sensitivity)

## Sensitivity Analysis to Unobserved Confounding
## 
## Model Formula: peacefactor ~ directlyharmed + age + farmer_dar + herder_dar + 
##     pastvoted + hhsize_darfur + female + village
## 
## Null hypothesis: q = 1 and reduce = TRUE 
## -- This means we are considering biases that reduce the absolute value of the current estimate.
## -- The null hypothesis deemed problematic is H0:tau = 0 
## 
## Observed Estimates of 'directlyharmed': 
##   Coef. estimate: 0.0973 
##   Standard Error: 0.0233 
##   t-value (H0:tau = 0): 4.1844 
## 
## Sensitivity Statistics:
##   Partial R2 of treatment with outcome: 0.0219 
##   Robustness Value, q = 1: 0.1388 
##   Robustness Value, q = 1, alpha = 0.05: 0.0763 
## 
## Verbal interpretation of sensitivity statistics:
## 
## -- Partial R2 of the treatment with the outcome: an extreme confounder (orthogonal to the covariates) that explains 100% of the residual variance of the outcome, would need to explain at least 2.19% of the residual variance of the treatment to fully account for the observed estimated effect.
## 
## -- Robustness Value, q = 1: unobserved confounders (orthogonal to the covariates) that explain more than 13.88% of the residual variance of both the treatment and the outcome are strong enough to bring the point estimate to 0 (a bias of 100% of the original estimate). Conversely, unobserved confounders that do not explain more than 13.88% of the residual variance of both the treatment and the outcome are not strong enough to bring the point estimate to 0.
## 
## -- Robustness Value, q = 1, alpha = 0.05: unobserved confounders (orthogonal to the covariates) that explain more than 7.63% of the residual variance of both the treatment and the outcome are strong enough to bring the estimate to a range where it is no longer 'statistically different' from 0 (a bias of 100% of the original estimate), at the significance level of alpha = 0.05. Conversely, unobserved confounders that do not explain more than 7.63% of the residual variance of both the treatment and the outcome are not strong enough to bring the estimate to a range where it is no longer 'statistically different' from 0, at the significance level of alpha = 0.05.
## 
## Bounds on omitted variable bias:
## 
## --The table below shows the maximum strength of unobserved confounders with association with the treatment and the outcome bounded by a multiple of the observed explanatory power of the chosen benchmark covariate(s).
## 
##  Bound Label R2dz.x R2yz.dx      Treatment Adjusted Estimate Adjusted Se
##    1x female 0.0092  0.1246 directlyharmed            0.0752      0.0219
##    2x female 0.0183  0.2493 directlyharmed            0.0529      0.0204
##    3x female 0.0275  0.3741 directlyharmed            0.0304      0.0187
##  Adjusted T Adjusted Lower CI Adjusted Upper CI
##      3.4389            0.0323            0.1182
##      2.6002            0.0130            0.0929
##      1.6281           -0.0063            0.0670

# minimal reporting
ovb_minimal_reporting(sensitivity, format = "html")

	Outcome: peacefactor
Treatment	Est.	S.E.	t-value	\(R^2_{Y \sim D \|{\bf X}}\)	\(RV_{q = 1}\)	\(RV_{q = 1, \alpha = 0.05}\)
directlyharmed	0.097	0.023	4.184	2.2%	13.9%	7.6%
Note: df = 783; Bound ( 1x female ): \(R^2_{Y\sim Z\| {\bf X}, D}\) = 12.5%, \(R^2_{D\sim Z\| {\bf X} }\) = 0.9%

Let’s see how to interpret this.

\(RV_{q=1} = 13.9 \%\). This means that unobserved confounders that explain 13.9% of the residual variance both of the treatment and of the outcome are sufficiently strong to explain away all the observed effect. On the other hand, unobserved confounders that do not explain at least 13.9% of the residual variance both of the treatment and of the outcome are not sufficiently strong to do so.

The robustness value for testing the null hypothesis that the coefficient of “directlyharmed” is zero (\(RV_{q=1,\alpha=0.05}\)) is 7.6%. This means that unobserved confounders that explain 7.6% of the residual variance both of the treatment and of the outcome are sufficiently strong to bring the lower bound of the confidence interval to zero (at the chosen significance level of 5%).

The partial \(R^2\) of directlyharmed with peacefactor means that, in an extreme scenario, in which we assume that unobserved confounders explain all of the left out variance of the outcome, these unobserved confounders would need to explain at least 2.2% of the residual variance of the treatment to fully explain away the observed effect.

The above numbers are for the absolute strength the confounder needs to be to explain away the treatment effect. The authors believe in this context, “female” is an important factor. Suppose the confounder is as important as “female”, what the partial \(R^2\) would be? It is shown that \(R^2_{Y \sim Z|X,D} = 12.5 \%\) and \(R^2_{D \sim Z|X} = 0.9 \%\). These two are both below the \(RV\) values. That means even if the confounder is as strong as “female”, it is not strong enough to explain away the treatment effect.

# plot bias contour of point estimate
plot(sensitivity)

This plot shows that even 3 times as strong as “female”, the confounder is not strong enough to explain away the treatment effect.

sensitivity analysis for IV

Even with an IV model, we need sensititvity analysis. The worry would be there could be an unobserved confounder that affects both the instrument and the outcome.

Cinelli and Hazlett have a package “iv.sensemakr” for this. The usage is similar to “sensemakr”.

# loads package
library(iv.sensemakr)

# loads dataset
data("card")

# prepares data
y <- card$lwage  # outcome
d <- card$educ   # treatment
z <- card$nearc4 # instrument
x <- model.matrix( ~ exper + expersq + black + south + smsa + reg661 + reg662 +
                     reg663 + reg664 + reg665+ reg666 + reg667 + reg668 + smsa66,
                   data = card) # covariates
# fits IV model
card.fit <- iv_fit(y,d,z,x)

# see results
card.fit

## 
## Instrumental Variable Estimation
## (Anderson-Rubin Approach)
## =============================================
## IV Estimates:
##   Coef. Estimate: 0.132
##   t-value: 2.33
##   p-value: 0.02
##   Conf. Interval: [0.025, 0.285]
## Note: H0 = 0, alpha = 0.05, df = 2994.
## =============================================
## See summary for first stage and reduced form.

# runs sensitivity analysis
card.sens <- sensemakr(card.fit, benchmark_covariates = c("black", "smsa"))

# see results
card.sens

## 
## Sensitivity Analysis for Instrumental Variables
## (Anderson-Rubin Approach)
## =============================================================
## IV Estimates:
##   Coef. Estimate: 0.132
##   t-value: 2.33
##   p-value: 0.02
##   Conf. Interval: [0.025, 0.285]
## 
## Sensitivity Statistics:
##   Extreme Robustness Value: 0.000523
##   Robustness Value: 0.00667
## 
## Bounds on Omitted Variable Bias:
##  Bound Label  R2zw.x R2y0w.zx Lower CI Upper CI Crit. Thr.
##     1x black 0.00221   0.0750  -0.0212    0.402       2.59
##      1x smsa 0.00639   0.0202  -0.0192    0.396       2.57
## 
## Note: H0 = 0, q >= 1, alpha = 0.05, df = 2994.
## =============================================================
## See summary for first stage and reduced form.

summary(card.sens)

## 
## Sensitivity Analysis for Instrumental Variables
## (Anderson-Rubin Approach)
## =============================================================
## IV Estimates:
##   Coef. Estimate: 0.132
##   t-value: 2.33
##   p-value: 0.02
##   Conf. Interval: [0.025, 0.285]
## 
## Sensitivity Statistics:
##   Extreme Robustness Value: 0.000523
##   Robustness Value: 0.00667
## 
## Bounds on Omitted Variable Bias:
##  Bound Label  R2zw.x r2y0w.zx Lower CI Upper CI Crit. Thr.
##     1x black 0.00221   0.0750  -0.0212    0.402       2.59
##      1x smsa 0.00639   0.0202  -0.0192    0.396       2.57
## 
## Note: H0 = 0, q >= 1, alpha = 0.05, df = 2994.
## -------------------------------------------------------------
## FS Estimates:
##   Coef. Estimate: 0.32
##   Standard Error: 0.0879
##   t-value: 3.64
##   p-value: 0.000276
##   Conf. Interval: [0.148, 0.492]
## 
## Sensitivity Statistics:
##   Extreme Robustness Value: 0.00313
##   Robustness Value: 0.0302
## 
## Bounds on Omitted Variable Bias:
##  Bound Label  R2zw.x R2dw.zx Lower CI Upper CI Crit. Thr.
##     1x black 0.00221 0.03342    0.109    0.531       2.40
##      1x smsa 0.00639 0.00498    0.120    0.520       2.27
## 
## Note: H0 = 0, q = 1, alpha = 0.05, df = 2994.
## -------------------------------------------------------------
## RF Estimates:
##   Coef. Estimate: 0.0421
##   Standard Error: 0.0181
##   t-value: 2.33
##   p-value: 0.02
##   Conf. Interval: [0.007, 0.078]
## 
## Sensitivity Statistics:
##   Extreme Robustness Value: 0.000523
##   Robustness Value: 0.00667
## 
## Bounds on Omitted Variable Bias:
##  Bound Label  R2zw.x R2dw.zx Lower CI Upper CI Crit. Thr.
##     1x black 0.00221  0.0657 -0.00418   0.0883       2.56
##      1x smsa 0.00639  0.0197 -0.00429   0.0884       2.56
## 
## Note: H0 = 0, q = 1, alpha = 0.05, df = 2994.
## =============================================================

The basic idea is to run an IV regression, then apply sensmakr to that model. In this case, the authors use “black” and “smsa” as benchmark covariates. The authors believe these are important factors. A quick look at the result we can see that a confounder as strong as “black” or “smsa” is not strong enough to explain away the treatment effect. We can also plot a contour plot to see how strong the confounder needs to be to explain away the treatment effect.

plot(card.sens, parm = "rf", sensitivity.of = "t-value", lim = 0.08, kz = 1, ky = 1)

Unfortunately the package currently only allows for one instrument. Should not be too hard to modify it to be more general.

Sensitivity analysis of OLS with sensemakr

OVB

Partial \(R^2\) of treatment on outcome

Robust value

Bounds using observed covariate(s)

Example

sensitivity analysis for IV