Stata 19's new CATE command
Fernando Rios-Avila has a new blog post about the new cate command in Stata 19, which implements conditional average treatment effect (CATE) estimation.
https://friosavila.github.io/app_metrics/app_metrics12.html
clear all
webuse assets3
global catecovars age educ i.(incomecat pension married twoearn ira ownhome)
cate po (assets $catecovars) (e401k), group(incomecat) nolog
(Excerpt from Chernozhukov and Hansen (2004))
Conditional average treatment effects Number of observations = 9,913
Estimator: Partialing out Number of folds in cross-fit = 10
Outcome model: Linear lasso Number of outcome controls = 17
Treatment model: Logit lasso Number of treatment controls = 17
CATE model: Random forest Number of CATE variables = 17
------------------------------------------------------------------------------
| Robust
assets | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
GATE |
incomecat |
0 | 3999.888 989.8742 4.04 0.000 2059.77 5940.006
1 | 1424.931 1668.081 0.85 0.393 -1844.447 4694.31
2 | 5092.801 1344.769 3.79 0.000 2457.102 7728.499
3 | 8700.512 2272.126 3.83 0.000 4247.226 13153.8
4 | 20350.97 4717.093 4.31 0.000 11105.64 29596.3
-------------+----------------------------------------------------------------
ATE |
e401k |
(Eligible |
vs |
Not elig..) | 7914.24 1150.79 6.88 0.000 5658.734 10169.75
-------------+----------------------------------------------------------------
POmean |
e401k |
Not eligi.. | 14012.54 834.0963 16.80 0.000 12377.74 15647.34
------------------------------------------------------------------------------
He said “We can achieve comparable results using standard regression with full interactions:”
clear all
webuse assets3
global catecovars age educ i.(incomecat pension married twoearn ira ownhome)
* run quietly; you do not want to see all the interaction coefficients.
qui:reg assets i.incomecat##i.e401k##c.($catecovars), robust
* Average Treatment Effect (ATE)
margins, at(e401k=(0 1)) noestimcheck contrast(atcontrast(r))
* Group Average Treatment Effects (GATEs)
margins, at(e401k=(0 1)) noestimcheck contrast(atcontrast(r)) over(incomecat)
(Excerpt from Chernozhukov and Hansen (2004))
Contrasts of predictive margins Number of obs = 9,913
Model VCE: Robust
Expression: Linear prediction, predict()
1._at: e401k = 0
2._at: e401k = 1
------------------------------------------------
| df F P>F
-------------+----------------------------------
_at | 1 44.72 0.0000
|
Denominator | 9833
------------------------------------------------
--------------------------------------------------------------
| Delta-method
| Contrast std. err. [95% conf. interval]
-------------+------------------------------------------------
_at |
(2 vs 1) | 7642.407 1142.797 5402.291 9882.524
--------------------------------------------------------------
Contrasts of predictive margins Number of obs = 9,913
Model VCE: Robust
Expression: Linear prediction, predict()
Over: incomecat
1._at: 0.incomecat
e401k = 0
1.incomecat
e401k = 0
2.incomecat
e401k = 0
3.incomecat
e401k = 0
4.incomecat
e401k = 0
2._at: 0.incomecat
e401k = 1
1.incomecat
e401k = 1
2.incomecat
e401k = 1
3.incomecat
e401k = 1
4.incomecat
e401k = 1
-------------------------------------------------
| df F P>F
--------------+----------------------------------
_at@incomecat |
(2 vs 1) 0 | 1 16.08 0.0001
(2 vs 1) 1 | 1 0.67 0.4123
(2 vs 1) 2 | 1 16.03 0.0001
(2 vs 1) 3 | 1 13.77 0.0002
(2 vs 1) 4 | 1 17.26 0.0000
Joint | 5 12.76 0.0000
|
Denominator | 9833
-------------------------------------------------
---------------------------------------------------------------
| Delta-method
| Contrast std. err. [95% conf. interval]
--------------+------------------------------------------------
_at@incomecat |
(2 vs 1) 0 | 3594.182 896.3298 1837.191 5351.172
(2 vs 1) 1 | 1283.3 1565.151 -1784.718 4351.317
(2 vs 1) 2 | 5056.144 1262.728 2580.938 7531.35
(2 vs 1) 3 | 8610.622 2320.156 4062.641 13158.6
(2 vs 1) 4 | 19665.61 4733.551 10386.88 28944.34
---------------------------------------------------------------
Numerically these two sets of estimates may be close, but I don't think they come from the same model, or even the same idea. The cate command with the "po" option estimates the CATE with a "partialling-out" approach.
What Fernando did is a "regression adjustment" type of model, which models the outcome and allows interactions between the treatment and the covariates. The "partialling-out" approach is sometimes called double ML.
Let’s see how to replicate “teffects ra” with “reg”:
clear all
webuse assets3
global catecovars age educ i.(incomecat pension married twoearn ira ownhome)
teffects ra (assets age educ i.(incomecat pension married twoearn ira ownhome))(e401k)
qui: reg assets i.e401k##c.(age educ) i.e401k##i.(incomecat pension married twoearn ira ownhome)
margins, at(e401k=(0 1)) noestimcheck contrast(atcontrast(r))
(Excerpt from Chernozhukov and Hansen (2004))
Iteration 0: EE criterion = 1.291e-21
Iteration 1: EE criterion = 1.046e-23
Treatment-effects estimation Number of obs = 9,913
Estimator : regression adjustment
Outcome model : linear
Treatment model: none
------------------------------------------------------------------------------
| Robust
assets | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
ATE |
e401k |
(Eligible |
vs |
Not elig..) | 7929.242 1216.273 6.52 0.000 5545.39 10313.09
-------------+----------------------------------------------------------------
POmean |
e401k |
Not eligi.. | 14094.12 863.7641 16.32 0.000 12401.18 15787.07
------------------------------------------------------------------------------
Contrasts of predictive margins Number of obs = 9,913
Model VCE: OLS
Expression: Linear prediction, predict()
1._at: e401k = 0
2._at: e401k = 1
------------------------------------------------
| df F P>F
-------------+----------------------------------
_at | 1 34.39 0.0000
|
Denominator | 9889
------------------------------------------------
--------------------------------------------------------------
| Delta-method
| Contrast std. err. [95% conf. interval]
-------------+------------------------------------------------
_at |
(2 vs 1) | 7929.242 1352.153 5278.746 10579.74
--------------------------------------------------------------
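The regression-adjustment logic behind teffects ra can also be sketched outside Stata: fit a separate outcome regression in each treatment arm, predict both potential outcomes for every observation, and average the difference. Below is a minimal pure-Python illustration on simulated data (the data-generating process and all names are invented for the demo; the true ATE is set to 2):

```python
import math
import random

random.seed(7)
n = 10000
x = [random.gauss(0, 1) for _ in range(n)]
d = [random.random() < 1.0 / (1.0 + math.exp(-xi)) for xi in x]  # selection on x
y = [2.0 * di + 1.5 * xi + random.gauss(0, 1) for di, xi in zip(d, x)]  # true ATE = 2

def simple_ols(xs, ys):
    # Intercept and slope of an OLS fit of ys on xs (one regressor plus constant)
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = sum((u - mx) * (t - my) for u, t in zip(xs, ys)) \
        / sum((u - mx) ** 2 for u in xs)
    return my - slope * mx, slope

# Step 1: fit a separate outcome regression within each treatment arm
a1, b1 = simple_ols([xi for xi, di in zip(x, d) if di],
                    [yi for yi, di in zip(y, d) if di])
a0, b0 = simple_ols([xi for xi, di in zip(x, d) if not di],
                    [yi for yi, di in zip(y, d) if not di])

# Step 2: average both predicted potential outcomes over the FULL sample
po1 = sum(a1 + b1 * xi for xi in x) / n
po0 = sum(a0 + b0 * xi for xi in x) / n
print(po1 - po0)  # close to the true ATE of 2
```

Because each arm gets its own regression, the treatment effect is implicitly allowed to vary with the covariate, which is what the fully interacted reg specification achieves.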
DoubleML
The R package DoubleML implements the double/debiased machine learning framework of Chernozhukov et al. (2018). It provides functionality to estimate parameters in causal models based on machine learning methods. The double machine learning framework consists of three key ingredients: Neyman orthogonality, high-quality machine learning estimation, and sample splitting.
They consider a partially linear model:
\[ y_i = \theta d_i + g_0(x_i) + \eta_i \]
\[ d_i = m_0(x_i) + v_i \]
This model is quite general, except that it does not allow interactions between \(d\) and \(x\), and therefore no heterogeneous treatment effects across \(x\). But DoubleML implements more than the partially linear model; it allows for heterogeneous treatment effects in models such as the interactive regression model.
The basic idea of DoubleML is to use any machine learning algorithm to estimate the outcome equation (\(l_0(X) = E(Y | X)\)) and the treatment equation (\(m_0(X) = E(D | X)\)), and then take the residuals, namely \(\hat W=Y-\hat l_0(X)\) and \(\hat V = D - \hat m_0(X)\).
Then regress \(\hat W\) on \(\hat V\); by the Frisch–Waugh–Lovell (FWL) theorem, this yields \(\hat \theta\).
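That residual-on-residual claim is easy to verify numerically. The sketch below (pure Python on simulated data with \(\theta = 2\); the tiny ols helper and all variable names are only for this illustration) fits the full regression of \(y\) on \(d\) and \(x\), then takes the FWL route, and the two coefficients on \(d\) agree to machine precision:

```python
import random

def ols(X, y):
    # Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    A = [row[:] + [rhs] for row, rhs in zip(XtX, Xty)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivot
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

random.seed(0)
n = 2000
x = [random.gauss(0, 1) for _ in range(n)]
d = [0.5 * xi + random.gauss(0, 1) for xi in x]                         # d = m0(x) + v
y = [2.0 * di + 1.5 * xi + random.gauss(0, 1) for di, xi in zip(d, x)]  # theta = 2

# Route 1: full regression of y on a constant, d, and x
theta_full = ols([[1.0, di, xi] for di, xi in zip(d, x)], y)[1]

# Route 2 (FWL): residualize y and d on x, then regress residual on residual
a_l, b_l = ols([[1.0, xi] for xi in x], y)   # estimate of l0
a_m, b_m = ols([[1.0, xi] for xi in x], d)   # estimate of m0
w = [yi - (a_l + b_l * xi) for yi, xi in zip(y, x)]  # W-hat
v = [di - (a_m + b_m * xi) for di, xi in zip(d, x)]  # V-hat
theta_fwl = sum(wi * vi for wi, vi in zip(w, v)) / sum(vi * vi for vi in v)

print(theta_full, theta_fwl)  # identical up to floating-point error
```

The identity is exact in the linear case; DoubleML's point is that with cross-fitting and an orthogonal score, the same residual-on-residual step remains valid when the two nuisance functions come from generic machine learners.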
An important component here is to specify a Neyman-orthogonal score function. In the case of the partially linear regression (PLR) model, it can be the product of the two residuals:
\[\psi (W; \theta, \eta) = (Y-D\theta -g(X))(D-m(X)) \]
The estimator \(\hat \theta\) solves the equation that sets the sample mean of this score function to zero, and the variance of the score function is used to construct the variance of the DoubleML estimator.
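A small numerical sketch of that last step, under the simplifying assumption that the nuisance functions are already partialled out (so \(W = \theta V + \eta\)): solve the sample moment condition for \(\hat\theta\) and build a sandwich standard error from the score. Pure Python on simulated data; all names are illustrative:

```python
import math
import random

random.seed(42)
n = 5000
theta = 2.0
# After partialling out, the PLR model reduces to W = theta*V + eta
v = [random.gauss(0, 1) for _ in range(n)]
w = [theta * vi + random.gauss(0, 1) for vi in v]

# theta-hat solves (1/n) * sum_i psi_i = 0 with psi = (W - theta*V) * V
theta_hat = sum(wi * vi for wi, vi in zip(w, v)) / sum(vi * vi for vi in v)

# Sandwich variance: J = E[d psi / d theta] = -E[V^2],
# Var(theta-hat) ~ E[psi^2] / (n * J^2)
psi = [(wi - theta_hat * vi) * vi for wi, vi in zip(w, v)]
J = sum(vi * vi for vi in v) / n
se = math.sqrt(sum(p * p for p in psi) / n) / (J * math.sqrt(n))
print(theta_hat, se)
```

In this linear case the sandwich formula reproduces the usual heteroskedasticity-robust standard error of the residual-on-residual regression.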
Now we try to replicate the CATE estimation in Stata using the R package DoubleML. I have not found a way to include factor variables in the lasso regression in R, so I included them as numeric variables.
The example for Stata's cate command uses lasso for both the outcome and treatment regressions, then a random forest on the individual treatment effects to get the ATE. Here I try to replicate it in R.
library(haven)
library(DoubleML)
library(mlr3)
library(mlr3learners)
library(data.table)
library(dplyr)
# get rid of labels because doubleML does not like them
data <- read_dta("https://www.stata-press.com/data/r19/assets3.dta") |>
zap_labels()
# mutate(incomecat=as.factor(incomecat),
# pension=as.factor(pension),
# married=as.factor(married),
# twoearn=as.factor(twoearn),
# ira=as.factor(ira),
# ownhome=as.factor(ownhome))
assets3 <- data.table::setDT(data)
dml_data = DoubleMLData$new(data = assets3,
y_col = "assets",
d_cols = "e401k",
x_cols = c("age","educ", "incomecat","pension", "married","twoearn","ira","ownhome"))
print(dml_data)
================= DoubleMLData Object ==================
------------------ Data summary ------------------
Outcome variable: assets
Treatment variable(s): e401k
Covariates: age, educ, incomecat, pension, married, twoearn, ira, ownhome
Instrument(s):
Selection variable:
No. Observations: 9913
lgr::get_logger("mlr3")$set_threshold("warn")
learner = lrn("regr.glmnet", lambda=1)
ml_g_sim = learner$clone()
ml_m_sim = learner$clone()
set.seed(123)
obj_dml_plr = DoubleMLPLR$new(dml_data, ml_l=ml_g_sim, ml_m=ml_m_sim, n_folds=10)
obj_dml_plr$fit()
obj_dml_plr$summary()
Estimates and significance testing of the effect of target variables
Estimate. Std. Error t value Pr(>|t|)
e401k 7388 1293 5.714 1.1e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
If we use random forests for the two models:
# suppress messages from mlr3 package during fitting
lgr::get_logger("mlr3")$set_threshold("warn")
learner = lrn("regr.ranger", num.trees=2000, max.depth=5, min.node.size=2)
ml_l = learner$clone()
ml_m = learner$clone()
# learner = lrn("regr.glmnet")
# ml_g_sim = learner$clone()
# ml_m_sim = learner$clone()
set.seed(123)
obj_dml_plr = DoubleMLPLR$new(dml_data, ml_l=ml_l, ml_m=ml_m, n_folds=5)
obj_dml_plr$fit()
obj_dml_plr$summary()
Estimates and significance testing of the effect of target variables
Estimate. Std. Error t value Pr(>|t|)
e401k 9474 1376 6.883 5.88e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can do the same in Stata:
clear all
webuse assets3
global catecovars age educ incomecat pension married twoearn ira ownhome
cate po (assets age educ incomecat pension married twoearn ira ownhome) (e401k), omethod(rforest) tmethod(rforest) nolog xfolds(5)
(Excerpt from Chernozhukov and Hansen (2004))
Conditional average treatment effects Number of observations = 9,913
Estimator: Partialing out Number of folds in cross-fit = 5
Outcome model: Random forest Number of outcome controls = 8
Treatment model: Random forest Number of treatment controls = 8
CATE model: Random forest Number of CATE variables = 8
------------------------------------------------------------------------------
| Robust
assets | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
ATE |
e401k |
(Eligible |
vs |
Not elig..) | 8300.384 1174.524 7.07 0.000 5998.359 10602.41
-------------+----------------------------------------------------------------
POmean |
e401k |
Not eligi.. | 14002.82 840.8776 16.65 0.000 12354.73 15650.91
------------------------------------------------------------------------------
The results are somewhat similar, but not exactly the same. There are likely some subtle differences, which I have not pinned down, in how the two packages run the random forests.
AIPW
Stata’s cate command also has an “aipw” option.
clear all
webuse assets3
global catecovars age educ incomecat pension married twoearn ira ownhome
cate aipw (assets $catecovars) (e401k), nolog
(Excerpt from Chernozhukov and Hansen (2004))
Conditional average treatment effects Number of observations = 9,913
Estimator: Augmented IPW Number of folds in cross-fit = 10
Outcome model: Linear lasso Number of outcome controls = 8
Treatment model: Logit lasso Number of treatment controls = 8
CATE model: Random forest Number of CATE variables = 8
------------------------------------------------------------------------------
| Robust
assets | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
ATE |
e401k |
(Eligible |
vs |
Not elig..) | 7053.901 1181.863 5.97 0.000 4737.492 9370.31
-------------+----------------------------------------------------------------
POmean |
e401k |
Not eligi.. | 14146.76 858.6184 16.48 0.000 12463.89 15829.62
------------------------------------------------------------------------------
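The augmented IPW estimator behind these commands averages the doubly robust score \(\hat\mu_1(X) - \hat\mu_0(X) + A(Y-\hat\mu_1(X))/\hat e(X) - (1-A)(Y-\hat\mu_0(X))/(1-\hat e(X))\). Here is a minimal pure-Python sketch on simulated data (the setup is invented for the demo, with true ATE 3): the outcome models are deliberately crude arm means that ignore the covariate, yet with a correct propensity the AIPW average still recovers the ATE:

```python
import math
import random

random.seed(1)
n = 20000
# Binary covariate, known propensity, additive outcome; true ATE = 3
x = [random.random() < 0.5 for _ in range(n)]
e = [0.7 if xi else 0.3 for xi in x]           # propensity e(x), treated as known
a = [random.random() < ei for ei in e]
y = [3.0 * ai + 2.0 * xi + random.gauss(0, 1) for ai, xi in zip(a, x)]

# Deliberately crude outcome models: raw arm means, ignoring x entirely
mu1 = sum(yi for yi, ai in zip(y, a) if ai) / sum(a)
mu0 = sum(yi for yi, ai in zip(y, a) if not ai) / (n - sum(a))
naive = mu1 - mu0  # confounded by x: overstates the effect

# AIPW score: mu1 - mu0 + A*(Y - mu1)/e - (1 - A)*(Y - mu0)/(1 - e)
psi = [mu1 - mu0 + ai * (yi - mu1) / ei - (1 - ai) * (yi - mu0) / (1 - ei)
       for yi, ai, ei in zip(y, a, e)]
ate = sum(psi) / n
se = math.sqrt(sum((p - ate) ** 2 for p in psi) / n / n)
print(naive, ate, se)  # naive is biased upward; the AIPW estimate is near 3
```

The reverse also holds: a misspecified propensity combined with a correct outcome model still gives a consistent estimate, which is the sense in which AIPW is doubly robust.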
I’ll try to do this with “npcausal”:
library(npcausal)
#SL.library <- c("SL.earth","SL.gam","SL.glmnet","SL.glm.interaction", "SL.mean","SL.ranger", "SL.xgboost")
#SL.library <- c("SL.glmnet")
SL.lasso = function(...) {
SL.glmnet(..., alpha=1)
}
SL.library <- c("SL.lasso")
Y <- data$assets
A <- data$e401k
W <- data[, c("age", "educ", "incomecat", "pension", "married", "twoearn", "ira", "ownhome")]
aipw<- ate(y=Y, a=A, x=W, nsplits=10, sl.lib=SL.library)
parameter est se ci.ll ci.ul pval
1 E{Y(0)} 14138.295 851.5868 12469.185 15807.405 0
2 E{Y(1)} 21131.594 893.7780 19379.789 22883.399 0
3 E{Y(1)-Y(0)} 6993.299 1179.0464 4682.368 9304.229 0
Note that to match the Stata results, I included only “glmnet” in the SuperLearner library. The results are not exactly the same, but close enough.
Or we can use a similar package with more functionality: “AIPW”.
library(AIPW)
library(SuperLearner)
#SuperLearner libraries for outcome (Q) and exposure models (g)
#sl.lib <- c("SL.glmnet")
AIPW_sl <- AIPW$new(Y= Y,
A= A,
W= W,
Q.SL.library = SL.library,
g.SL.library = SL.library,
k_split = 10,
verbose=FALSE)
AIPW_sl$fit()
AIPW_sl$summary()
AIPW_sl$result
Estimate SE 95% LCL 95% UCL N
Mean of Exposure 21626.890 874.3117 19913.239 23340.541 3682
Mean of Control 14055.914 847.7744 12394.276 15717.552 6231
Mean Difference 7570.976 1164.1694 5289.204 9852.748 9913
# library(ggplot2)
# AIPW_sl$plot.p_score()
# AIPW_sl$plot.ip_weights()