Poisson regression with many fixed effects
I have a few posts about Poisson regression for non-negative outcomes, including continous variable, count variable, or corner solution (such as non-negative outcome with many zeros). The benefits include that we can use QMLE Poisson, which means we only need to have the mean of outcome specified correctly, as an exponetiated function of covariates. There is no assumption that the distribution needs to be a Poisson distribution. When we want to model non-negative outcome with fixed effects, fixed effect Poisson should be our first choice also, because fixed effect Poisson is shown to be consistent without the assumption of the distribution of the error term, and it does not have the “incidental parameter” problem.
Stata has “xtpoisson” to do fixed effect Poisson. However, we would recommend “ppmlhdfe”. “ppmlhdfe is a Stata package that implements Poisson pseudo-maximum likelihood regressions (PPML) with multi-way fixed effects”. It is faster than xtpoisson. More importantly, it does report the predicted value correctly. I mentioned in past post that we need to be cautious when we need to use predicted values after “xtpoisson, fe”, or “xtreg, fe”. The reason is that these are models estimated after “absorbing” the fixed effects, meaning fixed effects are not actually estimated. Therefore the predicted values are without fixed effects. But in “ppmlhdfe” we get the correct predicted values with fixed effect.
On a side note, the same author has “reghdfe” as a replacement for “areg” in Stata. “reghdfe” is faster than “areg”, or “xtreg, fe”. It also allows for many levels of fixed effects. As “ppmlhdfe”, it also provides correct predicted values with fixed effects.
IV poisson with many fixed effects
Stata has a “ivpoisson” function with handles IV in a Poisson regression setting. However, it does not have the option to include fixed effects.
In the case that the endogenous variable is continous and we can use a linear model for the first stage, Wooldridge 2010 textbook (chapter 18) suggested using control function approach, which is very simple:
- Run the first stage regression (reduced from for the endogous variable), and get the predicted value of the residual. Sometimes called Mundlak residual. Note in the many fixed effects case, we need to include all fixed effects in the first stage regression. In the case of stata, we would use “reghdfe” here.
- Include both the endogenous variable and the Mundlak residual as additional regressors in the second stage, the original Poisson regression. Use “ppmlhdfe” here.
Note that the standard errors would be incorrect if we use the standard errors from the second stage regression. We need to use the bootstrap method to get the correct standard errors. The reason is that the Mundlak residual is generated from the first stage, we need to bootstrap the whole process to get the correct standard errors. In the case of clustered standard errors, we need to bootstrap the clusters.
Example
* control function approach
webuse website
reghdfe time phone frfam, absorb(ad female) resid
predict double u2h_fe, resid
ppmlhdfe visits time u2h_fe frfam, absorb(ad female)
(Visits to website)
(dropped 1 singleton observations)
(MWFE estimator converged in 3 iterations)
HDFE Linear regression Number of obs = 499
Absorbing 2 HDFE groups F( 2, 484) = 29.02
Prob > F = 0.0000
R-squared = 0.3137
Adj R-squared = 0.2938
Within R-sq. = 0.1071
Root MSE = 2.2239
------------------------------------------------------------------------------
time | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
phone | .2705429 .0626834 4.32 0.000 .1473778 .393708
frfam | .2539231 .0619373 4.10 0.000 .1322239 .3756222
_cons | 1.406026 .2549245 5.52 0.000 .9051305 1.906921
------------------------------------------------------------------------------
Absorbed degrees of freedom:
-----------------------------------------------------+
Absorbed FE | Categories - Redundant = Num. Coefs |
-------------+---------------------------------------|
ad | 12 0 12 |
female | 2 1 1 |
-----------------------------------------------------+
(1 missing value generated)
Iteration 1: deviance = 3.7303e+02 eps = . iters = 3 tol = 1.0e-0
> 4
> min(eta) = -0.57 P
Iteration 2: deviance = 3.5849e+02 eps = 4.06e-02 iters = 3 tol = 1.0e-0
> 4
> min(eta) = -0.59
Iteration 3: deviance = 3.5840e+02 eps = 2.46e-04 iters = 3 tol = 1.0e-0
> 4
> min(eta) = -0.60
Iteration 4: deviance = 3.5840e+02 eps = 3.17e-08 iters = 2 tol = 1.0e-0
> 4
> min(eta) = -0.60
Iteration 5: deviance = 3.5840e+02 eps = 9.72e-16 iters = 2 tol = 1.0e-0
> 5
> min(eta) = -0.60 S
Iteration 6: deviance = 3.5840e+02 eps = 3.24e-16 iters = 1 tol = 1.0e-0
> 9
> min(eta) = -0.60 S O
-------------------------------------------------------------------------------
> -----------------------------
(legend: p: exact partial-out s: exact solver h: step-halving o: epsilon
> below tolerance)
Converged in 6 iterations and 14 HDFE sub-iterations (tol = 1.0e-08)
HDFE PPML regression No. of obs = 499
Absorbing 2 HDFE groups Residual df = 483
Wald chi2(3) = 169.93
Deviance = 358.4027719 Prob > chi2 = 0.0000
Log pseudolikelihood = -993.8547331 Pseudo R2 = 0.2976
------------------------------------------------------------------------------
| Robust
visits | Coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
time | .0448103 .0448665 1.00 0.318 -.0431265 .1327471
u2h_fe | .0765144 .0459404 1.67 0.096 -.0135272 .166556
frfam | .0004108 .0205161 0.02 0.984 -.0398 .0406215
_cons | 1.510566 .1033122 14.62 0.000 1.308078 1.713054
------------------------------------------------------------------------------
Absorbed degrees of freedom:
-----------------------------------------------------+
Absorbed FE | Categories - Redundant = Num. Coefs |
-------------+---------------------------------------|
ad | 12 0 12 |
female | 2 1 1 |
-----------------------------------------------------+
In this example, “time” is the endognous variable, and “phone” is the instrument. We first run the first stage regression with “reghdfe”, and get the Mundlak residual “u2h_fe”. Then we include “time” and “u2h_fe” in the second stage regression with “ppmlhdfe”. “ad” and “female” are two fixed effects. The standard errors are incorrect, we need to use bootstrap to get the correct standard errors.
To verify our control function approach, we do the same with a linear model. By the way, for linear model and Poisson model, control function approach works, in the sense you can just put in predicted residual in the second stage and that takes care of the endogneity adjustment. There is a package “ivreghdfe” we use to verify our two step process.
clear all
webuse website
ivreghdfe visits frfam (time=phone), absorb(ad female)
* now the control function apparoach
reghdfe time phone frfam , absorb(ad female) resid
predict u2h_fe, resid
reghdfe visits time u2h_fe frfam , absorb(ad female)
(Visits to website)
(dropped 1 singleton observations)
(MWFE estimator converged in 3 iterations)
IV (2SLS) estimation
--------------------
Estimates efficient for homoskedasticity only
Statistics consistent for homoskedasticity only
Number of obs = 499
F( 2, 484) = 8.31
Prob > F = 0.0003
Total (centered) SS = 3335.918099 Centered R2 = 0.3971
Total (uncentered) SS = 3335.918099 Uncentered R2 = 0.3971
Residual SS = 2011.119506 Root MSE = 2.038
------------------------------------------------------------------------------
visits | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
time | .5642802 .2123745 2.66 0.008 .1469904 .98157
frfam | -.04045 .0922873 -0.44 0.661 -.2217832 .1408833
------------------------------------------------------------------------------
Underidentification test (Anderson canon. corr. LM statistic): 18.494
Chi-sq(1) P-val = 0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic): 18.628
Stock-Yogo weak ID test critical values: 10% maximal IV size 16.38
15% maximal IV size 8.96
20% maximal IV size 6.66
25% maximal IV size 5.53
Source: Stock-Yogo (2005). Reproduced by permission.
------------------------------------------------------------------------------
Sargan statistic (overidentification test of all instruments): 0.000
(equation exactly identified)
------------------------------------------------------------------------------
Instrumented: time
Included instruments: frfam
Excluded instruments: phone
Partialled-out: _cons
nb: total SS, model F and R2s are after partialling-out;
any small-sample adjustments include partialled-out
variables in regressor count K
------------------------------------------------------------------------------
Absorbed degrees of freedom:
-----------------------------------------------------+
Absorbed FE | Categories - Redundant = Num. Coefs |
-------------+---------------------------------------|
ad | 12 0 12 |
female | 2 1 1 |
-----------------------------------------------------+
(dropped 1 singleton observations)
(MWFE estimator converged in 3 iterations)
HDFE Linear regression Number of obs = 499
Absorbing 2 HDFE groups F( 2, 484) = 29.02
Prob > F = 0.0000
R-squared = 0.3137
Adj R-squared = 0.2938
Within R-sq. = 0.1071
Root MSE = 2.2239
------------------------------------------------------------------------------
time | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
phone | .2705429 .0626834 4.32 0.000 .1473778 .393708
frfam | .2539231 .0619373 4.10 0.000 .1322239 .3756222
_cons | 1.406026 .2549245 5.52 0.000 .9051305 1.906921
------------------------------------------------------------------------------
Absorbed degrees of freedom:
-----------------------------------------------------+
Absorbed FE | Categories - Redundant = Num. Coefs |
-------------+---------------------------------------|
ad | 12 0 12 |
female | 2 1 1 |
-----------------------------------------------------+
(1 missing value generated)
(MWFE estimator converged in 3 iterations)
HDFE Linear regression Number of obs = 499
Absorbing 2 HDFE groups F( 3, 483) = 117.11
Prob > F = 0.0000
R-squared = 0.7677
Adj R-squared = 0.7605
Within R-sq. = 0.4211
Root MSE = 1.9996
------------------------------------------------------------------------------
visits | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
time | .5642802 .2083276 2.71 0.007 .15494 .9736205
u2h_fe | .1827169 .2122987 0.86 0.390 -.2344262 .5998601
frfam | -.04045 .0905287 -0.45 0.655 -.2183288 .1374288
_cons | 3.357879 .4427337 7.58 0.000 2.487957 4.227801
------------------------------------------------------------------------------
Absorbed degrees of freedom:
-----------------------------------------------------+
Absorbed FE | Categories - Redundant = Num. Coefs |
-------------+---------------------------------------|
ad | 12 0 12 |
female | 2 1 1 |
-----------------------------------------------------+
We can see that “ivreghdfe” gives the same result as our two step process. The standard errors are wrong. Now we do bootstrapping.
Bootstrapping standard errors
Suppose we need clustered standard errors on “ad”.
* bootstrap, suppose we need to cluster by "ad"
clear all
capture program drop ppmlhdfe_cf
program ppmlhdfe_cf, rclass
reghdfe time phone frfam , absorb(ad female) resid
predict double u2h_fe, resid
ppmlhdfe visits time u2h_fe frfam , absorb(ad female)
return scalar b_time = _b[time]
drop u2h_fe
xtset, clear
end
webuse website
xtset, clear
bootstrap r(b_time) , reps(100) seed(123) cluster(ad) idcluster(newid) : ppmlhdfe_cf
(Visits to website)
(running ppmlhdfe_cf on estimation sample)
Bootstrap replications (100): .........10.........20.........30.........40.....
> ....50.........60.........70.........80.........90.........100 done
Bootstrap results Number of obs = 499
Replications = 100
Command: ppmlhdfe_cf
_bs_1: r(b_time)
(Replications based on 12 clusters in ad)
------------------------------------------------------------------------------
| Observed Bootstrap Normal-based
| coefficient std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
_bs_1 | .0448103 .0476634 0.94 0.347 -.0486083 .1382288
------------------------------------------------------------------------------
Somehow we need “xtset clear” for each regression. I think this is due to the fact that Stata generates a new id for the bootstrapped clusters, then somehow it xtsets the id, even if you don’t use it in the regression. Therefore we’ll need to clear it every time.
How to do it with R
R has similar packages https://cran.r-project.org/web/packages/fixest/vignettes/fixest_walkthrough.html.
We can use “fixest” and “fepois” to do the same thing as “reghdfe” and “ppmlhdfe”. Again, at the moment, we need to manually bootstrap the two stages for correct standard errors.