Poisson regression with many fixed effects

I have a few posts about Poisson regression for non-negative outcomes, including continous variable, count variable, or corner solution (such as non-negative outcome with many zeros). The benefits include that we can use QMLE Poisson, which means we only need to have the mean of outcome specified correctly, as an exponetiated function of covariates. There is no assumption that the distribution needs to be a Poisson distribution. When we want to model non-negative outcome with fixed effects, fixed effect Poisson should be our first choice also, because fixed effect Poisson is shown to be consistent without the assumption of the distribution of the error term, and it does not have the “incidental parameter” problem.

Stata has “xtpoisson” to do fixed effect Poisson. However, we would recommend “ppmlhdfe”. “ppmlhdfe is a Stata package that implements Poisson pseudo-maximum likelihood regressions (PPML) with multi-way fixed effects”. It is faster than xtpoisson. More importantly, it does report the predicted value correctly. I mentioned in past post that we need to be cautious when we need to use predicted values after “xtpoisson, fe”, or “xtreg, fe”. The reason is that these are models estimated after “absorbing” the fixed effects, meaning fixed effects are not actually estimated. Therefore the predicted values are without fixed effects. But in “ppmlhdfe” we get the correct predicted values with fixed effect.

On a side note, the same author has “reghdfe” as a replacement for “areg” in Stata. “reghdfe” is faster than “areg”, or “xtreg, fe”. It also allows for many levels of fixed effects. As “ppmlhdfe”, it also provides correct predicted values with fixed effects.

IV poisson with many fixed effects

Stata has a “ivpoisson” function with handles IV in a Poisson regression setting. However, it does not have the option to include fixed effects.

In the case that the endogenous variable is continous and we can use a linear model for the first stage, Wooldridge 2010 textbook (chapter 18) suggested using control function approach, which is very simple:

Run the first stage regression (reduced from for the endogous variable), and get the predicted value of the residual. Sometimes called Mundlak residual. Note in the many fixed effects case, we need to include all fixed effects in the first stage regression. In the case of stata, we would use “reghdfe” here.
Include both the endogenous variable and the Mundlak residual as additional regressors in the second stage, the original Poisson regression. Use “ppmlhdfe” here.

Note that the standard errors would be incorrect if we use the standard errors from the second stage regression. We need to use the bootstrap method to get the correct standard errors. The reason is that the Mundlak residual is generated from the first stage, we need to bootstrap the whole process to get the correct standard errors. In the case of clustered standard errors, we need to bootstrap the clusters.

Example

* control function approach
webuse website
reghdfe time phone frfam, absorb(ad female) resid
predict double u2h_fe, resid
ppmlhdfe visits time u2h_fe frfam, absorb(ad female)

(Visits to website)

(dropped 1 singleton observations)
(MWFE estimator converged in 3 iterations)

HDFE Linear regression                            Number of obs   =        499
Absorbing 2 HDFE groups                           F(   2,    484) =      29.02
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.3137
                                                  Adj R-squared   =     0.2938
                                                  Within R-sq.    =     0.1071
                                                  Root MSE        =     2.2239

------------------------------------------------------------------------------
        time | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       phone |   .2705429   .0626834     4.32   0.000     .1473778     .393708
       frfam |   .2539231   .0619373     4.10   0.000     .1322239    .3756222
       _cons |   1.406026   .2549245     5.52   0.000     .9051305    1.906921
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
          ad |        12           0          12     |
      female |         2           1           1     |
-----------------------------------------------------+

(1 missing value generated)

Iteration 1:   deviance = 3.7303e+02  eps = .         iters = 3    tol = 1.0e-0
> 4                                                                            
>    min(eta) =  -0.57  P   
Iteration 2:   deviance = 3.5849e+02  eps = 4.06e-02  iters = 3    tol = 1.0e-0
> 4                                                                            
>    min(eta) =  -0.59      
Iteration 3:   deviance = 3.5840e+02  eps = 2.46e-04  iters = 3    tol = 1.0e-0
> 4                                                                            
>    min(eta) =  -0.60      
Iteration 4:   deviance = 3.5840e+02  eps = 3.17e-08  iters = 2    tol = 1.0e-0
> 4                                                                            
>    min(eta) =  -0.60      
Iteration 5:   deviance = 3.5840e+02  eps = 9.72e-16  iters = 2    tol = 1.0e-0
> 5                                                                            
>    min(eta) =  -0.60   S  
Iteration 6:   deviance = 3.5840e+02  eps = 3.24e-16  iters = 1    tol = 1.0e-0
> 9                                                                            
>    min(eta) =  -0.60   S O
-------------------------------------------------------------------------------
> -----------------------------
(legend: p: exact partial-out   s: exact solver   h: step-halving   o: epsilon 
> below tolerance)
Converged in 6 iterations and 14 HDFE sub-iterations (tol = 1.0e-08)

HDFE PPML regression                              No. of obs      =        499
Absorbing 2 HDFE groups                           Residual df     =        483
                                                  Wald chi2(3)    =     169.93
Deviance             =  358.4027719               Prob > chi2     =     0.0000
Log pseudolikelihood = -993.8547331               Pseudo R2       =     0.2976
------------------------------------------------------------------------------
             |               Robust
      visits | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        time |   .0448103   .0448665     1.00   0.318    -.0431265    .1327471
      u2h_fe |   .0765144   .0459404     1.67   0.096    -.0135272     .166556
       frfam |   .0004108   .0205161     0.02   0.984       -.0398    .0406215
       _cons |   1.510566   .1033122    14.62   0.000     1.308078    1.713054
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
          ad |        12           0          12     |
      female |         2           1           1     |
-----------------------------------------------------+

In this example, “time” is the endognous variable, and “phone” is the instrument. We first run the first stage regression with “reghdfe”, and get the Mundlak residual “u2h_fe”. Then we include “time” and “u2h_fe” in the second stage regression with “ppmlhdfe”. “ad” and “female” are two fixed effects. The standard errors are incorrect, we need to use bootstrap to get the correct standard errors.

To verify our control function approach, we do the same with a linear model. By the way, for linear model and Poisson model, control function approach works, in the sense you can just put in predicted residual in the second stage and that takes care of the endogneity adjustment. There is a package “ivreghdfe” we use to verify our two step process.

clear all
webuse website
ivreghdfe visits frfam (time=phone), absorb(ad female)
* now the control function apparoach
reghdfe time phone frfam , absorb(ad female) resid
predict u2h_fe, resid
reghdfe visits time u2h_fe frfam , absorb(ad female)

(Visits to website)

(dropped 1 singleton observations)
(MWFE estimator converged in 3 iterations)

IV (2SLS) estimation
--------------------

Estimates efficient for homoskedasticity only
Statistics consistent for homoskedasticity only

                                                      Number of obs =      499
                                                      F(  2,   484) =     8.31
                                                      Prob > F      =   0.0003
Total (centered) SS     =  3335.918099                Centered R2   =   0.3971
Total (uncentered) SS   =  3335.918099                Uncentered R2 =   0.3971
Residual SS             =  2011.119506                Root MSE      =    2.038

------------------------------------------------------------------------------
      visits | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        time |   .5642802   .2123745     2.66   0.008     .1469904      .98157
       frfam |    -.04045   .0922873    -0.44   0.661    -.2217832    .1408833
------------------------------------------------------------------------------
Underidentification test (Anderson canon. corr. LM statistic):          18.494
                                                   Chi-sq(1) P-val =    0.0000
------------------------------------------------------------------------------
Weak identification test (Cragg-Donald Wald F statistic):               18.628
Stock-Yogo weak ID test critical values: 10% maximal IV size             16.38
                                         15% maximal IV size              8.96
                                         20% maximal IV size              6.66
                                         25% maximal IV size              5.53
Source: Stock-Yogo (2005).  Reproduced by permission.
------------------------------------------------------------------------------
Sargan statistic (overidentification test of all instruments):           0.000
                                                 (equation exactly identified)
------------------------------------------------------------------------------
Instrumented:         time
Included instruments: frfam
Excluded instruments: phone
Partialled-out:       _cons
                      nb: total SS, model F and R2s are after partialling-out;
                          any small-sample adjustments include partialled-out
                          variables in regressor count K
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
          ad |        12           0          12     |
      female |         2           1           1     |
-----------------------------------------------------+

(dropped 1 singleton observations)
(MWFE estimator converged in 3 iterations)

HDFE Linear regression                            Number of obs   =        499
Absorbing 2 HDFE groups                           F(   2,    484) =      29.02
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.3137
                                                  Adj R-squared   =     0.2938
                                                  Within R-sq.    =     0.1071
                                                  Root MSE        =     2.2239

------------------------------------------------------------------------------
        time | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       phone |   .2705429   .0626834     4.32   0.000     .1473778     .393708
       frfam |   .2539231   .0619373     4.10   0.000     .1322239    .3756222
       _cons |   1.406026   .2549245     5.52   0.000     .9051305    1.906921
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
          ad |        12           0          12     |
      female |         2           1           1     |
-----------------------------------------------------+

(1 missing value generated)

(MWFE estimator converged in 3 iterations)

HDFE Linear regression                            Number of obs   =        499
Absorbing 2 HDFE groups                           F(   3,    483) =     117.11
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.7677
                                                  Adj R-squared   =     0.7605
                                                  Within R-sq.    =     0.4211
                                                  Root MSE        =     1.9996

------------------------------------------------------------------------------
      visits | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
        time |   .5642802   .2083276     2.71   0.007       .15494    .9736205
      u2h_fe |   .1827169   .2122987     0.86   0.390    -.2344262    .5998601
       frfam |    -.04045   .0905287    -0.45   0.655    -.2183288    .1374288
       _cons |   3.357879   .4427337     7.58   0.000     2.487957    4.227801
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
          ad |        12           0          12     |
      female |         2           1           1     |
-----------------------------------------------------+

We can see that “ivreghdfe” gives the same result as our two step process. The standard errors are wrong. Now we do bootstrapping.

Bootstrapping standard errors

Suppose we need clustered standard errors on “ad”.

* bootstrap, suppose we need to cluster by "ad"
clear all
capture program drop ppmlhdfe_cf
 program ppmlhdfe_cf, rclass
  reghdfe time phone frfam , absorb(ad female) resid
  predict double u2h_fe, resid
  ppmlhdfe visits time u2h_fe frfam , absorb(ad female)
  return scalar b_time = _b[time]
  drop u2h_fe
  xtset, clear
 end

webuse website

xtset, clear
bootstrap r(b_time) , reps(100) seed(123) cluster(ad) idcluster(newid) : ppmlhdfe_cf

(Visits to website)


(running ppmlhdfe_cf on estimation sample)

Bootstrap replications (100): .........10.........20.........30.........40.....
> ....50.........60.........70.........80.........90.........100 done

Bootstrap results                                          Number of obs = 499
                                                           Replications  = 100

      Command: ppmlhdfe_cf
        _bs_1: r(b_time)

                                     (Replications based on 12 clusters in ad)
------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             | coefficient  std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       _bs_1 |   .0448103   .0476634     0.94   0.347    -.0486083    .1382288
------------------------------------------------------------------------------

Somehow we need “xtset clear” for each regression. I think this is due to the fact that Stata generates a new id for the bootstrapped clusters, then somehow it xtsets the id, even if you don’t use it in the regression. Therefore we’ll need to clear it every time.

How to do it with R

We can use “fixest” and “fepois” to do the same thing as “reghdfe” and “ppmlhdfe”. Again, at the moment, we need to manually bootstrap the two stages for correct standard errors.

IV in Fixed effect Poisson with many fixed effects

Poisson regression with many fixed effects

IV poisson with many fixed effects

Example

Bootstrapping standard errors

How to do it with R