Panel data
When we have panel data (repeated observations over time, or observations clustered at a higher level), we usually think of two choices: random effects or fixed effects? Economists usually prefer fixed effect models, since they wipe out all time-invariant unit-level heterogeneity. Economists do not like random effect models because they rest on a strong assumption: the random effects need to be uncorrelated with the other covariates in the model. To see this, suppose we have

$$y_{it} = \beta_0 + \beta_1 x_{it} + \mu_i + \epsilon_{it}$$

with individuals $i$ measured at time $t$. Here $\mu_i$ is the unobserved time-invariant individual effect. The difference between fixed and random effects is in how they handle $\mu_i$.
Fixed effect models for a linear model can be implemented by one of two methods: OLS with a dummy for each individual, or OLS on the de-meaned outcome $y_{it}$ and covariates $x_{it}$. These two methods are equivalent. In non-linear models things are more difficult: except for the Poisson model, non-linear models with dummies suffer from the “incidental parameter” problem. The gold standard is a conditional likelihood (conditional logit, for example), which “absorbs” the fixed effects in the likelihood function, so it is not necessary to estimate them. Unfortunately, most non-linear models do not have such a nice conditional likelihood. In that case we can only hope the bias is small (it does get smaller with a deeper panel, that is, a larger $T$, the number of observations per individual).
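The equivalence of the two linear implementations can be checked on simulated data. The following is a minimal sketch in base R; the data-generating process and all variable names are made up for illustration.

```r
# Simulated balanced panel: 50 individuals, 6 periods each (illustrative only)
set.seed(1)
n <- 50; t_len <- 6
id <- rep(seq_len(n), each = t_len)
mu <- rnorm(n)                          # unobserved individual effect
x  <- 0.8 * mu[id] + rnorm(n * t_len)   # covariate correlated with mu
y  <- 1 + 0.5 * x + mu[id] + rnorm(n * t_len)

# Method 1: OLS with one dummy per individual
b_dummy <- coef(lm(y ~ x + factor(id)))[["x"]]

# Method 2: OLS on variables de-meaned within each individual
x_dm <- x - ave(x, id)                  # ave() returns the group mean by default
y_dm <- y - ave(y, id)
b_within <- coef(lm(y_dm ~ x_dm - 1))[["x_dm"]]

c(dummy = b_dummy, within = b_within)
```

The two slopes should agree up to numerical precision. The standard errors from the de-meaned regression are not directly comparable, because `lm()` does not know that the individual means were estimated.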
Random effect models treat the individual effect $\mu_i$ as part of the error term. This brings the biggest drawback: the covariates have to be uncorrelated with the error term for the estimator to be consistent. Therefore, in the above equation, the covariate $x_{it}$ has to be uncorrelated with $\mu_i$, which economists in general do not think is realistic.
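A small simulation illustrates why this assumption matters. The sketch below uses pooled OLS as a stand-in: like a random effect model, it leaves the individual effect in the error term, although it is not itself a random effect estimator. The true slope of 0.5 and the function name `simulate_slope` are hypothetical choices for the example.

```r
set.seed(2)
n <- 200; t_len <- 5
id <- rep(seq_len(n), each = t_len)
mu <- rnorm(n)                           # unobserved individual effect

simulate_slope <- function(rho) {
  # rho controls how strongly the covariate loads on the individual effect
  x <- rho * mu[id] + rnorm(n * t_len)
  y <- 1 + 0.5 * x + mu[id] + rnorm(n * t_len)
  coef(lm(y ~ x))[["x"]]                 # mu is left in the error term
}

b_uncorrelated <- simulate_slope(0)      # assumption holds
b_correlated   <- simulate_slope(0.8)    # assumption violated
c(uncorrelated = b_uncorrelated, correlated = b_correlated)
```

When the covariate is unrelated to the individual effect, the estimate should land near the true slope; when it is correlated, the estimate should be pulled well away from it.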
Time-invariant variables
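Sometimes people are interested in the effect of time-invariant variables, thus the model

$$y_{it} = \beta_0 + \beta_1 x_{it} + \gamma z_i + \mu_i + \epsilon_{it}$$

Fixed effect models cannot handle this: the coefficient $\gamma$ is not identified because the time-invariant variable $z_i$ is perfectly collinear with the individual effect $\mu_i$. Random effect models can still estimate it, treating $z_i$ simply as another covariate.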
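The collinearity is easy to see in a toy example. This is a sketch in base R with simulated data; the variable names and coefficients are assumptions made for illustration.

```r
set.seed(3)
n <- 30; t_len <- 4
id <- rep(seq_len(n), each = t_len)
z  <- rbinom(n, 1, 0.5)[id]              # time-invariant covariate (constant within id)
x  <- rnorm(n * t_len)
y  <- 1 + 0.5 * x - 0.3 * z + rnorm(n)[id] + rnorm(n * t_len)

# De-meaning within individuals turns z into a column of zeros
z_dm <- z - ave(z, id)
range(z_dm)

# With individual dummies, z is a linear combination of columns already in
# the model, so lm() drops it and reports NA for its coefficient
fit <- lm(y ~ x + factor(id) + z)
coef(fit)[["z"]]
```

Either way, there is no within-individual variation in the time-invariant covariate left to estimate its effect from.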
Between-within model
Usually we are told to run a Hausman test to decide between a fixed effect and a random effect model. The basic idea is that the random effect estimator is more efficient if its assumptions are satisfied; if they are not, the fixed effect estimator is still consistent. The Hausman test compares the two sets of estimates. If the difference is small, stick with random effects. If it is big, fixed effects should be preferred since that estimator remains consistent.
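For reference, the comparison is usually written as the statistic below. The notation is assumed here rather than taken from the text: $\hat\beta_{FE}$ and $\hat\beta_{RE}$ are the fixed and random effect estimates of the $k$ coefficients on time-varying covariates, and $\hat V_{FE}$ and $\hat V_{RE}$ are their estimated covariance matrices.

$$H = (\hat\beta_{FE} - \hat\beta_{RE})' \left[ \hat V_{FE} - \hat V_{RE} \right]^{-1} (\hat\beta_{FE} - \hat\beta_{RE})$$

Under the null hypothesis that the random effect assumptions hold, $H$ is asymptotically $\chi^2$ with $k$ degrees of freedom, so a large value points toward fixed effects.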
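However, there is a between-within (BW) model that can incorporate both. Neuhaus and Kalbfleisch (1998)(https://www.ncbi.nlm.nih.gov/pubmed/9629647) introduced the BW estimator,

$$y_{it} = \beta_0 + \beta_1 (x_{it} - \bar{x}_i) + \beta_2 \bar{x}_i + \gamma z_i + \mu_i + \epsilon_{it}$$

where $\bar{x}_i$ is the individual mean of $x_{it}$. It can be shown that $\beta_1$ is the same as the coefficient in the fixed effect model. It is the effect of the within-individual deviation of $x$ on the within-individual deviation of $y$. $\beta_2$ is the effect of the mean of $x$ on the mean of $y$, that is, the “between” effect. $\gamma$ is the effect of the time-invariant variable $z$ on the mean of $y$.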
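The other specification of the BW estimator is

$$y_{it} = \beta_0 + \beta_1 x_{it} + \beta_3 \bar{x}_i + \gamma z_i + \mu_i + \epsilon_{it}$$

This is just a transformation of the original specification; it is the same model. The coefficient $\beta_1$ on $x_{it}$ is exactly the same within effect as before, while the coefficient $\beta_3$ on the individual mean $\bar{x}_i$ becomes the difference between the “between” and “within” effects. This is called the “contextual model”, and $\beta_3$ is the “contextual” effect. See Bell and Jones (2015)(https://doi.org/10.1017/psrm.2014.7). In this specification, $\beta_3$ is actually similar to a Hausman test: it shows the difference between “between” and “within”.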
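That the two specifications are the same model takes one line of algebra. The notation is assumed for illustration: $\beta_1$ is the within coefficient, $\beta_2$ the between coefficient, and $\bar{x}_i$ the individual mean of $x_{it}$.

$$\beta_1 (x_{it} - \bar{x}_i) + \beta_2 \bar{x}_i = \beta_1 x_{it} + (\beta_2 - \beta_1)\, \bar{x}_i$$

So the coefficient on $\bar{x}_i$ in the contextual form is the between effect minus the within effect. It is zero exactly when the two effects coincide, which is the case in which fixed and random effects agree.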
One advantage of the BW model is that it recovers the fixed effect estimates inside a random effect estimation, so including time-invariant covariates becomes possible. A second advantage is that it extends to more complicated models, such as cross-level interactions, random slopes, or other multilevel models.
The actual implementation of the simplest form of BW is easy: simply fit a random effect model to either of the two equations above.
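The mechanics can be sketched in base R by building the individual mean and the deviation from it by hand. Note the swap: to stay dependency-free this example fits the two specifications with pooled OLS via `lm()`, not with a random effect estimator. In a balanced panel like this simulated one the point estimates should coincide with a random-intercept fit, but the standard errors will not, so treat it as an illustration of the coefficients only. All names and the data-generating process are made up.

```r
set.seed(4)
n <- 100; t_len <- 5
id <- rep(seq_len(n), each = t_len)
mu <- rnorm(n)                           # unobserved individual effect
z  <- rbinom(n, 1, 0.5)[id]              # time-invariant covariate
x  <- 0.8 * mu[id] + rnorm(n * t_len)    # covariate correlated with mu
y  <- 1 + 0.5 * x - 0.3 * z + mu[id] + rnorm(n * t_len)

x_mean <- ave(x, id)                     # individual mean of x
x_dev  <- x - x_mean                     # within-individual deviation

fit_bw  <- lm(y ~ x_dev + x_mean + z)    # within-between specification
fit_ctx <- lm(y ~ x + x_mean + z)        # contextual specification
fit_fe  <- lm(y ~ x + factor(id))        # fixed effects via dummies

within_bw  <- coef(fit_bw)[["x_dev"]]
between_bw <- coef(fit_bw)[["x_mean"]]
within_ctx <- coef(fit_ctx)[["x"]]
context    <- coef(fit_ctx)[["x_mean"]]
within_fe  <- coef(fit_fe)[["x"]]

c(within_bw = within_bw, within_ctx = within_ctx, within_fe = within_fe,
  between = between_bw, contextual = context)
```

The within coefficient should match the fixed effect estimate in both specifications, and the contextual coefficient should equal the between coefficient minus the within coefficient, while the time-invariant covariate keeps an estimable coefficient.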
BW model in R
R has a package “panelr”(https://panelr.jacob-long.com/articles/wbm.html) that implements various kinds of BW models. Let’s see an example.
library(panelr)
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
model1 <- wbm(lwage ~ wks + union + ms + occ | blk + fem, data = wages)
summary(model1)
## MODEL INFO:
## Entities: 595
## Time periods: 1-7
## Dependent variable: lwage
## Model type: Linear mixed effects
## Specification: within-between
## 
## MODEL FIT:
## AIC = 2036.78, BIC = 2119.13
## Pseudo-R² (fixed effects) = 0.27
## Pseudo-R² (total) = 0.69
## Entity ICC = 0.57
## 
## WITHIN EFFECTS:
## ----------------------------------------------------
##                Est.   S.E.   t val.      d.f.      p
## ----------- ------- ------ -------- --------- ------
## wks            0.00   0.00     1.06   3566.00   0.29
## union          0.06   0.03     2.53   3566.00   0.01
## ms            -0.08   0.03    -2.57   3566.00   0.01
## occ           -0.08   0.02    -3.32   3566.00   0.00
## ----------------------------------------------------
## 
## BETWEEN EFFECTS:
## ----------------------------------------------------------
##                       Est.   S.E.   t val.     d.f.      p
## ------------------ ------- ------ -------- -------- ------
## (Intercept)           6.30   0.20    30.85   588.00   0.00
## imean(wks)            0.01   0.00     2.25   588.00   0.02
## imean(union)          0.15   0.03     4.67   588.00   0.00
## imean(ms)             0.17   0.05     3.07   588.00   0.00
## imean(occ)           -0.41   0.03   -13.31   588.00   0.00
## blk                  -0.15   0.05    -2.81   588.00   0.01
## fem                  -0.32   0.06    -4.96   588.00   0.00
## ----------------------------------------------------------
## 
## p values calculated using Satterthwaite d.f.
##  
## RANDOM EFFECTS:
## ------------------------------------
##   Group      Parameter    Std. Dev. 
## ---------- ------------- -----------
##     id      (Intercept)    0.2992   
##  Residual                  0.2589   
## ------------------------------------
Let’s compare this with another popular package “lfe”.
library(lfe)
model2 <- felm(lwage ~ wks + union + ms + occ | id, data = wages)
summary(model2)
## 
## Call:
##    felm(formula = lwage ~ wks + union + ms + occ | id, data = wages) 
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.89500 -0.16174  0.00652  0.17060  1.94521 
## 
## Coefficients:
##        Estimate Std. Error t value Pr(>|t|)    
## wks    0.001083   0.001019   1.063 0.287816    
## union  0.064320   0.025378   2.534 0.011305 *  
## ms    -0.082905   0.032226  -2.573 0.010132 *  
## occ   -0.077507   0.023359  -3.318 0.000916 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2589 on 3566 degrees of freedom
## Multiple R-squared(full model): 0.7304   Adjusted R-squared: 0.6852 
## Multiple R-squared(proj model): 0.006509   Adjusted R-squared: -0.1601 
## F-statistic(full model):16.16 on 598 and 3566 DF, p-value: < 2.2e-16 
## F-statistic(proj model): 5.841 on 4 and 3566 DF, p-value: 0.0001106
We can see that the two give the same fixed effect estimates. “panelr” additionally estimates the effects of “blk” and “fem”, which are time-invariant. But “lfe” has an advantage: it allows fixed effect estimation with clustered standard errors, which I wish “panelr” could do too.
model3 <- felm(lwage ~ wks + union + ms + occ | id | 0 | id, data = wages)
summary(model3)
## 
## Call:
##    felm(formula = lwage ~ wks + union + ms + occ | id | 0 | id,      data = wages) 
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.89500 -0.16174  0.00652  0.17060  1.94521 
## 
## Coefficients:
##        Estimate Cluster s.e. t value Pr(>|t|)  
## wks    0.001083     0.001438   0.754    0.451  
## union  0.064320     0.044215   1.455    0.146  
## ms    -0.082905     0.051195  -1.619    0.105  
## occ   -0.077507     0.033828  -2.291    0.022 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2589 on 3566 degrees of freedom
## Multiple R-squared(full model): 0.7304   Adjusted R-squared: 0.6852 
## Multiple R-squared(proj model): 0.006509   Adjusted R-squared: -0.1601 
## F-statistic(full model, *iid*):16.16 on 598 and 3566 DF, p-value: < 2.2e-16 
## F-statistic(proj model): 2.963 on 4 and 594 DF, p-value: 0.01928
BW model in Stata
In Stata, there is no built-in command for the BW estimator, but we can do it with “xtreg”.
webuse nlswork
xtset idcode
xtreg ln_w age, fe cluster(idcode)
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
       panel variable:  idcode (unbalanced)
Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710
R-sq:                                           Obs per group:
     within  = 0.1026                                         min =          1
     between = 0.0877                                         avg =        6.1
     overall = 0.0774                                         max =         15
                                                F(1,4709)         =     884.05
corr(u_i, Xb)  = 0.0314                         Prob > F          =     0.0000
                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0181349   .0006099    29.73   0.000     .0169392    .0193306
       _cons |   1.148214   .0177153    64.81   0.000     1.113483    1.182944
-------------+----------------------------------------------------------------
     sigma_u |  .40635023
     sigma_e |  .30349389
         rho |  .64192015   (fraction of variance due to u_i)
------------------------------------------------------------------------------
We then generate the mean of age and run a BW estimation.
webuse nlswork
xtset idcode
* center is a user-written command; install it once with: ssc install center
bysort idcode: center age, prefix(d) mean(m)
xtreg ln_w age mage i.race, re cluster(idcode)
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
       panel variable:  idcode (unbalanced)
(generated variables: dage mage)
Random-effects GLS regression                   Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710
R-sq:                                           Obs per group:
     within  = 0.1026                                         min =          1
     between = 0.1040                                         avg =        6.1
     overall = 0.0950                                         max =         15
                                                Wald chi2(4)      =    1335.89
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0181349     .00061    29.73   0.000     .0169394    .0193304
        mage |   .0044231   .0012736     3.47   0.001      .001927    .0069192
             |
        race |
      black  |  -.1190245   .0127419    -9.34   0.000    -.1439981    -.094051
      other  |   .0974999   .0617365     1.58   0.114    -.0235014    .2185012
             |
       _cons |   1.037566   .0323185    32.10   0.000     .9742232    1.100909
-------------+----------------------------------------------------------------
     sigma_u |  .36581626
     sigma_e |  .30349389
         rho |  .59231394   (fraction of variance due to u_i)
------------------------------------------------------------------------------
In this BW model, the coefficient on age, .0181, is the same as in the fixed effect model. The coefficient on mage (.0044) is the “contextual effect” of age, that is, the additional effect of the between component on logged wage over and above the within effect. The between effect is therefore .0044231 + .0181349 = .022558, or roughly .0226. We also get an estimate of the effect of the time-invariant covariate race. The advantage of using xtreg is that clustered standard errors are available.
BW model in non-linear models
Paul Allison in his blog(https://statisticalhorizons.com/between-within-contextual-effects) mentioned using the BW model for a binary outcome. I have not dug into the literature to see how large the bias can be when using BW compared to, say, a conditional logit model. But if OLS is a good linear approximation of a logit model, the BW model could be a good approximation for a binary outcome with panel data.