Panel data
When we have panel data (repeated observations over time, or observations clustered at a higher level), we usually think of two choices: random effects or fixed effects. Economists usually prefer fixed effect models, since they wipe out all time-invariant unit-level heterogeneity. Economists do not like random effect models because they rest on a strong assumption: the random effects must be uncorrelated with the other covariates in the model. To see this, suppose we have
\[ y_{it} = \beta_0 + \beta_1 x_{it} + c_i + \epsilon_{it} \]
Individuals \(i=1, \dots, n\) are measured at times \(t=1, \dots, T\). Here \(c_i\) is the unobserved time-invariant individual effect. The difference between fixed and random effects is in how they handle \(c_i\).
Fixed effect models for a linear model can be implemented in one of two ways: include a dummy for each individual, or run OLS on demeaned \(y\) and \(x\). The two methods give the same slope estimates. In nonlinear models things are more difficult: except for the Poisson model, nonlinear models with individual dummies suffer from the “incidental parameters” problem. The gold standard is a conditional likelihood (conditional logit, for example), which “absorbs” the fixed effects in the likelihood function so that they do not need to be estimated. Unfortunately, most nonlinear models do not have such a nice conditional likelihood. In that case we can only hope that the bias is small (it does get smaller with a deeper panel, that is, with more observations per individual).
Random effect models treat \(c_i\) as part of the error term. This brings their biggest drawback: the covariates have to be uncorrelated with the error term for the estimator to be consistent. In the above equation, \(x\) therefore has to be uncorrelated with \(c_i\), which economists in general do not find realistic.
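The equivalence of the dummy-variable and demeaning approaches can be checked on a toy example. The sketch below is only an illustration: the panel is made up, and the small `ols` helper is a hypothetical stand-in written for this post, not a function from any package.

```python
# Toy balanced panel (made-up numbers): 3 individuals, 4 periods each.
ids = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
x = [1.0, 2.0, 3.0, 4.0, 2.0, 4.0, 5.0, 7.0, 0.0, 1.0, 3.0, 4.0]
y = [2.1, 2.9, 4.2, 4.8, 6.0, 7.1, 7.4, 8.9, 0.2, 0.4, 1.7, 2.1]


def group_means(values, ids):
    """For every row, return the mean of `values` within that row's individual."""
    sums, counts = {}, {}
    for v, i in zip(values, ids):
        sums[i] = sums.get(i, 0.0) + v
        counts[i] = counts.get(i, 0) + 1
    return [sums[i] / counts[i] for i in ids]


def ols(columns, y):
    """OLS coefficients from the normal equations, solved by Gauss-Jordan elimination."""
    k = len(columns)
    # Augmented matrix [X'X | X'y].
    m = [[sum(a * b for a, b in zip(columns[p], columns[q])) for q in range(k)]
         + [sum(a * b for a, b in zip(columns[p], y))] for p in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[r][k] / m[r][r] for r in range(k)]


# Method 1: x plus one dummy per individual (no common intercept).
dummies = [[1.0 if i == g else 0.0 for i in ids] for g in sorted(set(ids))]
beta_lsdv = ols([x] + dummies, y)[0]

# Method 2: OLS of demeaned y on demeaned x.
x_dm = [v - m for v, m in zip(x, group_means(x, ids))]
y_dm = [v - m for v, m in zip(y, group_means(y, ids))]
beta_within = ols([x_dm], y_dm)[0]

print(beta_lsdv, beta_within)  # the two slopes agree up to rounding error
```

Only the slope on \(x\) is compared here; the dummy coefficients are the estimated \(c_i\), which the demeaned regression never computes.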
Time-invariant variables
Sometimes people are interested in the effect of time-invariant variables, which gives the model
\[ y_{it} = \beta_0 + \beta_1 x_{it} + c_i + \gamma z_i+ \epsilon_{it} \]
Fixed effect models cannot handle this: \(\gamma\) is not identified because \(z_i\) is perfectly collinear with \(c_i\). A random effect model can still be estimated, treating \(z_i\) simply as another covariate.
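A short sketch of why this happens, writing \(\bar y_i\), \(\bar x_i\) and \(\bar\epsilon_i\) for the individual means over time (notation assumed here for illustration). Subtracting the individual means from the model gives
\[ y_{it} - \bar y_i = \beta_1 (x_{it} - \bar x_i) + (c_i - c_i) + \gamma (z_i - z_i) + (\epsilon_{it} - \bar\epsilon_i) = \beta_1 (x_{it} - \bar x_i) + (\epsilon_{it} - \bar\epsilon_i) \]
Both \(c_i\) and \(z_i\) drop out, so the demeaned data carry no information about \(\gamma\).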
Between-within model
Usually we are told to do a Hausman test to decide between a fixed effect and a random effect model. The basic idea is that the random effect estimator is more efficient if its assumptions are satisfied; if they are not, the fixed effect estimator is still consistent. The Hausman test compares the two sets of estimates. If the difference is small, stick with random effects. If it is big, fixed effects should be preferred since that estimator is still consistent.
However, there is a between-within (BW) model that can incorporate both. Neuhaus and Kalbfleisch (1998) (https://www.ncbi.nlm.nih.gov/pubmed/9629647) introduced the BW estimator,
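For reference, the usual form of the Hausman statistic can be written as follows, where \(\hat\beta_{FE}\) and \(\hat\beta_{RE}\) are the estimated coefficients on the time-varying covariates and \(\hat V_{FE}\), \(\hat V_{RE}\) are their estimated covariance matrices (this notation is introduced here only for illustration):
\[ H = (\hat\beta_{FE} - \hat\beta_{RE})' (\hat V_{FE} - \hat V_{RE})^{-1} (\hat\beta_{FE} - \hat\beta_{RE}) \]
Under the random effect assumptions, \(H\) is asymptotically \(\chi^2\) distributed with degrees of freedom equal to the number of coefficients compared, so a large value points toward fixed effects.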
\[ y_{it} = \beta_0 + \beta_1 (x_{it} - \bar x_i) + \beta_2 \bar x_i + c_i + \gamma z_i+ \epsilon_{it} \]
It can be shown that \(\beta_1\) is the same as in the fixed effect model: it is the effect of the within-individual deviation of \(x\) on the within-individual deviation of \(y\). \(\beta_2\) is the effect of the mean of \(x\) on the mean of \(y\), that is, the “between” effect. \(\gamma\) is the effect of the time-invariant variable on the mean of \(y\).
The other specification of the BW estimator is
\[ y_{it} = \beta_0 + \beta_1 x_{it} + \beta_2 \bar x_i + c_i + \gamma z_i+ \epsilon_{it} \]
This is just a reparameterization of the original specification; it is the same model. \(\beta_1\) is exactly the same as before, while \(\beta_2\) becomes the difference between the “between” and “within” effects. This is called the “contextual model”, and \(\beta_2\) is the “contextual” effect. See Bell and Jones (2015) (https://doi.org/10.1017/psrm.2014.7). In this specification, a test of \(\beta_2 = 0\) is actually similar to a Hausman test: it shows whether the “between” and “within” effects differ.
One advantage of the BW model is that it delivers the fixed effect estimates within a random effect estimation, so including time-invariant covariates becomes possible. A second advantage is that it extends to more complicated models, such as cross-level interactions, random slopes, or other multilevel models.
The actual implementation of the simplest form of BW is easy: simply fit a random effect model to either of the two equations above.
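The link between the two specifications is one line of algebra. Writing \(\beta_2^{B}\) for the coefficient on \(\bar x_i\) in the first specification and \(\beta_2^{C}\) for the one in the second (superscripts added here only to tell them apart):
\[ \beta_1 (x_{it} - \bar x_i) + \beta_2^{B} \bar x_i = \beta_1 x_{it} + (\beta_2^{B} - \beta_1) \bar x_i \quad \Rightarrow \quad \beta_2^{C} = \beta_2^{B} - \beta_1 \]
So the contextual coefficient is the between effect minus the within effect, and it is zero exactly when the two coincide.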
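As a rough illustration of the data preparation, the sketch below builds the deviation and mean columns for a made-up toy panel and fits both specifications. Note the simplification: it uses pooled OLS rather than a random effect estimator, which is enough to show how the coefficients relate but is not what a real BW fit would use. The `ols` helper and all variable names are hypothetical.

```python
# Toy balanced panel (made-up numbers): 3 individuals, 4 periods each.
ids = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
x = [1.0, 2.0, 3.0, 4.0, 2.0, 4.0, 5.0, 7.0, 0.0, 1.0, 3.0, 4.0]
y = [2.1, 2.9, 4.2, 4.8, 6.0, 7.1, 7.4, 8.9, 0.2, 0.4, 1.7, 2.1]


def group_means(values, ids):
    """For every row, return the mean of `values` within that row's individual."""
    sums, counts = {}, {}
    for v, i in zip(values, ids):
        sums[i] = sums.get(i, 0.0) + v
        counts[i] = counts.get(i, 0) + 1
    return [sums[i] / counts[i] for i in ids]


def ols(columns, y):
    """OLS coefficients from the normal equations, solved by Gauss-Jordan elimination."""
    k = len(columns)
    # Augmented matrix [X'X | X'y].
    m = [[sum(a * b for a, b in zip(columns[p], columns[q])) for q in range(k)]
         + [sum(a * b for a, b in zip(columns[p], y))] for p in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[r][k] / m[r][r] for r in range(k)]


ones = [1.0] * len(y)
x_mean = group_means(x, ids)                  # individual mean of x
x_dev = [v - m for v, m in zip(x, x_mean)]    # within-individual deviation

# Specification 1: deviation and mean -> (intercept, within, between).
_, within_1, between = ols([ones, x_dev, x_mean], y)

# Specification 2: raw x and mean -> (intercept, within, contextual).
_, within_2, contextual = ols([ones, x, x_mean], y)

print(within_1, within_2)             # same within coefficient
print(contextual, between - within_1)  # contextual = between - within
```

With a random effect estimator in place of pooled OLS, the same two columns are all that needs to be added to the data.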
BW model in R
R has a package “panelr”(https://panelr.jacoblong.com/articles/wbm.html) that implements various kinds of BW models. Let’s see an example.
library(panelr)
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
model1 <- wbm(lwage ~ wks + union + ms + occ | blk + fem, data = wages)
summary(model1)
## MODEL INFO:
## Entities: 595
## Time periods: 1-7
## Dependent variable: lwage
## Model type: Linear mixed effects
## Specification: within-between
##
## MODEL FIT:
## AIC = 2036.78, BIC = 2119.13
## Pseudo-R² (fixed effects) = 0.27
## Pseudo-R² (total) = 0.69
## Entity ICC = 0.57
##
## WITHIN EFFECTS:
## ---------------------------------------------
##         Est.   S.E.   t val.      d.f.      p
## ----- ------ ------ -------- --------- ------
## wks     0.00   0.00     1.06   3566.00   0.29
## union   0.06   0.03     2.53   3566.00   0.01
## ms      0.08   0.03     2.57   3566.00   0.01
## occ     0.08   0.02     3.32   3566.00   0.00
## ---------------------------------------------
##
## BETWEEN EFFECTS:
## ----------------------------------------------------
##                Est.   S.E.   t val.      d.f.      p
## ------------ ------ ------ -------- --------- ------
## (Intercept)    6.30   0.20    30.85    588.00   0.00
## imean(wks)     0.01   0.00     2.25    588.00   0.02
## imean(union)   0.15   0.03     4.67    588.00   0.00
## imean(ms)      0.17   0.05     3.07    588.00   0.00
## imean(occ)     0.41   0.03    13.31    588.00   0.00
## blk            0.15   0.05     2.81    588.00   0.01
## fem            0.32   0.06     4.96    588.00   0.00
## ----------------------------------------------------
##
## p values calculated using Satterthwaite d.f.
##
## RANDOM EFFECTS:
## ------------------------------------
##   Group      Parameter    Std. Dev.
## ---------- ------------- -----------
##     id      (Intercept)    0.2992
##  Residual                  0.2589
## ------------------------------------
Let’s compare this with another popular package “lfe”.
library(lfe)
model2 <- felm(lwage ~ wks + union + ms + occ | id, data = wages)
summary(model2)
##
## Call:
##    felm(formula = lwage ~ wks + union + ms + occ | id, data = wages)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.89500 -0.16174  0.00652  0.17060  1.94521
##
## Coefficients:
##       Estimate Std. Error t value Pr(>|t|)
## wks   0.001083   0.001019   1.063 0.287816
## union 0.064320   0.025378   2.534 0.011305 *
## ms    0.082905   0.032226   2.573 0.010132 *
## occ   0.077507   0.023359   3.318 0.000916 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2589 on 3566 degrees of freedom
## Multiple R-squared(full model): 0.7304   Adjusted R-squared: 0.6852
## Multiple R-squared(proj model): 0.006509   Adjusted R-squared: -0.1601
## F-statistic(full model):16.16 on 598 and 3566 DF, p-value: < 2.2e-16
## F-statistic(proj model): 5.841 on 4 and 3566 DF, p-value: 0.0001106
We can see that the two give the same fixed effect estimates. “panelr” in addition estimates the effects of “blk” and “fem”, which are time-invariant. But “lfe” has an advantage: it allows you to estimate fixed effects with clustered standard errors, which I wish “panelr” could do too.
model3 <- felm(lwage ~ wks + union + ms + occ | id | 0 | id, data = wages)
summary(model3)
##
## Call:
##    felm(formula = lwage ~ wks + union + ms + occ | id | 0 | id, data = wages)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.89500 -0.16174  0.00652  0.17060  1.94521
##
## Coefficients:
##       Estimate Cluster s.e. t value Pr(>|t|)
## wks   0.001083     0.001438   0.754    0.451
## union 0.064320     0.044215   1.455    0.146
## ms    0.082905     0.051195   1.619    0.105
## occ   0.077507     0.033828   2.291    0.022 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2589 on 3566 degrees of freedom
## Multiple R-squared(full model): 0.7304   Adjusted R-squared: 0.6852
## Multiple R-squared(proj model): 0.006509   Adjusted R-squared: -0.1601
## F-statistic(full model, *iid*):16.16 on 598 and 3566 DF, p-value: < 2.2e-16
## F-statistic(proj model): 2.963 on 4 and 594 DF, p-value: 0.01928
BW model in Stata
In Stata, I did not find a dedicated package for the BW estimator. But we can do it with “xtreg”.
webuse nlswork
xtset idcode
xtreg ln_w age, fe cluster(idcode)
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
       panel variable:  idcode (unbalanced)
Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710
R-sq:                                           Obs per group:
     within  = 0.1026                                         min =          1
     between = 0.0877                                         avg =        6.1
     overall = 0.0774                                         max =         15
                                                F(1,4709)         =     884.05
corr(u_i, Xb)  = 0.0314                         Prob > F          =     0.0000
                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0181349   .0006099    29.73   0.000     .0169392    .0193306
       _cons |   1.148214   .0177153    64.81   0.000     1.113483    1.182944
-------------+----------------------------------------------------------------
     sigma_u |  .40635023
     sigma_e |  .30349389
         rho |  .64192015   (fraction of variance due to u_i)
------------------------------------------------------------------------------
We then generate the individual mean of age (here with the user-written “center” command, available from SSC) and run a BW estimation.
webuse nlswork
xtset idcode
bysort idcode: center age, prefix(d) mean(m)
xtreg ln_w age mage i.race, re cluster(idcode)
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)
       panel variable:  idcode (unbalanced)
(generated variables: dage mage)
Random-effects GLS regression                   Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710
R-sq:                                           Obs per group:
     within  = 0.1026                                         min =          1
     between = 0.1040                                         avg =        6.1
     overall = 0.0950                                         max =         15
                                                Wald chi2(4)      =    1335.89
corr(u_i, X)   = 0 (assumed)                    Prob > chi2       =     0.0000
                             (Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0181349     .00061    29.73   0.000     .0169394    .0193304
        mage |   .0044231   .0012736     3.47   0.001      .001927    .0069192
             |
        race |
      black  |  -.1190245   .0127419    -9.34   0.000    -.1439981    -.094051
      other  |   .0974999   .0617365     1.58   0.114    -.0235014    .2185012
             |
       _cons |   1.037566   .0323185    32.10   0.000     .9742232    1.100909
-------------+----------------------------------------------------------------
     sigma_u |  .36581626
     sigma_e |  .30349389
         rho |  .59231394   (fraction of variance due to u_i)
------------------------------------------------------------------------------
In this BW model, the coefficient on age, .0181, is the fixed effect model coefficient. The coefficient on mage (.0044) is the “contextual effect” of age, that is, the additional effect of the between variation on logged wage beyond the within effect. The between effect is therefore .0181 + .0044 = .0225. We also get an estimate of the effect of the time-invariant covariate race. The advantage of using xtreg is that clustered standard errors are implemented.
BW model in nonlinear models
Paul Allison in his blog (https://statisticalhorizons.com/between-within-contextual-effects) mentioned using the BW model for a binary outcome. I have not dug into the literature to see how large the bias can be when using BW compared to, say, a conditional logit model. But if OLS is a good linear approximation of a logit model, the BW model could be a good approximation for a binary outcome with panel data.