Panel data
When we have panel data (repeated observations over time, or observations clustered at a higher level), we usually face a choice: random effects or fixed effects? Economists usually prefer fixed effect models, since they wipe out all time-invariant heterogeneity at the unit level. Economists are wary of random effect models because they rest on a strong assumption: the random effects must be uncorrelated with the other covariates in the model. To see this, suppose we have
\[ y_{it} = \beta_0 + \beta_1 x_{it} + c_i + \epsilon_{it} \]
Suppose we have individuals \(i=1, ..., n\) measured at times \(t=1, ..., T\). Here \(c_i\) is the unobserved time-invariant individual effect. The difference between fixed and random effects lies in how they handle \(c_i\).
Fixed effect models for a linear outcome can be implemented in one of two equivalent ways: include dummies for the individuals, or run OLS on de-meaned \(y\) and \(x\). In non-linear models things are more difficult: except for the Poisson model, non-linear models with dummies suffer from the “incidental parameters” problem. The gold standard is conditional likelihood (conditional logit, for example), which “absorbs” the fixed effects in the likelihood function, so they never need to be estimated. Unfortunately, most non-linear models do not have such a nice conditional likelihood. In that case we can only hope the bias is small (it does get smaller as the panel gets deeper, that is, with more observations per individual).
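To see the equivalence in the linear case, here is a minimal sketch on simulated data (the data-generating process and variable names are made up purely for illustration):
# Minimal sketch (simulated data): the dummy-variable and de-meaning
# implementations of the linear fixed effect model give the same beta_1.
set.seed(1)
n <- 50; tt <- 5
id  <- rep(1:n, each = tt)
c_i <- rnorm(n)[id]                  # unit effect, correlated with x below
x   <- 0.5 * c_i + rnorm(n * tt)
y   <- 1 + 2 * x + c_i + rnorm(n * tt)

fe_dummy  <- lm(y ~ x + factor(id))  # (1) OLS with individual dummies
y_dm <- y - ave(y, id)               # (2) OLS on de-meaned y and x
x_dm <- x - ave(x, id)
fe_demean <- lm(y_dm ~ x_dm)

coef(fe_dummy)["x"]                  # identical point estimates
coef(fe_demean)["x_dm"]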
Random effect models treat \(c_i\) as part of the error term. This brings their biggest drawback: for the estimator to be consistent, the covariates have to be uncorrelated with the error term. In the equation above, \(x\) therefore has to be uncorrelated with \(c_i\), which economists generally consider unrealistic.
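A minimal sketch of this drawback, again on simulated data and using lme4 for the random intercept model (none of this is part of the wage example below):
# Minimal sketch (simulated data): when x is correlated with c_i, the random
# intercept estimate of beta_1 is distorted; the de-meaned (fixed effect) OLS is not.
library(lme4)
set.seed(2)
n <- 200; tt <- 5
d <- data.frame(id = rep(1:n, each = tt))
d$c_i <- rnorm(n)[d$id]
d$x   <- 0.7 * d$c_i + rnorm(n * tt)          # x correlated with the unit effect
d$y   <- 1 + 2 * d$x + d$c_i + rnorm(n * tt)  # true beta_1 = 2

re <- lmer(y ~ x + (1 | id), data = d)                      # random effect
fe <- lm(I(y - ave(y, id)) ~ I(x - ave(x, id)), data = d)   # fixed effect
fixef(re)["x"]   # tends to be pulled away from 2 by the correlation
coef(fe)[2]      # close to 2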
Time-invariant variables
Sometimes people are interested in the effect of time-invariant variables, which leads to the model
\[ y_{it} = \beta_0 + \beta_1 x_{it} + c_i + \gamma z_i+ \epsilon_{it} \]
Fixed effect models cannot handle this: \(\gamma\) is not identified because \(z_i\) is perfectly collinear with \(c_i\). A random effect model can still estimate it, treating \(z_i\) simply as another covariate.
Between-within model
Usually we are told to run a Hausman test to decide between a fixed effect and a random effect model. The basic idea is that the random effect estimator is more efficient if its assumptions are satisfied, while the fixed effect estimator is consistent either way. The Hausman test compares the two sets of estimates: if the difference is small, stick with the random effect model; if it is large, prefer the fixed effect model since it is consistent.
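For reference, a minimal sketch of a Hausman test in R, using the plm package and its bundled Grunfeld data (neither is used elsewhere in this post):
# Minimal sketch of a Hausman test with plm, on the Grunfeld data that ships
# with the package (not the wage data used later in this post).
library(plm)
data("Grunfeld", package = "plm")

fe <- plm(inv ~ value + capital, data = Grunfeld,
          index = c("firm", "year"), model = "within")   # fixed effect
re <- plm(inv ~ value + capital, data = Grunfeld,
          index = c("firm", "year"), model = "random")   # random effect
phtest(fe, re)   # a small p-value favors the fixed effect model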
However, there is a between-within (BW) model that can incorporate both. Neuhaus and Kalbfleisch (1998)(https://www.ncbi.nlm.nih.gov/pubmed/9629647) introduced the BW estimator,
\[ y_{it} = \beta_0 + \beta_1 (x_{it} - \bar x_i) + \beta_2 \bar x_i + c_i + \gamma z_i+ \epsilon_{it} \]
It can be shown that \(\beta_1\) is the same as in the fixed effect model: it is the effect of the within-individual deviation of \(x\) on the within-individual deviation of \(y\). \(\beta_2\) is the effect of the mean of \(x\) on the mean of \(y\), that is, the “between” effect. \(\gamma\) is the effect of the time-invariant variable on the mean of \(y\).
The other specification of the BW estimator is
\[ y_{it} = \beta_0 + \beta_1 x_{it} + \beta_2 \bar x_i + c_i + \gamma z_i+ \epsilon_{it} \]
This is just a reparameterization of the original specification; it is the same model. \(\beta_1\) is exactly the same as before, while \(\beta_2\) becomes the difference between the “between” and “within” effects. This is called the “contextual model”, and \(\beta_2\) is the “contextual” effect; see Bell and Jones (2015)(https://doi.org/10.1017/psrm.2014.7). In this specification, \(\beta_2\) actually acts like a Hausman test: it measures how far the “between” effect is from the “within” effect.
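To see why the two specifications are equivalent, expand the first one using the coefficients defined there:
\[ \beta_1 (x_{it} - \bar x_i) + \beta_2 \bar x_i = \beta_1 x_{it} + (\beta_2 - \beta_1) \bar x_i \]
So the coefficient on \(\bar x_i\) in the contextual specification is the between effect minus the within effect; it is zero exactly when the two coincide, which is the comparison the Hausman test makes.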
One advantage of the BW model is that it recovers the fixed effect estimates within a random effect framework, so including time-invariant covariates becomes possible. A second advantage is that it accommodates more complicated models, such as cross-level interactions, random slopes, or other multi-level models.
Implementing the simplest form of the BW model is easy: simply fit a random effect model to either of the above two equations.
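For intuition, here is a minimal sketch of doing this by hand with dplyr and lme4, assuming a data frame df with columns y, x, z (time-invariant), and id; the names are hypothetical:
# Minimal sketch, assuming a data frame `df` with hypothetical columns
# y, x, z (time-invariant), and id.
library(dplyr)
library(lme4)

df <- df %>%
  group_by(id) %>%
  mutate(x_mean = mean(x),          # "between" part of x
         x_dev  = x - x_mean) %>%   # "within" part of x
  ungroup()

bw     <- lmer(y ~ x_dev + x_mean + z + (1 | id), data = df)  # first specification
bw_ctx <- lmer(y ~ x     + x_mean + z + (1 | id), data = df)  # contextual specification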
BW model in R
R has a package “panelr”(https://panelr.jacob-long.com/articles/wbm.html) that implements various kinds of BW models. Let’s see an example.
library(panelr)
data("WageData")
wages <- panel_data(WageData, id = id, wave = t)
# in wbm's formula, variables before "|" are time-varying (split into within
# and between parts); variables after "|" are time-invariant
model1 <- wbm(lwage ~ wks + union + ms + occ | blk + fem, data = wages)
summary(model1)
## MODEL INFO:
## Entities: 595
## Time periods: 1-7
## Dependent variable: lwage
## Model type: Linear mixed effects
## Specification: within-between
##
## MODEL FIT:
## AIC = 2036.78, BIC = 2119.13
## Pseudo-R² (fixed effects) = 0.27
## Pseudo-R² (total) = 0.69
## Entity ICC = 0.57
##
## WITHIN EFFECTS:
## ----------------------------------------------------
## Est. S.E. t val. d.f. p
## ----------- ------- ------ -------- --------- ------
## wks 0.00 0.00 1.06 3566.00 0.29
## union 0.06 0.03 2.53 3566.00 0.01
## ms -0.08 0.03 -2.57 3566.00 0.01
## occ -0.08 0.02 -3.32 3566.00 0.00
## ----------------------------------------------------
##
## BETWEEN EFFECTS:
## ----------------------------------------------------------
## Est. S.E. t val. d.f. p
## ------------------ ------- ------ -------- -------- ------
## (Intercept) 6.30 0.20 30.85 588.00 0.00
## imean(wks) 0.01 0.00 2.25 588.00 0.02
## imean(union) 0.15 0.03 4.67 588.00 0.00
## imean(ms) 0.17 0.05 3.07 588.00 0.00
## imean(occ) -0.41 0.03 -13.31 588.00 0.00
## blk -0.15 0.05 -2.81 588.00 0.01
## fem -0.32 0.06 -4.96 588.00 0.00
## ----------------------------------------------------------
##
## p values calculated using Satterthwaite d.f.
##
## RANDOM EFFECTS:
## ------------------------------------
## Group Parameter Std. Dev.
## ---------- ------------- -----------
## id (Intercept) 0.2992
## Residual 0.2589
## ------------------------------------
Let’s compare this with another popular package “lfe”.
library(lfe)
model2 <- felm(lwage ~ wks + union + ms + occ | id, data = wages)
summary(model2)
##
## Call:
## felm(formula = lwage ~ wks + union + ms + occ | id, data = wages)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.89500 -0.16174 0.00652 0.17060 1.94521
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## wks 0.001083 0.001019 1.063 0.287816
## union 0.064320 0.025378 2.534 0.011305 *
## ms -0.082905 0.032226 -2.573 0.010132 *
## occ -0.077507 0.023359 -3.318 0.000916 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2589 on 3566 degrees of freedom
## Multiple R-squared(full model): 0.7304 Adjusted R-squared: 0.6852
## Multiple R-squared(proj model): 0.006509 Adjusted R-squared: -0.1601
## F-statistic(full model):16.16 on 598 and 3566 DF, p-value: < 2.2e-16
## F-statistic(proj model): 5.841 on 4 and 3566 DF, p-value: 0.0001106
We can see that the two packages give the same fixed effect estimates. “panelr” in addition estimates the effects of “blk” and “fem”, which are time-invariant. But “lfe” has an advantage: it allows you to estimate the fixed effect model with clustered standard errors, which I wish “panelr” could do too.
# felm's formula parts: covariates | fixed effects | IV (0 = none) | cluster variable
model3 <- felm(lwage ~ wks + union + ms + occ | id | 0 | id, data = wages)
summary(model3)
##
## Call:
## felm(formula = lwage ~ wks + union + ms + occ | id | 0 | id, data = wages)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.89500 -0.16174 0.00652 0.17060 1.94521
##
## Coefficients:
## Estimate Cluster s.e. t value Pr(>|t|)
## wks 0.001083 0.001438 0.754 0.451
## union 0.064320 0.044215 1.455 0.146
## ms -0.082905 0.051195 -1.619 0.105
## occ -0.077507 0.033828 -2.291 0.022 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2589 on 3566 degrees of freedom
## Multiple R-squared(full model): 0.7304 Adjusted R-squared: 0.6852
## Multiple R-squared(proj model): 0.006509 Adjusted R-squared: -0.1601
## F-statistic(full model, *iid*):16.16 on 598 and 3566 DF, p-value: < 2.2e-16
## F-statistic(proj model): 2.963 on 4 and 594 DF, p-value: 0.01928
BW model in Stata
In Stata there is no dedicated package for the BW estimator, but we can do it with “xtreg”.
webuse nlswork
xtset idcode
xtreg ln_w age, fe cluster(idcode)
(National Longitudinal Survey. Young Women 14-26 years of age in 1968)
panel variable: idcode (unbalanced)
Fixed-effects (within) regression Number of obs = 28,510
Group variable: idcode Number of groups = 4,710
R-sq: Obs per group:
within = 0.1026 min = 1
between = 0.0877 avg = 6.1
overall = 0.0774 max = 15
F(1,4709) = 884.05
corr(u_i, Xb) = 0.0314 Prob > F = 0.0000
(Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
| Robust
ln_wage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | .0181349 .0006099 29.73 0.000 .0169392 .0193306
_cons | 1.148214 .0177153 64.81 0.000 1.113483 1.182944
-------------+----------------------------------------------------------------
sigma_u | .40635023
sigma_e | .30349389
rho | .64192015 (fraction of variance due to u_i)
------------------------------------------------------------------------------
We then generate the within-individual mean of age (via the user-written center command) and run a BW estimation in the contextual form.
webuse nlswork
xtset idcode
bysort idcode: center age, prefix(d) mean(m)
xtreg ln_w age mage i.race, re cluster(idcode)
(National Longitudinal Survey. Young Women 14-26 years of age in 1968)
panel variable: idcode (unbalanced)
(generated variables: dage mage)
Random-effects GLS regression Number of obs = 28,510
Group variable: idcode Number of groups = 4,710
R-sq: Obs per group:
within = 0.1026 min = 1
between = 0.1040 avg = 6.1
overall = 0.0950 max = 15
Wald chi2(4) = 1335.89
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
(Std. Err. adjusted for 4,710 clusters in idcode)
------------------------------------------------------------------------------
| Robust
ln_wage | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | .0181349 .00061 29.73 0.000 .0169394 .0193304
mage | .0044231 .0012736 3.47 0.001 .001927 .0069192
|
race |
black | -.1190245 .0127419 -9.34 0.000 -.1439981 -.094051
other | .0974999 .0617365 1.58 0.114 -.0235014 .2185012
|
_cons | 1.037566 .0323185 32.10 0.000 .9742232 1.100909
-------------+----------------------------------------------------------------
sigma_u | .36581626
sigma_e | .30349389
rho | .59231394 (fraction of variance due to u_i)
------------------------------------------------------------------------------
In this BW model, the coefficient on age, .0181, is the fixed effect (within) estimate. The coefficient on mage (.0044) is the “contextual effect” of age, that is, the additional effect of the between effect of age on logged wage beyond the within effect. The between effect itself is .0044 + .0181 = .0225. We also get an estimate of the effect of the time-invariant covariate race. The advantage of using xtreg is that clustered standard errors come built in.
BW model in non-linear models
Paul Allison in his blog(https://statisticalhorizons.com/between-within-contextual-effects) mentioned using the BW model for a binary outcome. I have not dug into the literature to see how large the bias can be using the BW model compared with, say, a conditional logit model. But if OLS is a good linear approximation to a logit model, the BW model could be a good approximation for a binary outcome with panel data.
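As an example of what that could look like, here is a minimal sketch of a BW logit fit directly with lme4::glmer, treating union in panelr’s WageData as the binary outcome purely for illustration (it is not a substantively meaningful model):
# Minimal sketch of a BW logit: `union` in WageData is used as the binary
# outcome purely for illustration.
library(dplyr)
library(lme4)
data("WageData", package = "panelr")

wd <- WageData %>%
  group_by(id) %>%
  mutate(wks_mean = mean(wks),
         wks_dev  = wks - wks_mean) %>%
  ungroup()

bw_logit <- glmer(union ~ wks_dev + wks_mean + blk + fem + (1 | id),
                  data = wd, family = binomial)
summary(bw_logit)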