Bartik Instrument
The Bartik instrument, developed by Bartik (1991) and also known as the shift-share IV, is a method for estimating the causal effect of a treatment on an outcome.
We follow Goldsmith-Pinkham, Sorkin, and Swift (2020) (GPSS) in describing the method. Suppose we are interested in estimating the effect of employment growth on wage growth, i.e., the inverse elasticity of labor supply.
\[ y_{lt} = D_{lt} \rho + \beta_0 x_{lt} + \epsilon_{lt}\]
Here \(y_{lt}\) is wage growth in location \(l\) at time \(t\), \(x_{lt}\) is the employment growth rate, and \(D_{lt}\) are controls. \(\beta_0\) is the inverse elasticity of labor supply.
The usual challenge is that \(x_{lt}\) is endogenous: there could be factors driving both wage growth and employment growth in location \(l\). (In the prose I sometimes omit the time subscript for simplicity.)
There are two accounting identities:
\[ x_{lt} = Z_{lt} G_{lt} = \sum_{k=1}^K z_{lkt} g_{lkt} \] and \[ g_{lkt} = g_{kt} + \tilde{g}_{lkt} \]
Suppose location \(l\) has \(K\) industries. Then employment growth \(x_{lt}\) is the share-weighted sum of the location-industry growth rates \(g_{lkt}\), with the \(z\)'s as industry shares.
The second identity says that the industry growth rate at a given location can be decomposed into a national industry growth rate and a location-industry component. The idea of the Bartik instrument is to use the product of initial industry shares and national industry growth rates as the instrument:
\[ B_{lt} = Z_{l0} G_{t} = \sum_{k=1}^K z_{lk0} g_{kt} \]
The intuition is that the initial shares are predetermined (measured before the shocks), and the national industry growth rates are plausibly exogenous to any single location.
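For a concrete example with made-up numbers: suppose location \(l\) has initial shares of 0.6 in manufacturing and 0.4 in services, and the national growth rates in period \(t\) are 2% and -1%. Then
\[ B_{lt} = 0.6 \times 0.02 + 0.4 \times (-0.01) = 0.008, \]
i.e., predicted employment growth of 0.8% coming only from initial composition and national trends.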
Bartik Instrument in special cases
Two industries and one period
GPSS go through the simplest case: two industries and one period.
\[ B_l = z_{l1} g_1 + z_{l2} g_2 \]
Since there are only two industries, \(z_{l1} = 1 - z_{l2}\). Then we can rewrite the Bartik instrument as
\[ B_l = g_2 + (g_1 - g_2) z_{l1} \]
From this simplest case, we see that the only location-specific variation in the instrument comes from \(z_{l1}\). The instrument is a linear function of the industry share, so using the Bartik instrument is the same as using the industry share itself as the instrument. The identifying assumption is therefore that the industry share is exogenous.
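To illustrate the equivalence, here is a minimal sketch with a made-up DGP (the objects n, g, dat, z1, b here are mine, not from GPSS or the simulation below). Because \(B_l\) is a linear function of \(z_{l1}\), 2SLS with \(B_l\) and 2SLS with \(z_{l1}\) give identical point estimates.
library(estimatr)
set.seed(1)
n <- 500
g <- c(0.05, -0.02)                            # made-up national growth rates
dat <- data.frame(z1 = runif(n))               # industry-1 share, treated as exogenous
dat$b <- dat$z1 * g[1] + (1 - dat$z1) * g[2]   # Bartik instrument: g2 + (g1 - g2) * z1
dat$x <- dat$b + rnorm(n, 0, 0.01)             # employment growth
dat$y <- dat$x + rnorm(n, 0, 0.01)             # wage growth, true beta = 1
coef(iv_robust(y ~ x | b, data = dat))["x"]    # identical point estimates...
coef(iv_robust(y ~ x | z1, data = dat))["x"]   # ...whether we use B or the share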
GPSS show that just-identified two-stage least squares using a Bartik instrument can be written as GMM with an overidentified set of instruments, one per industry share, where the weight matrix is the outer product of the national growth rates. With \(K\) industries and \(T\) time periods, this amounts to using \(K \times T\) share instruments; the Bartik estimator is numerically equivalent to GMM with all of these instruments and that weight matrix.
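To sketch the algebra behind this equivalence (abstracting from controls and residualization, with \(Z\) the matrix of shares and \(G\) the vector of national growth rates, so that \(B = ZG\)):
\[ \hat \beta_{GMM} = (X' Z W Z' X)^{-1} X' Z W Z' Y = (X' B B' X)^{-1} X' B B' Y = \frac{B' Y}{B' X} = \hat \beta_{Bartik}, \qquad W = G G', \]
where the last step uses the fact that \(B'X\) is a scalar.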
Rotemberg weights
GPSS decompose the Bartik estimator:
\[ \beta_{Bartik} = \sum_{k} \hat \alpha_k \hat \beta_k\] where \[ \hat \beta_k = (Z'_k X^{\perp})^{-1} Z'_k Y^{\perp}\] and \[ \hat \alpha_k = \frac{g_k Z'_k X^{\perp}}{\sum_{k'} g_{k'} Z'_{k'} X^{\perp}}\]
Here \(X^{\perp}\) and \(Y^{\perp}\) are the residualized \(X\) and \(Y\), from the regression of \(X\) and \(Y\) on all other variables (controls).
Rotemberg weights measure how important each share is: each weight tells us how sensitive the overall estimate is to bias (misspecification) in a given instrument (industry). The weights are useful for seeing which industries carry the most weight, and hence for judging whether the exogeneity assumption is plausible for the instruments that matter most.
Simulations
The simulation code is adapted from Kohei Kawaguchi's lecture notes. I modified it slightly to fit different scenarios.
Here we simulate a case with 1000 locations, 4 periods, and 4 industries. The true \(\beta\) (the effect of \(x\) on \(y\)) is 1.
library(ggplot2)
library(tidyverse)
library(modelsummary)
library(kableExtra)
library(estimatr)
set.seed(6)
L <- 1000
T <- 4
K <- 4
beta <- 1
sigma <- c(1, 1, 1)
# initial industry shares for each location (normalized below)
z_lk0 <- expand_grid(
  l = 1:L,
  k = 1:K
) |>
  mutate(z_lk0 = runif(length(l)))
# national industry growth rates
g_kt <- expand_grid(
  t = 1:T,
  k = 1:K
) |>
  mutate(g_kt = rnorm(length(t), 0, .1))
# location-industry panel: contemporaneous shares and growth rates are the
# initial shares / national rates plus noise
df <- expand_grid(
  l = 1:L,
  t = 1:T,
  k = 1:K
) |>
  left_join(z_lk0, by = c("l", "k")) |>
  left_join(g_kt, by = c("k", "t")) |>
  mutate(
    z_lkt = z_lk0 + sigma[1] * runif(length(l)),
    g_lkt = g_kt + sigma[2] * rnorm(length(l), 0, .1)
  )
# normalize shares to sum to 1 within each location-period and compute
# employment growth x_lt as the share-weighted sum of g_lkt
df <- df |>
  group_by(t, l) |>
  mutate(
    z_lk0 = z_lk0 / sum(z_lk0),
    z_lkt = z_lkt / sum(z_lkt),
    x_lt = sum(z_lkt * g_lkt)
  )
df
## # A tibble: 16,000 × 8
## # Groups: t, l [4,000]
## l t k z_lk0 g_kt z_lkt g_lkt x_lt
## <int> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 1 0.277 0.130 0.252 0.115 0.0400
## 2 1 1 2 0.428 0.146 0.302 0.291 0.0400
## 3 1 1 3 0.121 -0.165 0.321 -0.231 0.0400
## 4 1 1 4 0.174 -0.110 0.126 -0.0185 0.0400
## 5 1 2 1 0.277 -0.0289 0.160 -0.00619 0.0402
## 6 1 2 2 0.428 -0.00130 0.356 0.0560 0.0402
## 7 1 2 3 0.121 0.0433 0.272 -0.0590 0.0402
## 8 1 2 4 0.174 0.0840 0.212 0.176 0.0402
## 9 1 3 1 0.277 -0.101 0.237 -0.0361 -0.0738
## 10 1 3 2 0.428 -0.152 0.367 -0.126 -0.0738
## # ℹ 15,990 more rows
Here we observe the initial industry shares (\(z_{lk0}\)), the national industry growth rates (\(g_{kt}\)), and the industry shares at time \(t\) (\(z_{lkt}\)). The growth rate of industry \(k\) at location \(l\) and time \(t\) (\(g_{lkt}\)) is the national rate \(g_{kt}\) plus noise. From these we calculate \(x_{lt}\), the share-weighted sum of the location-industry growth rates.
Next we generate the outcome at the \(l, t\) level. The outcome (wage growth) is a linear function of \(x\) (employment growth), with the averaged location-industry component \(\tilde g_{lkt}\) as the error term; since this component also enters \(x_{lt}\), the regressor is endogenous.
df_lt <- df |>
  mutate(g_tilde_lkt = g_lkt - g_kt) |>
  group_by(l, t) |>
  summarise(across(c(x_lt, g_tilde_lkt), mean), .groups = "drop") |>
  ungroup() |>
  mutate(y_lt = beta * x_lt + sigma[3] * g_tilde_lkt)
Indeed, if we run OLS on the raw data, we get a biased estimate.
result_ols <- df_lt |>
  lm(formula = y_lt ~ x_lt)
modelsummary(result_ols)
|             | (1)       |
|-------------|-----------|
| (Intercept) | 0.000     |
|             | (0.001)   |
| x_lt        | 1.336     |
|             | (0.008)   |
| Num.Obs.    | 4000      |
| R2          | 0.884     |
| R2 Adj.     | 0.884     |
| AIC         | -14059.9  |
| BIC         | -14041.0  |
| Log.Lik.    | 7032.945  |
| F           | 30603.441 |
| RMSE        | 0.04      |
Now we construct the Bartik instrument \(B_{lt} = \sum_k z_{lk0}\, g_{kt}\), the sum of national industry growth rates weighted by the initial shares, and run the first stage.
b_lt <- df |>
  group_by(l, t) |>
  summarise(b_lt = sum(z_lk0 * g_kt)) |>
  ungroup()
df_lt <- df_lt |>
  left_join(b_lt, by = c("l", "t"))
result_first_stage <- lm(formula = x_lt ~ b_lt, data = df_lt)
modelsummary(result_first_stage)
|             | (1)      |
|-------------|----------|
| (Intercept) | -0.001   |
|             | (0.001)  |
| b_lt        | 0.919    |
|             | (0.013)  |
| Num.Obs.    | 4000     |
| R2          | 0.573    |
| R2 Adj.     | 0.573    |
| AIC         | -11638.5 |
| BIC         | -11619.6 |
| Log.Lik.    | 5822.262 |
| F           | 5361.487 |
| RMSE        | 0.06     |
result_tsls <- df_lt |>
  estimatr::iv_robust(formula = y_lt ~ x_lt | b_lt)
modelsummary::modelsummary(result_tsls)
|             | (1)      |
|-------------|----------|
| (Intercept) | 0.000    |
|             | (0.001)  |
| x_lt        | 0.990    |
|             | (0.012)  |
| Num.Obs.    | 4000     |
| R2          | 0.825    |
| R2 Adj.     | 0.825    |
| AIC         | -12398.4 |
| BIC         | -12379.5 |
| RMSE        | 0.05     |
Calculate Rotemberg weights
Let’s do IV using the Bartik instrument by hand, with no controls. We use matrix operations to recover the IV estimate, which matches the two-stage least squares result above.
B <- as.matrix(cbind(1, df_lt$b_lt))
Y <- as.matrix(df_lt$y_lt)
X <- as.matrix(cbind(1, df_lt$x_lt))
iv <- round(solve(t(B) %*% X) %*% t(B) %*% Y, 3)
## Label and organize results into a data frame
beta.hat <- as.data.frame(cbind(c("Intercept", "x"), iv))
names(beta.hat) <- c("Coeff.", "Est")
beta.hat
## Coeff. Est
## 1 Intercept 0
## 2 x 0.99
Now we calculate the Rotemberg weights.
# This code is an R version of Paul Goldsmith-Pinkham's code for Rotemberg weights on
# github, which in turn calls C++ code by jjchern (bw.R calls ComputeAlphaBeta.cpp).
# Dimensions: G is KTx1, Z is LTxKT, Y is LTx1, X is LTx1, so B = ZG is LTx1.
# I learned the dimensions from the ADH example on Paul's github:
# there czone (L) is 722, year (T) is 2, ind (K) is 390,
# so G is 780x1, Z is 1444x780, Y is 1444x1, X is 1444x1.
y_lt <- df_lt |>
  select(l, t, y_lt)
Y <- as.matrix(y_lt[, 3])
x_lt <- df_lt |>
  select(l, t, x_lt)
X <- as.matrix(x_lt[, 3])
# G is the vector of national growth rates at the (k, t) level
G <- as.matrix(g_kt[, 3])
# Generate the wide Z matrix (LT x KT) from df: each (t, k) share becomes its own
# column, filled with 0 where a location-period has no entry for that column.
# Note that the shares entering the Bartik instrument are the initial shares z_lk0,
# not the contemporaneous shares z_lkt.
z_data <- df |>
  select(l, t, k, z_lk0) |>
  mutate(k = str_glue("t{t}_sh_ind_{k}")) |>
  spread(k, z_lk0, fill = 0)
Z <- as.matrix(z_data[, -c(1, 2)])
# Just-identified estimate beta_k for each (k, t) instrument and its Rotemberg weight
beta <- round((t(Z) %*% Y) / (t(Z) %*% X), digits = 3)
alpha <- (diag(as.numeric(G)) %*% t(Z) %*% X) / as.numeric(t(G) %*% t(Z) %*% X)
bw <- tibble(g_kt, alpha = as.numeric(alpha), beta = as.numeric(beta)) |>
  mutate(product = alpha * beta)
bw
## # A tibble: 16 × 6
## t k g_kt alpha beta product
## <int> <int> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 0.130 0.0108 0.721 0.00777
## 2 1 2 0.146 0.0194 1.16 0.0225
## 3 1 3 -0.165 0.00963 0.802 0.00772
## 4 1 4 -0.110 0.00714 1.08 0.00772
## 5 2 1 -0.0289 -0.00801 0.878 -0.00703
## 6 2 2 -0.00130 -0.000392 0.932 -0.000366
## 7 2 3 0.0433 0.0134 0.891 0.0120
## 8 2 4 0.0840 0.0283 0.906 0.0257
## 9 3 1 -0.101 0.137 0.989 0.136
## 10 3 2 -0.152 0.221 0.988 0.219
## 11 3 3 -0.0569 0.0755 0.986 0.0745
## 12 3 4 -0.0994 0.136 0.995 0.135
## 13 4 1 0.0266 0.0270 1.01 0.0273
## 14 4 2 0.139 0.155 0.981 0.152
## 15 4 3 0.0263 0.0268 1.01 0.0270
## 16 4 4 0.127 0.141 1.01 0.143
sum(bw$product)
## [1] 0.9892878
The sum of the products \(\hat\alpha_k \hat\beta_k\) recovers the Bartik estimator; here it is close to the 2SLS estimate of 0.99 above.
Note that the only restriction on the \(\alpha_k\) is that they sum to 1: an individual weight can be negative or greater than 1.
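As a quick check on the bw tibble constructed above, the weights should sum to one:
# The Rotemberg weights sum to 1 by construction, since the denominator of
# alpha_k is the sum of the numerators.
sum(bw$alpha)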
Why is it good to have Rotemberg weights?
GPSS Corollary D.1 gives the interpretation of the Rotemberg weights. The relative (percentage) bias of the estimator can be written as:
\[ \frac{E(\tilde{\beta})}{\beta_0} = \sum_k \alpha_k \frac{E(\tilde \beta_k)}{\beta_0}\]
That is, \(\alpha_k\) is the share of the overall bias that comes from the bias in \(\tilde \beta_k\): it measures how sensitive the overall estimate is to bias from industry \(k\).
In my simulations all industries are random, so the weights are essentially random as well. In empirical applications they are not: the estimate is most likely dominated by a few industries. This is why Rotemberg weights are useful; they tell us which industries matter most for the estimated effect of interest.
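As a made-up numerical illustration: if industry \(k\) has \(\alpha_k = 0.3\) and its just-identified estimate \(\tilde \beta_k\) is biased upward by 10 percent, then that instrument alone contributes \(0.3 \times 10\% = 3\%\) to the percentage bias of the overall estimate.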
What if you are using a reduced form?
If you use the reduced form, something like:
\[ y_{lt} = D_{lt} \rho + \beta_0 B_{lt} + \epsilon_{lt}\]
where \(B_{lt}\) is the Bartik instrument, constructed as:
\[ B_{lt} = Z_{l0} G_{t} = \sum_{k=1}^K z_{lk0} g_{kt} \]
Then I think the Rotemberg weights should be calculated as:
\[ \beta_{Bartik} = \sum_{k} \hat \alpha_k \hat \beta_k\] where \[ \hat \beta_k = (Z'_k B^{\perp})^{-1} Z'_k Y^{\perp}\] and \[ \hat \alpha_k = \frac{g_k Z'_k B^{\perp}}{\sum_{k'} g_{k'} Z'_{k'} B^{\perp}}\]
where \(B^{\perp}\) and \(Y^{\perp}\) are residualized \(B\) and \(Y\).
result_reduced <- df_lt |>
  lm(formula = y_lt ~ b_lt)
modelsummary::modelsummary(result_reduced)
|             | (1)      |
|-------------|----------|
| (Intercept) | -0.001   |
|             | (0.002)  |
| b_lt        | 0.909    |
|             | (0.023)  |
| Num.Obs.    | 4000     |
| R2          | 0.278    |
| R2 Adj.     | 0.278    |
| AIC         | -6730.0  |
| BIC         | -6711.1  |
| Log.Lik.    | 3367.980 |
| F           | 1538.779 |
| RMSE        | 0.10     |
Now we calculate the Rotemberg weights for the reduced form, for our simulated data.
B <- as.matrix(b_lt[, 3])
beta <- (t(Z) %*% Y) / (t(Z) %*% B)
alpha <- (diag(as.numeric(G)) %*% t(Z) %*% B) / as.numeric(t(G) %*% t(Z) %*% B)
bw <- tibble(g_kt, alpha = as.numeric(alpha), beta = as.numeric(beta)) |>
  mutate(product = alpha * beta)
bw
## # A tibble: 16 × 6
## t k g_kt alpha beta product
## <int> <int> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 0.130 0.0233 0.307 0.00714
## 2 1 2 0.146 0.0302 0.686 0.0207
## 3 1 3 -0.165 0.0304 0.233 0.00710
## 4 1 4 -0.110 0.0128 0.553 0.00710
## 5 2 1 -0.0289 -0.00660 0.979 -0.00646
## 6 2 2 -0.00130 -0.000359 0.937 -0.000336
## 7 2 3 0.0433 0.0137 0.803 0.0110
## 8 2 4 0.0840 0.0313 0.753 0.0236
## 9 3 1 -0.101 0.127 0.985 0.125
## 10 3 2 -0.152 0.209 0.961 0.201
## 11 3 3 -0.0569 0.0681 1.01 0.0685
## 12 3 4 -0.0994 0.126 0.990 0.124
## 13 4 1 0.0266 0.0243 1.03 0.0250
## 14 4 2 0.139 0.154 0.909 0.140
## 15 4 3 0.0263 0.0241 1.03 0.0248
## 16 4 4 0.127 0.134 0.979 0.131
sum(bw$product)
## [1] 0.9089352