
Bartik Instrument and Rotemberg weights

Bartik Instrument

The Bartik instrument is a method for estimating the causal effect of a treatment on an outcome. It was developed by Bartik (1991) and is also known as the shift-share IV.

We follow Goldsmith-Pinkham, Sorkin, and Swift (2020) (GPSS) in describing the method. Let’s say we are interested in estimating the effect of employment growth on wage growth, i.e., the inverse elasticity of labor supply.

$$y_{lt} = D_{lt} \rho + \beta_0 x_{lt} + \epsilon_{lt}$$

Here $y_{lt}$ is wage growth in location $l$, $x_{lt}$ is the employment growth rate, and $D_{lt}$ are controls. $\beta_0$ is the inverse elasticity of labor supply.

The usual challenge is that $x_{lt}$ is endogenous: there could be factors driving both wage growth and employment growth at location $l$. (Below I sometimes omit the time subscript for simplicity.)

There are two accounting identities:

$$x_{lt} = Z_{lt} G_{lt} = \sum_{k=1}^{K} z_{lkt} g_{lkt} \qquad \text{and} \qquad g_{lkt} = g_{kt} + \tilde{g}_{lkt}$$

Suppose location $l$ has $K$ industries. The first identity says that employment growth is the sum of the industry growth rates $g_{lkt}$, weighted by the industry shares $z_{lkt}$.

The second identity says that the industry growth rate at a location decomposes into a national industry rate and a location-industry deviation. The idea of the Bartik instrument is to use the product of the initial industry shares and the national industry growth rates as the instrument:

$$B_{lt} = Z_{l0} G_{t} = \sum_{k=1}^{K} z_{lk0} g_{kt}$$

The intuition is that the initial shares are plausibly exogenous because they are fixed at the baseline period, and the national industry growth rates are plausibly exogenous because they are not driven by any single location.

Bartik Instrument in special cases

Two industries and one period

GPSS go through the simplest case: two industries and one period.

$$B_l = z_{l1} g_1 + z_{l2} g_2$$

Since there are only two industries, $z_{l1} = 1 - z_{l2}$. Then we can rewrite the Bartik instrument as

$$B_l = g_2 + (g_1 - g_2) z_{l1}$$

From this simplest case, we see that the only location-specific part of the instrument is $z_{l1}$. The instrument is a linear function of the industry share, so using the Bartik instrument is the same as using the industry share itself as an instrument. For the instrument to be valid, we therefore have to argue that the industry share is exogenous.
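To see this numerically, here is a minimal standalone sketch (made-up data and names, not from GPSS): just-identified IV is invariant to linear transformations of the instrument, so $B_l$ and $z_{l1}$ give identical estimates.

# Standalone sketch: IV with B_l equals IV with the share z_l1 alone.
set.seed(1)
n  <- 500
z1 <- runif(n)                      # industry 1 share; industry 2 share is 1 - z1
g1 <- 0.10; g2 <- -0.05             # national growth rates
b  <- z1 * g1 + (1 - z1) * g2       # Bartik instrument
u  <- rnorm(n)                      # unobserved confounder
x  <- b + 0.3 * u + 0.5 * rnorm(n)  # endogenous employment growth
y  <- 1 * x + u                     # true beta = 1
cov(b, y) / cov(b, x)               # just-identified IV using b
cov(z1, y) / cov(z1, x)             # identical: b is linear in z1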

GPSS show that just-identified two-stage least squares using a Bartik instrument can be written as GMM with an overidentified set of instruments, one per industry share, where the weight matrix is the outer product of the national growth rates.

With $K$ industries and $T$ time periods, this amounts to using $K \times T$ instruments: the Bartik estimator is still equivalent to GMM with all of these instruments, as the sketch below illustrates.
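Here is a minimal standalone check of this equivalence (simulated data, illustrative names): GMM over the individual share instruments with weight matrix $GG'$ reproduces the just-identified IV that uses $B = ZG$ as a single instrument.

# Standalone sketch: Bartik IV = GMM with weight matrix gg'.
set.seed(2)
n  <- 200                           # locations
Zs <- matrix(runif(n * 3), n, 3)    # three share instruments
gs <- c(0.10, -0.05, 0.02)          # national growth rates
Bs <- Zs %*% gs                     # Bartik instrument
xs <- Bs + rnorm(n)                 # endogenous regressor
ys <- 2 * xs + rnorm(n)             # true beta = 2
W  <- gs %*% t(gs)                  # weight matrix: outer product of growth rates
beta_gmm <- solve(t(xs) %*% Zs %*% W %*% t(Zs) %*% xs) %*%
  (t(xs) %*% Zs %*% W %*% t(Zs) %*% ys)
beta_iv <- solve(t(Bs) %*% xs) %*% (t(Bs) %*% ys)
c(beta_gmm, beta_iv)                # identical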

Rotemberg weights

GPSS decompose the Bartik estimator:

$$\beta_{Bartik} = \sum_k \hat{\alpha}_k \hat{\beta}_k, \quad \text{where } \hat{\beta}_k = (Z_k' X)^{-1} Z_k' Y \text{ and } \hat{\alpha}_k = \frac{g_k Z_k' X}{\sum_{k'} g_{k'} Z_{k'}' X}$$

Here $X$ and $Y$ are the residualized $X$ and $Y$: the residuals from regressing each on all the other variables (the controls).
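For concreteness, a standalone sketch of the residualizing step (made-up data and names, not the simulation below):

# Standalone sketch: residualize x and y on a control before the decomposition.
set.seed(3)
d <- data.frame(x = rnorm(100), y = rnorm(100), ctrl = rnorm(100))
x_perp <- resid(lm(x ~ ctrl, data = d))  # x net of controls
y_perp <- resid(lm(y ~ ctrl, data = d))  # y net of controls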

Rotemberg weights measure how important each share is: the weight for industry $k$ tells us how much the overall estimate would be biased if that industry's instrument were biased by a given percentage. The weights are useful for seeing which industries drive the estimate, and hence for judging whether the exogeneity assumption is plausible for the instruments with the largest weights.

Simulations

The simulation code is adapted from Kohei Kawaguchi's lecture notes; I modified it slightly to fit different scenarios.

Here we simulate 1,000 locations, 4 periods, and 4 industries. The true $\beta$ (the effect of $x$ on $y$) is 1.

library(ggplot2)
library(tidyverse)
library(modelsummary)
library(kableExtra)
library(estimatr)
set.seed(6)
L <- 1000  # locations
T <- 4     # periods
K <- 4     # industries
beta <- 1  # true effect of x on y
sigma <- c(1, 1, 1)  # noise scales: shares, location-industry growth, outcome

z_lk0 <- expand_grid(
    l = 1:L,
    k = 1:K
  ) |> 
  mutate(z_lk0 = runif(length(l)))  # initial share draws (normalized below)

g_kt <-  expand_grid(
    t = 1:T,
    k = 1:K
  ) |> 
  mutate(g_kt = rnorm(length(t), 0, .1))  # national growth rate for each (k, t)

df <- expand_grid(
    l = 1:L,
    t = 1:T,
    k = 1:K
  ) |> 
  left_join(z_lk0, by = c("l", "k")) |> 
  left_join(g_kt,  by = c("k", "t")) |>
  mutate(
    z_lkt = z_lk0 + sigma[1] * runif(length(l)),       # contemporaneous shares
    g_lkt = g_kt + sigma[2] * rnorm(length(l), 0, .1)  # location-industry growth
  )

df <- df |>
  group_by(t, l) |> 
  mutate(
    z_lk0 = z_lk0 / sum(z_lk0),  # normalize shares to sum to 1 within (l, t)
    z_lkt = z_lkt / sum(z_lkt),
    x_lt = sum(z_lkt * g_lkt)    # employment growth: share-weighted industry growth
  )

df
## # A tibble: 16,000 × 8
## # Groups:   t, l [4,000]
##        l     t     k z_lk0     g_kt z_lkt    g_lkt    x_lt
##    <int> <int> <int> <dbl>    <dbl> <dbl>    <dbl>   <dbl>
##  1     1     1     1 0.277  0.130   0.252  0.115    0.0400
##  2     1     1     2 0.428  0.146   0.302  0.291    0.0400
##  3     1     1     3 0.121 -0.165   0.321 -0.231    0.0400
##  4     1     1     4 0.174 -0.110   0.126 -0.0185   0.0400
##  5     1     2     1 0.277 -0.0289  0.160 -0.00619  0.0402
##  6     1     2     2 0.428 -0.00130 0.356  0.0560   0.0402
##  7     1     2     3 0.121  0.0433  0.272 -0.0590   0.0402
##  8     1     2     4 0.174  0.0840  0.212  0.176    0.0402
##  9     1     3     1 0.277 -0.101   0.237 -0.0361  -0.0738
## 10     1     3     2 0.428 -0.152   0.367 -0.126   -0.0738
## # ℹ 15,990 more rows

This assumes we observe the initial industry shares ($z_{lk0}$), the national industry growth rates ($g_{kt}$), and the industry shares at time $t$ ($z_{lkt}$). The industry growth rate at location $l$ and time $t$ ($g_{lkt}$) is $g_{kt}$ plus noise. From these we calculate $x_{lt}$, the share-weighted sum of industry growth rates.
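As a quick sanity check (an extra step, with illustrative column names s0 and st), both share variables should sum to 1 within each location-period:

# z_lk0 and z_lkt were normalized within each (l, t) group above.
df |> summarise(s0 = sum(z_lk0), st = sum(z_lkt)) |> head()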

Next we generate the outcome at the $(l, t)$ level. The outcome (wage growth) is a linear function of $x$ (employment growth), with the location-industry deviations $\tilde{g}_{lkt}$ entering the error term.

df_lt <- df |>
  mutate(g_tilde_lkt = g_lkt - g_kt) |> 
  group_by(l, t) |> 
  summarise(across(c(x_lt, g_tilde_lkt),  mean),  .groups = "drop") |>
  ungroup() |> 
  mutate(y_lt = beta * x_lt + sigma[3] * g_tilde_lkt)

If we run OLS on the raw data, we get a biased estimate: the error term contains the same location-industry shocks $\tilde{g}_{lkt}$ that enter $x_{lt}$, so $x_{lt}$ is endogenous by construction.

result_ols <-
  df_lt |>
  lm(formula = y_lt ~ x_lt)

modelsummary(result_ols)
                   (1)
(Intercept)      0.000
                (0.001)
x_lt             1.336
                (0.008)
Num.Obs.          4000
R2               0.884
R2 Adj.          0.884
AIC           -14059.9
BIC           -14041.0
Log.Lik.      7032.945
F            30603.441
RMSE              0.04

Now we generate the Bartik instrument: for each $(l, t)$, the sum over industries of the initial share $z_{lk0}$ times the national growth rate $g_{kt}$.

b_lt <- df |> 
  group_by(l, t) |>
  summarise(b_lt = sum(z_lk0 * g_kt)) |>
  ungroup()

df_lt <- df_lt |>
  left_join(b_lt, by = c("l", "t"))


result_first_stage <-  lm(formula = x_lt ~ b_lt, data = df_lt)

modelsummary(result_first_stage)
                  (1)
(Intercept)    -0.001
               (0.001)
b_lt            0.919
               (0.013)
Num.Obs.         4000
R2              0.573
R2 Adj.         0.573
AIC          -11638.5
BIC          -11619.6
Log.Lik.     5822.262
F            5361.487
RMSE             0.06

result_tsls <- df_lt |>
  estimatr::iv_robust(formula = y_lt ~ x_lt | b_lt)

modelsummary::modelsummary(result_tsls)
                 (1)
(Intercept)    0.000
              (0.001)
x_lt           0.990
              (0.012)
Num.Obs.        4000
R2             0.825
R2 Adj.        0.825
AIC         -12398.4
BIC         -12379.5
RMSE            0.05

Calculate Rotemberg weights

Let’s do IV using the Bartik instrument, with no controls. We use matrix operations to recover the IV estimates, which match the TSLS results above.

B <- as.matrix(cbind(1, df_lt$b_lt))  # instrument matrix (constant + Bartik)
Y <- as.matrix(df_lt$y_lt)
X <- as.matrix(cbind(1, df_lt$x_lt))  # regressor matrix (constant + x)

iv <- round(solve(t(B) %*% X) %*% t(B) %*% Y, 3)  # (B'X)^{-1} B'Y

# Label and organize results into a data frame
beta.hat <- as.data.frame(cbind(c("Intercept", "x"), iv))
names(beta.hat) <- c("Coeff.", "Est")
beta.hat
##      Coeff.  Est
## 1 Intercept    0
## 2         x 0.99
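As a quick check (reusing the result_tsls object fitted above), the matrix estimates line up with estimatr:

# The matrix IV coefficients agree with estimatr::iv_robust above (x_lt ≈ 0.99).
coef(result_tsls)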

Now we calculate the Rotemberg weights.

# This is an R version of Paul Goldsmith-Pinkham's Rotemberg-weight code on
# GitHub, which in turn calls C++ code by jjchern (bw.R calls ComputeAlphaBeta.cpp).
# Dimensions: G is KTx1, Z is LTxKT, Y is LTx1, X is LTx1, so B = ZG is LTx1.
# I learned the dimensions from the ADH example on Paul's GitHub:
# in the ADH example, czone (L) is 722, year (T) is 2, and ind (K) is 390,
# so G is 780x1, Z is 1444x780, Y is 1444x1, X is 1444x1.
y_lt <- df_lt |> 
  select(l,t,y_lt)

Y <- as.matrix(y_lt[,3])

x_lt <- df_lt |> 
  select(l,t,x_lt)

X <- as.matrix(x_lt[,3])
#X <- as.matrix(cbind(1,x_lt[,3]))

# G is the (k, t)-level vector of national growth rates.
# An alternative would be to recompute it from the data:
# g_data <- df |>
#   group_by(k, t) |>
#   summarise(g_kt = sum(g_lkt * z_lkt))
# G <- as.matrix(g_data[,3])

G <- as.matrix(g_kt[,3])

# Generate the wide Z matrix (LT x KT) from df: spread z_lk0 into KT columns,
# one per (t, k), filling with 0 where a combination has no data.
# Note: I first used z_lkt here, but it should be z_lk0, the initial shares.

z_data <- df |> 
  select(l,t,k,z_lk0) |> 
  mutate(k = str_glue("t{t}_sh_ind_{k}") ) |>
  spread(k, z_lk0, fill = 0)

Z <- as.matrix(z_data[,-c(1,2)])

# beta_k: per-instrument just-identified IV estimates, (Z_k' Y) / (Z_k' X)
beta <- round((t(Z) %*% Y) / (t(Z) %*% X), digits = 3)
# alpha_k: Rotemberg weights, g_k Z_k' X divided by the sum over all k
alpha <- (diag(as.numeric(G)) %*% t(Z) %*% X) / as.numeric(t(G) %*% t(Z) %*% X)

bw <- tibble(g_kt, alpha=as.numeric(alpha), beta=as.numeric(beta)) |> 
  mutate(product=alpha*beta)

bw
## # A tibble: 16 × 6
##        t     k     g_kt     alpha  beta   product
##    <int> <int>    <dbl>     <dbl> <dbl>     <dbl>
##  1     1     1  0.130    0.0108   0.721  0.00777 
##  2     1     2  0.146    0.0194   1.16   0.0225  
##  3     1     3 -0.165    0.00963  0.802  0.00772 
##  4     1     4 -0.110    0.00714  1.08   0.00772 
##  5     2     1 -0.0289  -0.00801  0.878 -0.00703 
##  6     2     2 -0.00130 -0.000392 0.932 -0.000366
##  7     2     3  0.0433   0.0134   0.891  0.0120  
##  8     2     4  0.0840   0.0283   0.906  0.0257  
##  9     3     1 -0.101    0.137    0.989  0.136   
## 10     3     2 -0.152    0.221    0.988  0.219   
## 11     3     3 -0.0569   0.0755   0.986  0.0745  
## 12     3     4 -0.0994   0.136    0.995  0.135   
## 13     4     1  0.0266   0.0270   1.01   0.0273  
## 14     4     2  0.139    0.155    0.981  0.152   
## 15     4     3  0.0263   0.0268   1.01   0.0270  
## 16     4     4  0.127    0.141    1.01   0.143
sum(bw$product)
## [1] 0.9892878

The sum of the products of $\hat{\alpha}_k$ and $\hat{\beta}_k$ recovers the Bartik estimator: 0.989 here, versus the TSLS estimate of 0.990 (the small gap likely comes from this calculation omitting the intercept).

Note that there is no restriction on the individual $\alpha_k$ other than that they sum to 1: a weight can be negative, or greater than 1. Perhaps the weights ought to be restricted further, but I am not sure.
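We can verify the normalization directly on the simulated weights:

# The Rotemberg weights sum to 1 by construction, even though
# individual weights can be negative or exceed 1.
sum(bw$alpha)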

Why is it good to have Rotemberg weights?

GPSS Corollary D.1 shows how to interpret the Rotemberg weights.

The percentage bias can be written as:

$$\frac{E(\tilde{\beta}) - \beta_0}{\beta_0} = \sum_k \alpha_k \frac{E(\tilde{\beta}_k) - \beta_0}{\beta_0}$$

This says that $\alpha_k$ is the share of the bias in the overall estimate that comes from the bias in the estimate $\tilde{\beta}_k$: it measures how sensitive the overall estimate is to bias from industry $k$. For example, if industry $k$ has $\alpha_k = 0.5$ and its $\tilde{\beta}_k$ is biased by 10 percent, it contributes 5 percentage points to the overall percentage bias.

In my simulations all the industries are random, so the weights are spread around randomly too. In empirical studies they are not: the weights are most likely dominated by a few industries. This is why Rotemberg weights are important to report; they tell us which industries matter in the estimation of the effect of interest.

What if you are using a reduced form?

If you use the reduced form, something like:

$$y_{lt} = D_{lt} \rho + \beta_0 B_{lt} + \epsilon_{lt}$$

where $B_{lt}$ is the Bartik instrument, constructed as:

$$B_{lt} = Z_{l0} G_{t} = \sum_{k=1}^{K} z_{lk0} g_{kt}$$

Then I think the Rotemberg weights should be calculated as:

$$\beta_{Bartik} = \sum_k \hat{\alpha}_k \hat{\beta}_k, \quad \text{where } \hat{\beta}_k = (Z_k' B)^{-1} Z_k' Y \text{ and } \hat{\alpha}_k = \frac{g_k Z_k' B}{\sum_{k'} g_{k'} Z_{k'}' B}$$

where $B$ and $Y$ are the residualized $B$ and $Y$.

result_reduced <- df_lt |>
  lm(formula = y_lt ~  b_lt)

modelsummary::modelsummary(result_reduced)
                 (1)
(Intercept)   -0.001
              (0.002)
b_lt           0.909
              (0.023)
Num.Obs.        4000
R2             0.278
R2 Adj.        0.278
AIC          -6730.0
BIC          -6711.1
Log.Lik.    3367.980
F           1538.779
RMSE            0.10

Now we calculate the Rotemberg weights for the reduced form on our simulated data.

B <- as.matrix(b_lt[,3])

beta <- (t(Z) %*% Y) / (t(Z) %*% B)   # beta_k for the reduced form
alpha <- (diag(as.numeric(G)) %*% t(Z) %*% B) / as.numeric(t(G) %*% t(Z) %*% B)

bw <- tibble(g_kt, alpha=as.numeric(alpha), beta=as.numeric(beta)) |> 
  mutate(product = alpha * beta)

bw
## # A tibble: 16 × 6
##        t     k     g_kt     alpha  beta   product
##    <int> <int>    <dbl>     <dbl> <dbl>     <dbl>
##  1     1     1  0.130    0.0233   0.307  0.00714 
##  2     1     2  0.146    0.0302   0.686  0.0207  
##  3     1     3 -0.165    0.0304   0.233  0.00710 
##  4     1     4 -0.110    0.0128   0.553  0.00710 
##  5     2     1 -0.0289  -0.00660  0.979 -0.00646 
##  6     2     2 -0.00130 -0.000359 0.937 -0.000336
##  7     2     3  0.0433   0.0137   0.803  0.0110  
##  8     2     4  0.0840   0.0313   0.753  0.0236  
##  9     3     1 -0.101    0.127    0.985  0.125   
## 10     3     2 -0.152    0.209    0.961  0.201   
## 11     3     3 -0.0569   0.0681   1.01   0.0685  
## 12     3     4 -0.0994   0.126    0.990  0.124   
## 13     4     1  0.0266   0.0243   1.03   0.0250  
## 14     4     2  0.139    0.154    0.909  0.140   
## 15     4     3  0.0263   0.0241   1.03   0.0248  
## 16     4     4  0.127    0.134    0.979  0.131
sum(bw$product)
## [1] 0.9089352
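As with the structural form, the weighted sum recovers the estimate: it matches the reduced-form coefficient on b_lt up to the intercept, which this matrix calculation omits.

# Compare with the reduced-form OLS coefficient on b_lt (0.909 above); the
# small remaining gap comes from omitting the intercept here.
coef(result_reduced)["b_lt"]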