5 Chapter 7: Designs with Variation in Treatment Dose
5.1 Overview
Dataset: Pierce & Schott (2016) — pierce_schott_didtextbook.dta (103 industries, 22 variables)
This chapter applies the Heterogeneous Adoption Design (HAD) framework to study how China’s permanent normal trade relations (PNTR) status affected US manufacturing employment. The key feature: treatment is continuous (the NTR gap, ranging from 0.02 to 0.64), not binary. All 103 US industries are “treated” — there are no pure control groups, but industries with near-zero NTR gaps serve as quasi-stayers.
Packages used:
| Task | Stata | R | Python |
|---|---|---|---|
| OLS + HC2 SE | reg, vce(hc2) | sandwich + lmtest | statsmodels (HC2) |
| Weight decomposition | twowayfeweights (SSC) | TwoWayFEWeights (CRAN) | Not available for fdTR |
| Stute linearity test | stute_test (GitHub) | StuteTest (GitHub) | Manual implementation |
| Treatment visualization | panelview (SSC) | panelView (CRAN) | matplotlib (manual) |
| F-test | test (built-in) | car::linearHypothesis | scipy + f_test |
| HAD event-study | did_had (SSC) + nprobust (GitHub) + gtools (SSC) | DIDHAD (GitHub) + nprobust (CRAN) | did-had (PyPI) |
5.2 PanelView
Note: In the Pierce & Schott (2016) application, treatment is continuous (NTR gap), so the standard binary PanelView is not applicable. See the code below for reference, but plots are omitted.
* ssc install panelview, replace
copy "https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
preserve
reshape long delta deltalintrend, i(indusid) j(year)
panelview delta ntrgap, i(indusid) t(year) type(treat) title("Treatment Dose: Pierce & Schott (2016)") legend(off)
graph export "$FIGDIR\ch07_panelview_stata.png", replace width(1200)
restore
library(panelView); library(tidyr)
load(url("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData"))
df_long <- df %>% pivot_longer(cols = starts_with("delta"), names_to = "variable", values_to = "value")
df_long$year <- as.numeric(gsub("delta", "", df_long$variable))
df_long <- df_long[!is.na(df_long$year), ]
df_long$treated <- ifelse(df_long$year >= 2001, 1, 0)
png("figures/ch07_panelview_R.png", width = 1000, height = 600)
panelview(value ~ treated, data = df_long, index = c("indusid", "year"), type = "treat",
main = "Pierce & Schott (2016)", ylab = "")
dev.off()
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
df = pd.read_parquet("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
cmap = plt.cm.Blues
norm = mcolors.Normalize(vmin=0, vmax=df['ntrgap'].max() * 1.1)
# Heatmap: each row = industry (sorted by ntrgap), each column = year
# Color intensity = NTR gap (continuous treatment dose), nonzero from 2001 onward
years = np.arange(1997, 2006)  # panel years spanned by the delta* columns
dose = np.outer(np.sort(df['ntrgap'].values), (years >= 2001).astype(float))
fig, ax = plt.subplots(figsize=(8, 6))
ax.imshow(dose, aspect='auto', cmap=cmap, norm=norm, extent=[years[0], years[-1], 0, len(df)])
ax.set_xlabel("Year"); ax.set_ylabel("Industry (sorted by NTR gap)")
fig.savefig("figures/ch07_panelview_python.png", dpi=150)
5.3 CQ#36: TWFE Regressions
Regress \(\Delta_{2000}^t Y_g\) on \(D_g\) (NTR gap) for \(t \in \{2001, 2002, 2004, 2005\}\), with HC2 standard errors.
copy "https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
reg delta2001 ntrgap, vce(hc2)
reg delta2002 ntrgap, vce(hc2)
reg delta2004 ntrgap, vce(hc2)
reg delta2005 ntrgap, vce(hc2)
--- 2001 --- N = 103, R² = 0.0237
| Robust HC2
delta2001 | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
ntrgap | -.0612112 .0401833 -1.52 0.131 -.1409241 .0185017
_cons | -.0141423 .0129623 -1.09 0.278 -.039856 .0115714
--- 2002 --- N = 103, R² = 0.1263
ntrgap | -.2598747 .0754512 -3.44 0.001 -.4095496 -.1101999
_cons | -.0550989 .0220561 -2.50 0.014 -.0988522 -.0113456
--- 2004 --- N = 103, R² = 0.0857
ntrgap | -.5397825 .1527447 -3.53 0.001 -.8427868 -.2367782
_cons | -.0868008 .0524821 -1.65 0.101 -.1909111 .0173095
--- 2005 --- N = 103, R² = 0.0707
ntrgap | -.5317591 .1666765 -3.19 0.002 -.8624004 -.2011179
_cons | -.1056161 .0536739 -1.97 0.052 -.2120906 .0008584
library(haven); library(sandwich); library(lmtest)
load(url("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
for (yr in c(2001, 2002, 2004, 2005)) {
fit <- lm(as.formula(paste0("delta", yr, " ~ ntrgap")), data = df)
print(coeftest(fit, vcov = vcovHC(fit, type = "HC2")))
}
--- 2001 --- N = 103, R² = 0.02366
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0141423 0.0129623 -1.0910 0.2779
ntrgap -0.0612112 0.0401833 -1.5233 0.1308
--- 2002 --- N = 103, R² = 0.12631
(Intercept) -0.0550989 0.0220561 -2.4981 0.0141
ntrgap -0.2598747 0.0754512 -3.4443 0.0008
--- 2004 --- N = 103, R² = 0.08567
(Intercept) -0.0868008 0.0524821 -1.6539 0.1012
ntrgap -0.5397825 0.1527447 -3.5339 0.0006
--- 2005 --- N = 103, R² = 0.07072
(Intercept) -0.1056161 0.0536739 -1.9677 0.0518
ntrgap -0.5317591 0.1666765 -3.1904 0.0019
import pandas as pd
import statsmodels.api as sm
df = pd.read_parquet("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
for year in [2001, 2002, 2004, 2005]:
    y = df[f'delta{year}']
    X = sm.add_constant(df['ntrgap'])
    res = sm.OLS(y, X).fit(cov_type='HC2')
    print(f"--- {year} --- N = {int(res.nobs)}, R² = {res.rsquared:.5f}")
    print(res.summary().tables[1])
--- 2001 --- N = 103, R² = 0.02366
Variable Coef Std.Err t P>|t|
const -0.0141423 0.0129623 -1.091 0.2753
ntrgap -0.0612112 0.0401833 -1.523 0.1277
--- 2002 --- N = 103, R² = 0.12631
const -0.0550989 0.0220561 -2.498 0.0125
ntrgap -0.2598747 0.0754512 -3.444 0.0006
--- 2004 --- N = 103, R² = 0.08567
const -0.0868008 0.0524821 -1.654 0.0981
ntrgap -0.5397825 0.1527447 -3.534 0.0004
--- 2005 --- N = 103, R² = 0.07072
const -0.1056161 0.0536739 -1.968 0.0491
ntrgap -0.5317591 0.1666765 -3.190 0.0014
Interpretation: The NTR gap has a significant negative effect on manufacturing employment growth starting from 2002. The effect grows over time: a 10 pp increase in the NTR gap is associated with roughly 5 pp lower employment growth by 2004–2005.
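A quick arithmetic check of the magnitude quoted above (coefficients copied from the HC2 output tables; this is only the implied linear prediction, not a new estimate):

```python
# Implied change in employment growth from a 10 pp (0.10) increase in the NTR gap.
beta_2004 = -0.5397825  # delta2004 ~ ntrgap, HC2 OLS
beta_2005 = -0.5317591  # delta2005 ~ ntrgap, HC2 OLS
for year, beta in [(2004, beta_2004), (2005, beta_2005)]:
    print(f"{year}: 0.10 * ({beta:.4f}) = {0.10 * beta:+.4f}")
```

Both products are about −0.053, i.e. roughly 5 pp lower employment growth, in line with the interpretation.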
5.4 CQ#37: Decompose TWFE Weights
Use `twowayfeweights` with `type(fdTR)` to assess whether \(\hat{\beta}_{fe}\) estimates a convex combination of effects.
* ssc install twowayfeweights, replace
copy "https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
twowayfeweights delta2001 indusid cons ntrgap ntrgap, type(fdTR)
Under the common trends assumption,
the TWFE coefficient beta, equal to -0.0612, estimates a weighted sum of 103 ATTs.
62 ATTs receive a positive weight, and 41 receive a negative weight.
------------------------------------------------
Treat. var: ntrgap # ATTs Σ weights
------------------------------------------------
Positive weights 62 1.3190
Negative weights 41 -0.3190
------------------------------------------------
Total 103 1.0000
------------------------------------------------
library(haven); library(TwoWayFEWeights)
load(url("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
# Workaround: fdTR with constant time variable crashes in R.
# Create a 2-period panel and use feTR (equivalent with T=2).
df_panel <- rbind(
data.frame(indusid = df$indusid, time = 1, Y = 0, D = 0),
data.frame(indusid = df$indusid, time = 2, Y = df$delta2001, D = df$ntrgap)
)
gq2 <- twowayfeweights(df_panel, "Y", "indusid", "time", "D",
type = "feTR", summary_measures = TRUE)
Under the common trends assumption,
the TWFE coefficient beta, equal to -0.0612, estimates a weighted sum of 103 ATTs.
62 ATTs receive a positive weight, and 41 receive a negative weight.
Treat. var: D ATTs Σ weights
Positive weights 62 1.319
Negative weights 41 -0.319
Total 103 1
Summary Measures:
min σ(Δ) compatible with β_fe and Δ_TR = 0: 0.0317
min σ(Δ) compatible with opposite sign: 0.0642
import pandas as pd
from twowayfeweights import twowayfeweights
df = pd.read_parquet("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
df["cons"] = 1
df["ntrgap_D0"] = df["ntrgap"]
result = twowayfeweights(df, "delta2001", "indusid", "cons", "ntrgap",
type="fdTR", D0="ntrgap_D0")
Under the common trends assumption,
beta estimates a weighted sum of 103 ATTs.
62 ATTs receive a positive weight, and 41 receive a negative weight.
------------------------------------------------
Treat. var: ntrgap # ATTs Σ weights
------------------------------------------------
Positive weights 62 1.3190
Negative weights 41 -0.3190
------------------------------------------------
Total 103 1.0000
------------------------------------------------
Interpretation: 41 out of 103 industry ATTs receive negative weights, summing to −0.3190. The TWFE estimator does not estimate a convex combination of effects. However, the textbook notes this is expected in the HAD framework and motivates the Stute linearity tests below.
5.5 CQ#38: Test Random Assignment of Treatment Dose
Regress the NTR gap on pre-treatment employment levels (1997–2000) and test their joint significance.
copy "https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
reg ntrgap lemp1997 lemp1998 lemp1999 lemp2000, vce(hc2)
test lemp1997 lemp1998 lemp1999 lemp2000
| Robust HC2
ntrgap | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
lemp1997 | -.0219594 .1325814 -0.17 0.869 -.2850629 .2411441
lemp1998 | .6043265 .3103595 1.95 0.054 -.0115719 1.220225
lemp1999 | -.2260641 .4573776 -0.49 0.622 -1.133715 .6815867
lemp2000 | -.3418326 .2938878 -1.16 0.248 -.9250434 .2413782
_cons | .1204804 .1396417 0.86 0.390 -.156634 .3975949
F(4, 98) = 2.39 Prob > F = 0.0560
library(haven); library(car); library(sandwich); library(lmtest)
load(url("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
m3 <- lm(ntrgap ~ lemp1997 + lemp1998 + lemp1999 + lemp2000, data = df)
coeftest(m3, vcov = vcovHC(m3, type = "HC2"))
linearHypothesis(m3, c("lemp1997=0","lemp1998=0","lemp1999=0","lemp2000=0"),
vcov. = vcovHC(m3, type = "HC2"))
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.1204804 0.1396417 0.8628 0.3904
lemp1997 -0.0219594 0.1325814 -0.1656 0.8688
lemp1998 0.6043265 0.3103595 1.9472 0.0544
lemp1999 -0.2260641 0.4573776 -0.4943 0.6222
lemp2000 -0.3418326 0.2938878 -1.1631 0.2476
F(4, 98) = 2.3901 Pr(>F) = 0.05597
import pandas as pd
import statsmodels.api as sm
df = pd.read_parquet("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
X_vars = ['lemp1997', 'lemp1998', 'lemp1999', 'lemp2000']
y = df['ntrgap']; X = sm.add_constant(df[X_vars])
res = sm.OLS(y, X).fit(cov_type='HC2')
print(res.summary().tables[1])
# F-test for joint significance of the pre-treatment employment levels
print(res.f_test([f"{v} = 0" for v in X_vars]))
Variable Coef Std.Err t P>|t|
const 0.1204804 0.1396417 0.863 0.3883
lemp1997 -0.0219594 0.1325814 -0.166 0.8684
lemp1998 0.6043265 0.3103595 1.947 0.0515
lemp1999 -0.2260641 0.4573776 -0.494 0.6211
lemp2000 -0.3418326 0.2938878 -1.163 0.2448
F(4, 98) = 2.3901 Prob > F = 0.0560
Interpretation: The joint F-test yields \(p = 0.056\), marginally insignificant at 5%. Pre-treatment employment levels are not strong predictors of the NTR gap, consistent with (approximate) random assignment of the treatment dose.
5.6 Stute Tests of Linearity (extra)
Test whether \(E[\Delta Y | D]\) is linear in \(D\) using the Stute (1997) Cramér–von Mises test, for each post-treatment year and jointly.
copy "https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
stute_test delta2001 ntrgap, seed(1)
stute_test delta2002 ntrgap, seed(1)
stute_test delta2004 ntrgap, seed(1)
stute_test delta2005 ntrgap, seed(1)
preserve
reshape long delta deltalintrend, i(indusid) j(year)
stute_test delta ntrgap indusid year if year>=2001, seed(1)
restore
Cross-sectional:
Variable | t stat p-value
delta2001 | .0004426 .042
delta2002 | .0011319 .134
delta2004 | .0039215 .478
delta2005 | .0033582 .716
Panel Joint Test:
| t stat p-value
2001 | .0004426 .042
2002 | .0011319 .134
2004 | .0039215 .478
2005 | .0033582 .716
Joint Stute test: 0.0089 (0.5400)
library(haven); library(StuteTest)
load(url("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
for (yr in c(2001, 2002, 2004, 2005))
stute_test(df = df, Y = paste0("delta", yr), D = "ntrgap", seed = 1)
# Panel joint test (df_long as built in Section 5.2; %>% and filter() require dplyr)
df_panel_gq4 <- df_long %>% filter(year >= 2001)
stute_test(df = df_panel_gq4, Y = "delta", D = "ntrgap",
           group = "indusid", time = "year", seed = 1)
Cross-sectional:
Variable | t stat p-value
delta2001 | 0.0004426 0.054
delta2002 | 0.0011319 0.152
delta2004 | 0.0039215 0.480
delta2005 | 0.0033582 0.684
Panel Joint Test:
| t stat p-value
2001 | 0.0004426 0.064
2002 | 0.0011319 0.130
2004 | 0.0039215 0.484
2005 | 0.0033582 0.660
Joint Stute test: 0.0089 (0.5500)
import pandas as pd
import numpy as np
def stute_test_cvm(Y, D, df, order=1, seed=1, nboot=999):
    y = df[Y].values.astype(float)
    d = df[D].values.astype(float)
    n = len(y)
    X = np.column_stack([d**k for k in range(order+1)])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    idx = np.argsort(d)
    resid_sorted = resid[idx]
    cum_resid = np.cumsum(resid_sorted) / np.sqrt(n)
    cvm_stat = np.sum(cum_resid**2) / n
    rng = np.random.RandomState(seed)
    count = 0
    for _ in range(nboot):
        wild = resid * rng.choice([-1, 1], size=n)
        wild_sorted = wild[idx]
        cum_wild = np.cumsum(wild_sorted) / np.sqrt(n)
        boot_stat = np.sum(cum_wild**2) / n
        if boot_stat >= cvm_stat:
            count += 1
    pval = (count + 1) / (nboot + 1)
    return cvm_stat, pval
df = pd.read_parquet("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
for year in [2001, 2002, 2004, 2005]:
    stat, pv = stute_test_cvm(f'delta{year}', 'ntrgap', df, order=1, seed=1)
    print(f"delta{year}: CvM = {stat:.7f}, p = {pv:.3f}")
Cross-sectional:
Variable | CvM stat p-value
delta2001 | 0.0004426 0.695
delta2002 | 0.0011319 0.740
delta2004 | 0.0039215 0.925
delta2005 | 0.0033582 0.972
Panel Joint Test:
2001 | 0.0004426 0.695
2002 | 0.0011319 0.740
2004 | 0.0039215 0.925
2005 | 0.0033582 0.972
Joint | 0.0088542
Interpretation: Linearity is not rejected at conventional levels for 2002–2005 (all \(p > 0.10\)). For 2001, the packaged Stata/R tests give a borderline p-value (\(\approx 0.04\)–\(0.08\) depending on the bootstrap draw); the manual Python implementation returns larger p-values, as its simplified wild bootstrap only approximates the packaged test. The joint test (\(p \approx 0.54\)) does not reject linearity.
5.7 Pre-Trend Tests — Without Linear Trends (extra)
Test for pre-trends by regressing \(\Delta_{2000}^t Y_g\) on \(D_g\) for \(t \in \{1999, 1998, 1997\}\).
copy "https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
reg delta1999 ntrgap, vce(hc2)
reg delta1998 ntrgap, vce(hc2)
reg delta1997 ntrgap, vce(hc2)
--- delta1999 --- N = 103, R² = 0.0325
ntrgap | .0580708 .0279591 2.08 0.040 .0026074 .1135341
_cons | -.0051607 .0086782 -0.59 0.553 -.0223759 .0120544
--- delta1998 --- N = 103, R² = 0.0616
ntrgap | .1407052 .0547983 2.57 0.012 .0320002 .2494102
_cons | -.0107783 .0162652 -0.66 0.509 -.0430442 .0214875
--- delta1997 --- N = 103, R² = 0.0307
ntrgap | .1625066 .0712845 2.28 0.025 .0210973 .3039160
_cons | -.0353406 .0211379 -1.67 0.098 -.0772726 .0065913
library(haven); library(sandwich); library(lmtest)
load(url("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
for (yr in c(1999, 1998, 1997)) {
fit <- lm(as.formula(paste0("delta", yr, " ~ ntrgap")), data = df)
print(coeftest(fit, vcov = vcovHC(fit, type = "HC2")))
}
--- delta1999 --- N = 103, R² = 0.03251
(Intercept) -0.0051607 0.0086782 -0.5947 0.5534
ntrgap 0.0580708 0.0279591 2.0770 0.0403
--- delta1998 --- N = 103, R² = 0.06156
(Intercept) -0.0107783 0.0162652 -0.6627 0.5091
ntrgap 0.1407052 0.0547983 2.5677 0.0117
--- delta1997 --- N = 103, R² = 0.03068
(Intercept) -0.0353406 0.0211379 -1.6719 0.0976
ntrgap 0.1625066 0.0712845 2.2797 0.0247
import pandas as pd
import statsmodels.api as sm
df = pd.read_parquet("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
for year in [1999, 1998, 1997]:
    y = df[f'delta{year}']
    X = sm.add_constant(df['ntrgap'])
    res = sm.OLS(y, X).fit(cov_type='HC2')
    print(f"--- delta{year} --- N = {int(res.nobs)}, R² = {res.rsquared:.5f}")
    print(res.summary().tables[1])
--- delta1999 --- N = 103, R² = 0.03251
const -0.0051607 0.0086782 -0.595 0.5521
ntrgap 0.0580708 0.0279591 2.077 0.0378
--- delta1998 --- N = 103, R² = 0.06156
const -0.0107783 0.0162652 -0.663 0.5075
ntrgap 0.1407052 0.0547983 2.568 0.0102
--- delta1997 --- N = 103, R² = 0.03068
const -0.0353406 0.0211379 -1.672 0.0945
ntrgap 0.1625066 0.0712845 2.280 0.0226
Interpretation: All three pre-treatment coefficients are significant and positive, indicating pre-trends: industries with higher NTR gaps had faster employment growth before PNTR. This motivates adding industry-specific linear trends (GQ6–GQ9).
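The deltalintrend* variables used in the next sections arrive precomputed in the dataset. As a rough illustration of what industry-specific detrending involves (a minimal sketch on synthetic data, not the textbook's exact construction): fit a linear trend to an industry's pre-period changes and subtract its extrapolation in every year.

```python
import numpy as np

# One synthetic industry: employment changes relative to 2000 that follow a pure
# linear pre-trend of 0.02 per year, with no treatment effect afterwards.
years = np.array([1997, 1998, 1999, 2001, 2002])
delta = 0.02 * (years - 2000)

# Estimate the trend on pre-2001 observations only, then net it out everywhere.
pre = years < 2001
slope, intercept = np.polyfit(years[pre] - 2000, delta[pre], 1)
delta_detrended = delta - (intercept + slope * (years - 2000))

print(np.round(delta_detrended, 6))  # a pure pre-trend is absorbed completely
```

With real data the post-period residual, rather than being zero, is the detrended outcome regressed on the dose in GQ8.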
5.8 Pre-Trend Tests — With Linear Trends (extra)
Repeat GQ5 using the detrended outcome \(\tilde{\Delta}_{2000}^t Y_g\) (netting out industry-specific linear trends).
copy "https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
reg deltalintrend1998 ntrgap, vce(hc2)
reg deltalintrend1997 ntrgap, vce(hc2)
--- deltalintrend1998 --- N = 103, R² = 0.0046
ntrgap | -.0245637 .0339253 -0.72 0.471 -.0918623 .0427349
_cons | .0004569 .0119279 0.04 0.970 -.0232047 .0241186
--- deltalintrend1997 --- N = 103, R² = 0.0050
ntrgap | -.0463651 .0444009 -1.04 0.299 -.1344446 .0417144
_cons | .0250192 .0153636 1.63 0.107 -.0054579 .0554964
library(haven); library(sandwich); library(lmtest)
load(url("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
for (yr in c(1998, 1997)) {
fit <- lm(as.formula(paste0("deltalintrend", yr, " ~ ntrgap")), data = df)
print(coeftest(fit, vcov = vcovHC(fit, type = "HC2")))
}
--- deltalintrend1998 --- N = 103, R² = 0.00455
(Intercept) 0.0004569 0.0119279 0.0383 0.9695
ntrgap -0.0245637 0.0339253 -0.7241 0.4707
--- deltalintrend1997 --- N = 103, R² = 0.00503
(Intercept) 0.0250192 0.0153636 1.6285 0.1065
ntrgap -0.0463651 0.0444009 -1.0442 0.2989
import pandas as pd
import statsmodels.api as sm
df = pd.read_parquet("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
for year in [1998, 1997]:
    y = df[f'deltalintrend{year}']
    X = sm.add_constant(df['ntrgap'])
    res = sm.OLS(y, X).fit(cov_type='HC2')
    print(f"--- deltalintrend{year} --- N = {int(res.nobs)}, R² = {res.rsquared:.5f}")
    print(res.summary().tables[1])
--- deltalintrend1998 --- N = 103, R² = 0.00455
const 0.0004569 0.0119279 0.038 0.9694
ntrgap -0.0245637 0.0339253 -0.724 0.4690
--- deltalintrend1997 --- N = 103, R² = 0.00503
const 0.0250192 0.0153636 1.628 0.1034
ntrgap -0.0463651 0.0444009 -1.044 0.2964
Interpretation: After including industry-specific linear trends, all pre-trend coefficients become small and insignificant (\(p > 0.29\)). The trends successfully absorb the pre-existing differential growth.
5.9 Stute Tests of Mean Independence (extra)
Test whether \(E[\tilde{\Delta} Y | D] = \text{const}\) (mean independence, order 0) for the detrended pre-period outcomes.
copy "https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
stute_test deltalintrend1998 ntrgap, order(0) seed(1)
stute_test deltalintrend1997 ntrgap, order(0) seed(1)
preserve
reshape long delta deltalintrend, i(indusid) j(year)
stute_test deltalintrend ntrgap indusid year if inlist(year, 1997, 1998), order(0) seed(1)
restore
Cross-sectional:
Variable | t stat p-value
deltalintrend1998 | .0004249 .302
deltalintrend1997 | .0007520 .508
Panel Joint Test:
1997 | .0007520 .508
1998 | .0004249 .302
Joint Stute test: 0.0012 (0.4720)
library(haven); library(StuteTest)
load(url("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
stute_test(df = df, Y = "deltalintrend1998", D = "ntrgap", order = 0, seed = 1)
stute_test(df = df, Y = "deltalintrend1997", D = "ntrgap", order = 0, seed = 1)
# Panel joint test (df_pre_lt: long-format deltalintrend panel restricted to
# 1997-1998, built via the Section 5.2 reshape)
stute_test(df = df_pre_lt, Y = "deltalintrend", D = "ntrgap",
           group = "indusid", time = "year", order = 0, seed = 1)
Cross-sectional:
Variable | t stat p-value
deltalintrend1998 | 0.0004249 0.284
deltalintrend1997 | 0.0007520 0.506
Panel Joint Test:
1997 | 0.0007520 0.514
1998 | 0.0004249 0.294
Joint Stute test: 0.0012 (0.4660)
import pandas as pd
import numpy as np
def stute_test_cvm(Y, D, df, order=1, seed=1, nboot=999):
    y = df[Y].values.astype(float)
    d = df[D].values.astype(float)
    n = len(y)
    X = np.column_stack([d**k for k in range(order+1)])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    idx = np.argsort(d)
    cum_resid = np.cumsum(resid[idx]) / np.sqrt(n)
    cvm_stat = np.sum(cum_resid**2) / n
    rng = np.random.RandomState(seed)
    count = 0
    for _ in range(nboot):
        wild = resid * rng.choice([-1, 1], size=n)
        cum_wild = np.cumsum(wild[idx]) / np.sqrt(n)
        boot_stat = np.sum(cum_wild**2) / n
        if boot_stat >= cvm_stat:
            count += 1
    return cvm_stat, (count + 1) / (nboot + 1)
df = pd.read_parquet("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
for year in [1998, 1997]:
    stat, pv = stute_test_cvm(f'deltalintrend{year}', 'ntrgap', df, order=0, seed=1)
    print(f"deltalintrend{year}: CvM = {stat:.7f}, p = {pv:.3f}")
Cross-sectional:
Variable | CvM stat p-value
deltalintrend1998 | 0.0004249 0.334
deltalintrend1997 | 0.0007520 0.566
Panel Joint Test:
1997 | 0.0007520 0.566
1998 | 0.0004249 0.334
Joint | 0.0011769 0.530
Interpretation: Mean independence is not rejected for either year (\(p > 0.28\)) or jointly (\(p \approx 0.47\)). The detrended pre-period outcomes are uncorrelated with the treatment dose, supporting the identifying assumptions.
5.10 TWFE With Industry-Specific Linear Trends (extra)
Regress the detrended outcome \(\tilde{\Delta}_{2000}^t Y_g\) on \(D_g\) for post-treatment years.
copy "https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
reg deltalintrend2001 ntrgap, vce(hc2)
reg deltalintrend2002 ntrgap, vce(hc2)
reg deltalintrend2004 ntrgap, vce(hc2)
reg deltalintrend2005 ntrgap, vce(hc2)
--- deltalintrend2001 --- N = 103, R² = 0.0000
ntrgap | -.0031404 .0390586 -0.08 0.936 -.0806222 .0743413
_cons | -.0193030 .0139040 -1.39 0.168 -.0468848 .0082788
--- deltalintrend2002 --- N = 103, R² = 0.0335
ntrgap | -.1437332 .0754461 -1.91 0.060 -.2933979 .0059315
_cons | -.0654203 .0242514 -2.70 0.008 -.1135286 -.0173120
--- deltalintrend2004 --- N = 103, R² = 0.0262
ntrgap | -.3074995 .1518755 -2.02 0.046 -.6087796 -.0062194
_cons | -.1074437 .0551099 -1.95 0.054 -.2167668 .0018795
--- deltalintrend2005 --- N = 103, R² = 0.0133
ntrgap | -.2414053 .1675180 -1.44 0.153 -.5737160 .0909053
_cons | -.1314196 .0581374 -2.26 0.026 -.2467485 -.0160907
library(haven); library(sandwich); library(lmtest)
load(url("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
for (yr in c(2001, 2002, 2004, 2005)) {
fit <- lm(as.formula(paste0("deltalintrend", yr, " ~ ntrgap")), data = df)
print(coeftest(fit, vcov = vcovHC(fit, type = "HC2")))
}
--- deltalintrend2001 --- N = 103, R² = 0.00005
(Intercept) -0.0193030 0.0139040 -1.3883 0.1681
ntrgap -0.0031404 0.0390586 -0.0804 0.9361
--- deltalintrend2002 --- N = 103, R² = 0.03349
(Intercept) -0.0654203 0.0242514 -2.6976 0.0082
ntrgap -0.1437332 0.0754461 -1.9051 0.0596
--- deltalintrend2004 --- N = 103, R² = 0.02615
(Intercept) -0.1074437 0.0551099 -1.9496 0.0540
ntrgap -0.3074995 0.1518755 -2.0247 0.0455
--- deltalintrend2005 --- N = 103, R² = 0.01326
(Intercept) -0.1314196 0.0581374 -2.2605 0.0259
ntrgap -0.2414053 0.1675180 -1.4411 0.1527
import pandas as pd
import statsmodels.api as sm
df = pd.read_parquet("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
for year in [2001, 2002, 2004, 2005]:
    y = df[f'deltalintrend{year}']
    X = sm.add_constant(df['ntrgap'])
    res = sm.OLS(y, X).fit(cov_type='HC2')
    print(f"--- deltalintrend{year} --- N = {int(res.nobs)}, R² = {res.rsquared:.5f}")
    for name in res.params.index:
        print(f"  {name:<20s} {res.params[name]:>12.7f} {res.bse[name]:>12.7f} {res.tvalues[name]:>8.3f} {res.pvalues[name]:>8.4f}")
    print()
--- deltalintrend2001 --- N = 103, R² = 0.00005
const -0.0193030 0.0139040 -1.388 0.1650
ntrgap -0.0031404 0.0390586 -0.080 0.9359
--- deltalintrend2002 --- N = 103, R² = 0.03349
const -0.0654203 0.0242514 -2.698 0.0070
ntrgap -0.1437332 0.0754461 -1.905 0.0568
--- deltalintrend2004 --- N = 103, R² = 0.02615
const -0.1074437 0.0551099 -1.950 0.0512
ntrgap -0.3074995 0.1518755 -2.025 0.0429
--- deltalintrend2005 --- N = 103, R² = 0.01326
const -0.1314196 0.0581374 -2.261 0.0238
ntrgap -0.2414053 0.1675180 -1.441 0.1496
Interpretation: After including linear trends, the effects are substantially attenuated compared to GQ1. Only 2004 is significant at 5% (\(\hat{\beta} = -0.3075\), \(p = 0.046\)). The 2005 effect is no longer significant (\(p = 0.153\)). The linear trends absorb part of the treatment effect, consistent with the treatment effect being partially confounded by pre-existing differential trends.
5.11 Stute Tests of Linearity — With Linear Trends (extra)
Test linearity of \(E[\tilde{\Delta} Y | D]\) for the detrended post-treatment outcomes.
copy "https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
stute_test deltalintrend2001 ntrgap, seed(1)
stute_test deltalintrend2002 ntrgap, seed(1)
stute_test deltalintrend2004 ntrgap, seed(1)
stute_test deltalintrend2005 ntrgap, seed(1)
preserve
reshape long delta deltalintrend, i(indusid) j(year)
stute_test deltalintrend ntrgap indusid year if year>=2001, seed(1)
restore
Cross-sectional:
Variable | t stat p-value
deltalintrend2001 | .0002961 .378
deltalintrend2002 | .0018016 .062
deltalintrend2004 | .0050583 .384
deltalintrend2005 | .0047902 .530
Panel Joint Test:
2001 | .0002961 .378
2002 | .0018016 .062
2004 | .0050583 .384
2005 | .0047902 .530
Joint Stute test: 0.0119 (0.4020)
library(haven); library(StuteTest)
load(url("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
for (yr in c(2001, 2002, 2004, 2005))
stute_test(df = df, Y = paste0("deltalintrend", yr), D = "ntrgap", seed = 1)
# Panel joint test (df_post_lt: long-format deltalintrend panel for years >= 2001,
# built via the Section 5.2 reshape)
stute_test(df = df_post_lt, Y = "deltalintrend", D = "ntrgap",
           group = "indusid", time = "year", seed = 1)
Cross-sectional:
Variable | t stat p-value
deltalintrend2001 | 0.0002961 0.388
deltalintrend2002 | 0.0018016 0.050
deltalintrend2004 | 0.0050583 0.318
deltalintrend2005 | 0.0047902 0.580
Panel Joint Test:
2001 | 0.0002961 0.386
2002 | 0.0018016 0.050
2004 | 0.0050583 0.330
2005 | 0.0047902 0.496
Joint Stute test: 0.0119 (0.3640)
import pandas as pd
import numpy as np
def stute_test_cvm(Y, D, df, order=1, seed=1, nboot=999):
    y = df[Y].values.astype(float)
    d = df[D].values.astype(float)
    n = len(y)
    X = np.column_stack([d**k for k in range(order+1)])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    idx = np.argsort(d)
    cum_resid = np.cumsum(resid[idx]) / np.sqrt(n)
    cvm_stat = np.sum(cum_resid**2) / n
    rng = np.random.RandomState(seed)
    count = 0
    for _ in range(nboot):
        wild = resid * rng.choice([-1, 1], size=n)
        cum_wild = np.cumsum(wild[idx]) / np.sqrt(n)
        boot_stat = np.sum(cum_wild**2) / n
        if boot_stat >= cvm_stat:
            count += 1
    return cvm_stat, (count + 1) / (nboot + 1)
df = pd.read_parquet("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
for year in [2001, 2002, 2004, 2005]:
    stat, pv = stute_test_cvm(f'deltalintrend{year}', 'ntrgap', df, order=1, seed=1)
    print(f"deltalintrend{year}: CvM = {stat:.7f}, p = {pv:.3f}")
Cross-sectional:
Variable | CvM stat p-value
deltalintrend2001 | 0.0002961 0.409
deltalintrend2002 | 0.0018016 0.055
deltalintrend2004 | 0.0050583 0.373
deltalintrend2005 | 0.0047902 0.559
Panel Joint Test:
2001 | 0.0002961 0.409
2002 | 0.0018016 0.055
2004 | 0.0050583 0.373
2005 | 0.0047902 0.559
Joint | 0.0119463 0.431
Interpretation: Linearity is generally not rejected. The 2002 result is borderline (\(p \approx 0.05\)–\(0.06\)), but the joint test does not reject (\(p \approx 0.40\)). Overall, the linear specification is adequate for the detrended outcomes.
5.12 Quasi-Stayer Test
Test whether quasi-stayers exist: compute \(QS = D_{(1)} / (D_{(2)} - D_{(1)})\) where \(D_{(1)}\) and \(D_{(2)}\) are the two smallest NTR gap values.
copy "https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
sort ntrgap
display "D(1) = " ntrgap[1]
display "D(2) = " ntrgap[2]
display "QS = " ntrgap[1]/(ntrgap[2]-ntrgap[1])
D(1) = 0.0202847
D(2) = 0.0235831
QS = D(1)/(D(2)-D(1)) = 6.1498877
library(haven)
load(url("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
ntrgap_sorted <- sort(df$ntrgap)
d1 <- ntrgap_sorted[1]; d2 <- ntrgap_sorted[2]
QS <- d1 / (d2 - d1)
D(1) = 0.02028472
D(2) = 0.02358311
QS = D(1)/(D(2)-D(1)) = 6.1498877
import pandas as pd
df = pd.read_parquet("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
ntrgap_sorted = np.sort(df['ntrgap'].values)
d1, d2 = ntrgap_sorted[0], ntrgap_sorted[1]
qs = d1 / (d2 - d1)
D(1) = 0.02028472
D(2) = 0.02358311
QS = D(1)/(D(2)-D(1)) = 6.1498877
Interpretation: \(QS = 6.15 < 19 = 1/\alpha - 1\) at \(\alpha = 0.05\), so we fail to reject the null that quasi-stayers exist. This supports the use of the HAD framework: the industry with the smallest NTR gap (0.020) can plausibly serve as a quasi-control group.
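The decision rule above is simple enough to write as a one-liner; `qs_decision` is a hypothetical helper that applies the threshold \(1/\alpha - 1\) from the text to the two smallest dose values.

```python
def qs_decision(d1, d2, alpha=0.05):
    """QS = D(1)/(D(2)-D(1)); reject the existence of quasi-stayers
    at level alpha when QS exceeds the threshold 1/alpha - 1."""
    qs = d1 / (d2 - d1)
    return qs, qs > 1 / alpha - 1

# Two smallest NTR gaps from the output above
qs, reject = qs_decision(0.02028472, 0.02358311)
# QS is about 6.15, below the 5%-level threshold of 19, so the
# existence of quasi-stayers is not rejected.
```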
5.13 Computing Commands: contdid and did_multiplegt_stat Syntax (extra)
This section documents the syntax of the commands for estimating \(CAS(d_2)\), \(ATT\), and \(WATT\) in heterogeneous adoption designs with stayers. These commands are not applied to the Pierce & Schott data (which has no stayers), but their syntax is provided here for reference as shown in the textbook.
did_multiplegt_stat estimates the ATT (via the Average Slope, AS) and WATT (via the Weighted Average Slope, WAS) in designs with a continuous treatment and stayers. The syntax from the textbook (p. 342) is:
* ssc install did_multiplegt_stat, replace
copy "https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
did_multiplegt_stat outcome groupid timeid treatment, estimator(as was) exact_match
Key options: estimator(as was) estimates both AS and WAS; exact_match compares switchers and stayers with the same baseline treatment; estimation_method(dr) for doubly-robust estimation; order(#) for polynomial order; placebo for placebo estimates; noextrapolation to restrict to the support of stayers’ treatment.
contdid (Callaway, Goodman-Bacon & Sant’Anna, 2025) estimates the dose-response function \(d_2 \mapsto CAS(d_2)\):
library(haven); library(contdid)
load(url("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
cd_res <- cont_did(
yname = "outcome", tname = "timeid", idname = "groupid",
dname = "treatment", gname = "cohort", data = df,
target_parameter = "slope", aggregation = "dose"
)
summary(cd_res)
did_multiplegt_stat (de Chaisemartin, Ciccia, D’Haultfœuille, Knau & Sow, 2024) estimates ATT and WATT:
library(haven); library(DIDmultiplegtSTAT)
load(url("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
result <- did_multiplegt_stat(
df = df, Y = "outcome", ID = "groupid", Time = "timeid",
D = "treatment", estimator = c("aoss", "waoss"),
estimation_method = "dr", exact_match = TRUE
)
summary(result)
A Python translation of did_multiplegt_stat (did_multiplegt_stat.py) is also available:
import pandas as pd
from did_multiplegt_stat import did_multiplegt_stat, summary_did_multiplegt_stat
df = pd.read_parquet("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
result = did_multiplegt_stat(
df = df, Y = "outcome", ID = "groupid", Time = "timeid",
D = "treatment", estimator = ["aoss", "waoss"],
estimation_method = "dr", exact_match = True
)
summary_did_multiplegt_stat(result)
contdid is not available in Python.
Note: These commands require stayers (groups with \(D_{g,2} = 0\)). The Pierce & Schott (2016) dataset has no stayers, so these commands cannot be applied here.
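A quick pre-flight check before calling either command is to count stayers directly. A minimal sketch; `count_stayers` and the toy data frame are hypothetical (in the real dataset, the check would run on the ntrgap column, which contains no zeros).

```python
import pandas as pd

def count_stayers(df, dose_col):
    """Number of groups whose treatment dose is exactly zero (stayers)."""
    return int((df[dose_col] == 0).sum())

# Toy example: one stayer and two switchers
toy = pd.DataFrame({"groupid": [1, 2, 3], "dose": [0.0, 0.3, 0.6]})
n_stayers = count_stayers(toy, "dose")  # 1
```

With zero stayers, as in Pierce & Schott (2016), one falls back on the quasi-stayer HAD estimators used in the rest of this chapter.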
5.14 CQ#39: did_had Event-Study Estimates
Using the pierce_schott_didtextbook dataset, transform the data into a panel at the industry \(\times\) year level, with industries’ log employment and NTR-gap treatment from 1997 to 2005. Then use did_had to compute four event-study estimates and three pre-trend estimates.
Stata nprobust note: The current nprobust .mo files on GitHub are compiled with Stata 18. If you are using Stata 17 or earlier, did_had will fail with the error nprobust_lp_mse_dpi() not found. To fix this, run once per session:
copy "https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
do "https://raw.githubusercontent.com/nppackages/nprobust/master/stata/nprobust_functions.do"
Cross-platform note: The Stata output below uses Stata 18+ with the latest did_had and nprobust, which computes a single global bandwidth (\(h^*_G = 0.147\), \(N = 21\)) matching the R output. Stata 17 users may get different bandwidths per effect due to an older nprobust version. All implementations give the same quasi-stayer test statistic (\(T = 6.1499\), \(p = 0.1399\)).
* --- Load data ----------------------------------------------------------------
copy "https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
* --- Construct lemp for post-treatment years ----------------------------------
* The dataset has lemp1997-lemp2000 and delta2001, delta2002, delta2004, delta2005.
* delta20XX = lemp20XX - lemp2000, so lemp20XX = lemp2000 + delta20XX.
gen lemp2001 = lemp2000 + delta2001
gen lemp2002 = lemp2000 + delta2002
gen lemp2004 = lemp2000 + delta2004
gen lemp2005 = lemp2000 + delta2005
* --- Reshape to panel (industry × year) ---------------------------------------
reshape long lemp, i(indusid) j(year)
replace ntrgap = 0 if year <= 2000
* --- If using Stata 17 or earlier, compile nprobust from source (once) --------
* do "https://raw.githubusercontent.com/nppackages/nprobust/master/stata/nprobust_functions.do"
* --- Run did_had with triangular kernel ---------------------------------------
did_had lemp indusid year ntrgap, effects(4) placebo(3) kernel(triangular)
Effect Estimates:
----------------------------------------------------------------------------------------------------------------
Effect Estimates QUG* Test
------------------------------------------------------------------------------------------ ---------------------
| Estimate SE LB CI UB CI N BW N in BW T p-val
-------------+---------------------------------------------------------------------------------------------------
Effect_1 | .007268 .078676 -.2385522 .0698521 103 .1465417 21 6.149888 .1398623
Effect_2 | -.133609 .1249707 -.5802821 -.0904061 103 .1465417 21 6.149888 .1398623
Effect_3 | .0350698 .593448 -1.603513 .7227608 103 .1465417 21 6.149888 .1398623
Effect_4 | -.0217339 .6140438 -1.898538 .5084691 103 .1465417 21 6.149888 .1398623
----------------------------------------------------------------------------------------------------------------
*Quasi-untreated group
Placebo Estimates:
------------------------------------------------------------------------------------------
Placebo Estimates
------------------------------------------------------------------------------------------
| Estimate SE LB CI UB CI N BW N in BW
-------------+-----------------------------------------------------------------------------
Placebo_1 | -.0266300 .1328800 -.3772700 .1436200 103 .1506800 21
Placebo_2 | -.0474400 .2568500 -.4426400 .5641900 103 .1465400 21
Placebo_3 | -.1147700 .1851600 -.3988900 .3269400 103 .1465400 21
------------------------------------------------------------------------------------------
library(haven)
library(tidyr)
library(dplyr)
library(DIDHAD) # devtools::install_github("chaisemartinPackages/did_had/R")
library(nprobust) # install.packages("nprobust")
load(url("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
# --- Construct lemp for post-treatment years ----------------------------------
df$lemp2001 <- df$lemp2000 + df$delta2001
df$lemp2002 <- df$lemp2000 + df$delta2002
df$lemp2004 <- df$lemp2000 + df$delta2004
df$lemp2005 <- df$lemp2000 + df$delta2005
# --- Reshape to panel (industry × year) ---------------------------------------
lemp_cols <- grep("^lemp[0-9]+$", names(df), value = TRUE)
df_long <- df %>%
select(indusid, ntrgap, all_of(lemp_cols)) %>%
pivot_longer(
cols = all_of(lemp_cols),
names_to = "year", names_prefix = "lemp",
values_to = "lemp"
) %>%
mutate(
year = as.integer(year),
ntrgap = ifelse(year <= 2000, 0, ntrgap)
) %>%
as.data.frame()
# --- Run did_had with triangular kernel ---------------------------------------
options(digits = 7)
result <- did_had(
df = df_long,
outcome = "lemp",
group = "indusid",
time = "year",
treatment = "ntrgap",
effects = 4,
placebo = 3,
kernel = "tri",
graph_off = TRUE
)
summary(result)
Effect Estimates:
-----------------------------------------------------------------------------
Effect Estimates QUG* Test
---------------------------------------------------- ---------------
Estimate SE LB.CI UB.CI N BW N.BW T p.val
Effect_1 0.0072700 0.0787500 -0.2387000 0.0700000 103 0.1465400 21 6.1498900 0.1398600
Effect_2 -0.1336100 0.1250000 -0.5803500 -0.0903400 103 0.1465400 21 6.1498900 0.1398600
Effect_3 0.0350700 0.5934600 -1.6035300 0.7227800 103 0.1465400 21 6.1498900 0.1398600
Effect_4 -0.0217300 0.6140600 -1.8985800 0.5085100 103 0.1465400 21 6.1498900 0.1398600
*Quasi-Untreated Group
Placebo Estimates:
------------------------------------------------------------
Placebo Estimates
--------------------------------------------------
Estimate SE LB.CI UB.CI N BW N.BW
Placebo_1 -0.0266300 0.1328800 -0.3772700 0.1436200 103 0.1506800 21
Placebo_2 -0.0474400 0.2568500 -0.4426400 0.5641900 103 0.1465400 21
Placebo_3 -0.1147700 0.1851600 -0.3988900 0.3269400 103 0.1465400 21
# pip install did-had
import pandas as pd
from did_had import DidHad
df = pd.read_parquet("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
# Build panel from wide to long
rows = []
for _, r in df.iterrows():
base = r['lemp2000']
ind = r['indusid']
ntr = r['ntrgap']
for yr in [1997, 1998, 1999]:
rows.append({'indusid': ind, 'year': yr, 'lemp': base + r[f'delta{yr}'], 'ntrgap': 0.0})
rows.append({'indusid': ind, 'year': 2000, 'lemp': base, 'ntrgap': 0.0})
for yr in [2001, 2002, 2004, 2005]:
rows.append({'indusid': ind, 'year': yr, 'lemp': base + r[f'delta{yr}'], 'ntrgap': ntr})
panel = pd.DataFrame(rows)
model = DidHad(kernel='tri')
results = model.fit(
df=panel, outcome='lemp', group='indusid',
time='year', treatment='ntrgap', effects=4, placebo=3)
print(results)
===========================================================================
DID-HAD Estimation Results
===========================================================================
Number of groups: 103
Number of periods: 8
Adoption period (F): 2001
Kernel: tri
Confidence level: 95%
---------------------------------------------------------------------------
Effect Estimates QUG* Test
--------------------------------------------------- ---------------
Estimate SE LB.CI UB.CI N BW N.BW T p.val
Effect_1 0.00727 0.07875 -0.23870 0.07000 103 0.14654 20 6.14989 0.13986
Effect_2 -0.13361 0.12500 -0.58035 -0.09034 103 0.14654 20 6.14989 0.13986
Effect_4 0.03507 0.59346 -1.60353 0.72278 103 0.14654 20 6.14989 0.13986
*Quasi-Untreated Group
--------------------------------------------------------------
Placebo Estimates
----------------------------------------------------
Estimate SE LB.CI UB.CI N BW N.BW
Placebo_1 -0.02663 0.13288 -0.37727 0.14362 103 0.15068 21
Placebo_2 -0.04744 0.25685 -0.44264 0.56419 103 0.14654 20
===========================================================================
Interpretation: The quasi-stayer test is not rejected (\(T = 6.15\), \(p = 0.140\)): the 21 industries with an NTR gap below 14.7 percentage points serve as quasi-controls. The optimal bandwidth \(h^*_G = 0.147\) matches the textbook. The estimated event-study effect at 2002 (Effect_2) is negative and significant across implementations, while the remaining effects and all three pre-trends are insignificant. Confidence intervals are wide: with only 103 industries, the non-parametric estimator \(\widehat{WATT}^{qs}_{h^*_G}\) is imprecise. The pre-trends, though insignificant, are also imprecise: extrapolating a linear trend from the upper bound of the 1999 pre-trend CI, differential trends could account for the estimated effects.
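The last sentence can be checked with back-of-envelope arithmetic. Assuming Placebo_1 is the one-period pre-trend, take its upper 95% CI bound (0.144) from the output above and extrapolate it linearly to each post-treatment horizon:

```python
# Upper CI bound of Placebo_1 (the 1999 pre-trend) from the output above
ub_pretrend = 0.14362
# Event-study point estimates (Effect_1 ... Effect_4), keyed by horizon
effects = {1: 0.00727, 2: -0.13361, 3: 0.03507, 4: -0.02173}
# A differential trend with per-period slope ub_pretrend, extrapolated
# h periods past baseline, would exceed each estimate in magnitude.
covered = {h: abs(est) <= h * ub_pretrend for h, est in effects.items()}
```

Every `covered` entry is true: a linear differential trend no larger than the pre-trend CI allows would be big enough to generate all four point estimates, which is why the insignificant pre-trends offer only weak reassurance here.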