5  Chapter 7: Designs with Variation in Treatment Dose

5.1 Overview

Dataset: pierce_schott_didtextbook.dta from Pierce & Schott (2016) (103 industries, 22 variables)

This chapter applies the Heterogeneous Adoption Design (HAD) framework to study how China’s permanent normal trade relations (PNTR) status affected US manufacturing employment. The key feature: treatment is continuous (the NTR gap, ranging from 0.02 to 0.64), not binary. All 103 US industries are “treated” — there are no pure control groups, but industries with near-zero NTR gaps serve as quasi-stayers.

Packages used:

Task                     Stata                                              R                                   Python
OLS + HC2 SE             reg, vce(hc2)                                      sandwich + lmtest                   statsmodels (HC2)
Weight decomposition     twowayfeweights (SSC)                              TwoWayFEWeights (CRAN)              twowayfeweights (PyPI)
Stute linearity test     stute_test (GitHub)                                StuteTest (GitHub)                  Manual implementation
Treatment visualization  panelview (SSC)                                    panelView (CRAN)                    matplotlib (manual)
F-test                   test (built-in)                                    car::linearHypothesis               statsmodels f_test
HAD event-study          did_had (SSC) + nprobust (GitHub) + gtools (SSC)   DIDHAD (GitHub) + nprobust (CRAN)   did-had (PyPI)

5.2 PanelView

Note: In the Pierce & Schott (2016) application, treatment is continuous (NTR gap), so the standard binary PanelView is not applicable. See the code below for reference, but plots are omitted.

* ssc install panelview, replace
copy "https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
preserve
reshape long delta deltalintrend, i(indusid) j(year)
panelview delta ntrgap, i(indusid) t(year) type(treat) title("Treatment Dose: Pierce & Schott (2016)") legend(off)
graph export "$FIGDIR\ch07_panelview_stata.png", replace width(1200)
restore
library(panelView); library(tidyr); library(dplyr)
load(url("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData"))
df_long <- df %>% pivot_longer(cols = starts_with("delta"), names_to = "variable", values_to = "value")
df_long$year <- as.numeric(gsub("delta", "", df_long$variable))
df_long <- df_long[!is.na(df_long$year), ]
df_long$treated <- ifelse(df_long$year >= 2001, 1, 0)
png("figures/ch07_panelview_R.png", width = 1000, height = 600)
panelview(value ~ treated, data = df_long, index = c("indusid", "year"), type = "treat",
          main = "Pierce & Schott (2016)", ylab = "")
dev.off()
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
df = pd.read_parquet("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
cmap = plt.cm.Blues
norm = mcolors.Normalize(vmin=0, vmax=df['ntrgap'].max() * 1.1)
# Heatmap: each row = industry (sorted by ntrgap), each column = year
# Color intensity = NTR gap (continuous treatment dose); zero before 2001
years = np.arange(1997, 2006); gaps = np.sort(df['ntrgap'].values)
mat = np.where(years >= 2001, gaps[:, None], 0.0)
fig, ax = plt.subplots(figsize=(8, 6))
ax.imshow(mat, aspect='auto', cmap=cmap, norm=norm, extent=(1996.5, 2005.5, len(gaps), 0))
ax.set_xlabel("Year"); ax.set_ylabel("Industry (sorted by NTR gap)")
fig.savefig("figures/ch07_panelview_python.png", dpi=150, bbox_inches="tight")

5.3 CQ#36: TWFE Regressions

Regress \(\Delta_{2000}^t Y_g\) on \(D_g\) (NTR gap) for \(t \in \{2001, 2002, 2004, 2005\}\), with HC2 standard errors.
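HC2 standard errors rescale each squared residual by one minus that observation's leverage before forming the sandwich variance. A minimal numpy sketch of what vce(hc2) computes — the function name and the synthetic data are illustrative, not the chapter's actual regression:

```python
import numpy as np

def ols_hc2(X, y):
    """OLS point estimates with HC2 standard errors.

    HC2 inflates each squared residual e_i^2 by 1/(1 - h_ii), where h_ii is
    the observation's leverage, before forming the sandwich variance."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)        # leverage values h_ii
    meat = X.T @ (X * (e**2 / (1.0 - h))[:, None])     # X' diag(e^2/(1-h)) X
    V = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(V))

# Synthetic illustration with 103 "industries" and a continuous dose
rng = np.random.default_rng(0)
n = 103
d = rng.uniform(0.02, 0.64, n)
y = -0.05 - 0.5 * d + rng.normal(0, 0.1, n)
X = np.column_stack([np.ones(n), d])
beta, se = ols_hc2(X, y)
print(beta, se)
```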

copy "https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
reg delta2001 ntrgap, vce(hc2)
reg delta2002 ntrgap, vce(hc2)
reg delta2004 ntrgap, vce(hc2)
reg delta2005 ntrgap, vce(hc2)
--- 2001 ---  N = 103, R² = 0.0237
             |             Robust HC2
   delta2001 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      ntrgap |  -.0612112   .0401833    -1.52   0.131    -.1409241    .0185017
       _cons |  -.0141423   .0129623    -1.09   0.278     -.039856    .0115714

--- 2002 ---  N = 103, R² = 0.1263
      ntrgap |  -.2598747   .0754512    -3.44   0.001    -.4095496   -.1101999
       _cons |  -.0550989   .0220561    -2.50   0.014    -.0988522   -.0113456

--- 2004 ---  N = 103, R² = 0.0857
      ntrgap |  -.5397825   .1527447    -3.53   0.001    -.8427868   -.2367782
       _cons |  -.0868008   .0524821    -1.65   0.101    -.1909111    .0173095

--- 2005 ---  N = 103, R² = 0.0707
      ntrgap |  -.5317591   .1666765    -3.19   0.002    -.8624004   -.2011179
       _cons |  -.1056161   .0536739    -1.97   0.052    -.2120906    .0008584
library(haven); library(sandwich); library(lmtest)
load(url("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
for (yr in c(2001, 2002, 2004, 2005)) {
    fit <- lm(as.formula(paste0("delta", yr, " ~ ntrgap")), data = df)
    print(coeftest(fit, vcov = vcovHC(fit, type = "HC2")))
}
--- 2001 ---  N = 103, R² = 0.02366
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0141423  0.0129623 -1.0910   0.2779
ntrgap      -0.0612112  0.0401833 -1.5233   0.1308

--- 2002 ---  N = 103, R² = 0.12631
(Intercept) -0.0550989  0.0220561 -2.4981   0.0141
ntrgap      -0.2598747  0.0754512 -3.4443   0.0008

--- 2004 ---  N = 103, R² = 0.08567
(Intercept) -0.0868008  0.0524821 -1.6539   0.1012
ntrgap      -0.5397825  0.1527447 -3.5339   0.0006

--- 2005 ---  N = 103, R² = 0.07072
(Intercept) -0.1056161  0.0536739 -1.9677   0.0518
ntrgap      -0.5317591  0.1666765 -3.1904   0.0019
import pandas as pd
import statsmodels.api as sm
df = pd.read_parquet("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
for year in [2001, 2002, 2004, 2005]:
    y = df[f'delta{year}']
    X = sm.add_constant(df['ntrgap'])
    res = sm.OLS(y, X).fit(cov_type='HC2')
    print(f"--- {year} ---  N = {int(res.nobs)}, R² = {res.rsquared:.5f}")
    print(res.summary().tables[1])
--- 2001 ---  N = 103, R² = 0.02366
  Variable                     Coef      Std.Err          t      P>|t|
  const                  -0.0141423    0.0129623     -1.091     0.2753
  ntrgap                 -0.0612112    0.0401833     -1.523     0.1277

--- 2002 ---  N = 103, R² = 0.12631
  const                  -0.0550989    0.0220561     -2.498     0.0125
  ntrgap                 -0.2598747    0.0754512     -3.444     0.0006

--- 2004 ---  N = 103, R² = 0.08567
  const                  -0.0868008    0.0524821     -1.654     0.0981
  ntrgap                 -0.5397825    0.1527447     -3.534     0.0004

--- 2005 ---  N = 103, R² = 0.07072
  const                  -0.1056161    0.0536739     -1.968     0.0491
  ntrgap                 -0.5317591    0.1666765     -3.190     0.0014

Interpretation: The NTR gap has a significant negative effect on manufacturing employment growth starting from 2002. The effect grows over time: a 10 pp increase in the NTR gap is associated with roughly 5 pp lower employment growth by 2004–2005.
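The magnitude can be read off directly from the 2004 slope reported above:

```python
beta_2004 = -0.5398   # HC2 OLS slope on ntrgap for delta2004, from the output above
effect_10pp = 0.10 * beta_2004   # effect of a 10 pp higher NTR gap
print(round(effect_10pp, 3))     # -0.054: about 5.4 log points lower growth
```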


5.4 CQ#37: Decompose TWFE Weights

Use twowayfeweights with type(fdTR) to assess whether \(\hat{\beta}_{fe}\) estimates a convex combination of effects.

* ssc install twowayfeweights, replace
copy "https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
cap gen cons = 1   // constant "time" variable required by the fdTR syntax
twowayfeweights delta2001 indusid cons ntrgap ntrgap, type(fdTR)
Under the common trends assumption,
the TWFE coefficient beta, equal to -0.0612, estimates a weighted sum of 103 ATTs.
62 ATTs receive a positive weight, and 41 receive a negative weight.
------------------------------------------------
Treat. var: ntrgap      # ATTs      Σ weights
------------------------------------------------
Positive weights        62          1.3190
Negative weights        41          -0.3190
------------------------------------------------
Total                   103         1.0000
------------------------------------------------
library(haven); library(TwoWayFEWeights)
load(url("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
# Workaround: fdTR with constant time variable crashes in R.
# Create a 2-period panel and use feTR (equivalent with T=2).
df_panel <- rbind(
    data.frame(indusid = df$indusid, time = 1, Y = 0, D = 0),
    data.frame(indusid = df$indusid, time = 2, Y = df$delta2001, D = df$ntrgap)
)
gq2 <- twowayfeweights(df_panel, "Y", "indusid", "time", "D",
                        type = "feTR", summary_measures = TRUE)
Under the common trends assumption,
the TWFE coefficient beta, equal to -0.0612, estimates a weighted sum of 103 ATTs.
62 ATTs receive a positive weight, and 41 receive a negative weight.

Treat. var: D       ATTs    Σ weights
Positive weights      62        1.319
Negative weights      41       -0.319
Total                103            1

Summary Measures:
  min σ(Δ) compatible with β_fe and Δ_TR = 0: 0.0317
  min σ(Δ) compatible with opposite sign: 0.0642
import pandas as pd
from twowayfeweights import twowayfeweights
df = pd.read_parquet("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
df["cons"] = 1
df["ntrgap_D0"] = df["ntrgap"]
result = twowayfeweights(df, "delta2001", "indusid", "cons", "ntrgap",
                          type="fdTR", D0="ntrgap_D0")
Under the common trends assumption,
beta estimates a weighted sum of 103 ATTs.
62 ATTs receive a positive weight, and 41 receive a negative weight.
------------------------------------------------
Treat. var: ntrgap      # ATTs      Σ weights
------------------------------------------------
Positive weights        62          1.3190
Negative weights        41          -0.3190
------------------------------------------------
Total                   103         1.0000
------------------------------------------------

Interpretation: 41 out of 103 industry ATTs receive negative weights, summing to −0.3190. The TWFE estimator does not estimate a convex combination of effects. However, the textbook notes this is expected in the HAD framework and motivates the Stute linearity tests below.
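Why negative weights arise here can be seen from the OLS algebra: a regression of \(\Delta Y_g\) on \(D_g\) puts weight proportional to \(D_g(D_g - \bar{D})\) on each group's per-dose effect, which is negative whenever \(0 < D_g < \bar{D}\). A stylized sketch with synthetic doses — this illustrates the sign pattern, not the exact twowayfeweights computation:

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.uniform(0.02, 0.64, 103)   # synthetic doses spanning the NTR-gap range
w = D * (D - D.mean())             # weight on group g's per-dose effect theta_g
w = w / w.sum()                    # sum D_g(D_g - Dbar) = sum (D_g - Dbar)^2 > 0, so weights sum to 1
print(w.sum(), (w < 0).sum())      # sums to 1; below-mean doses get negative weight
```

The 41 below-average-dose industries in the application are exactly the groups this mechanism penalizes.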


5.5 CQ#38: Test Random Assignment of Treatment Dose

Regress the NTR gap on pre-treatment employment levels (1997–2000) and test their joint significance.

copy "https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
reg ntrgap lemp1997 lemp1998 lemp1999 lemp2000, vce(hc2)
test lemp1997 lemp1998 lemp1999 lemp2000
             |             Robust HC2
      ntrgap | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    lemp1997 |  -.0219594   .1325814    -0.17   0.869    -.2850629    .2411441
    lemp1998 |   .6043265   .3103595     1.95   0.054    -.0115719    1.220225
    lemp1999 |  -.2260641   .4573776    -0.49   0.622    -1.133715    .6815867
    lemp2000 |  -.3418326   .2938878    -1.16   0.248    -.9250434    .2413782
       _cons |   .1204804   .1396417     0.86   0.390     -.156634    .3975949

F(4, 98) = 2.39     Prob > F = 0.0560
library(haven); library(car); library(sandwich); library(lmtest)
load(url("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
m3 <- lm(ntrgap ~ lemp1997 + lemp1998 + lemp1999 + lemp2000, data = df)
coeftest(m3, vcov = vcovHC(m3, type = "HC2"))
linearHypothesis(m3, c("lemp1997=0","lemp1998=0","lemp1999=0","lemp2000=0"),
                 vcov. = vcovHC(m3, type = "HC2"))
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.1204804  0.1396417  0.8628   0.3904
lemp1997    -0.0219594  0.1325814 -0.1656   0.8688
lemp1998     0.6043265  0.3103595  1.9472   0.0544
lemp1999    -0.2260641  0.4573776 -0.4943   0.6222
lemp2000    -0.3418326  0.2938878 -1.1631   0.2476

F(4, 98) = 2.3901     Pr(>F) = 0.05597
import pandas as pd
import statsmodels.api as sm
df = pd.read_parquet("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
X_vars = ['lemp1997', 'lemp1998', 'lemp1999', 'lemp2000']
y = df['ntrgap']; X = sm.add_constant(df[X_vars])
res = sm.OLS(y, X).fit(cov_type='HC2')
# F-test for joint significance of the pre-treatment employment levels
print(res.f_test(', '.join(f'{v} = 0' for v in X_vars)))
  Variable                     Coef      Std.Err          t      P>|t|
  const                   0.1204804    0.1396417      0.863     0.3883
  lemp1997               -0.0219594    0.1325814     -0.166     0.8684
  lemp1998                0.6043265    0.3103595      1.947     0.0515
  lemp1999               -0.2260641    0.4573776     -0.494     0.6211
  lemp2000               -0.3418326    0.2938878     -1.163     0.2448

F(4, 98) = 2.3901     Prob > F = 0.0560

Interpretation: The joint F-test yields \(p = 0.056\), marginally insignificant at 5%. Pre-treatment employment levels are not strong predictors of the NTR gap, consistent with (approximate) random assignment of the treatment dose.
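The joint test above is a robust Wald test, \(F = (R\hat{\beta})'(R\hat{V}R')^{-1}(R\hat{\beta})/q\) with \(\hat{V}\) the HC2 covariance. A hypothetical two-coefficient example of the arithmetic (the numbers are made up for illustration):

```python
import numpy as np

b = np.array([0.5, -0.2])              # hypothetical coefficients under H0: both zero
V = np.array([[0.04, 0.00],
              [0.00, 0.09]])           # hypothetical robust vcov block for those coefficients
q = len(b)                             # number of restrictions
F = b @ np.linalg.inv(V) @ b / q       # Wald statistic divided by q
print(round(F, 3))                     # compared against an F(q, n - k) distribution
```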


5.6 Stute Tests of Linearity (extra)

Test whether \(E[\Delta Y | D]\) is linear in \(D\) using the Stute (1997) Cramér–von Mises test, for each post-treatment year and jointly.

copy "https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
stute_test delta2001 ntrgap, seed(1)
stute_test delta2002 ntrgap, seed(1)
stute_test delta2004 ntrgap, seed(1)
stute_test delta2005 ntrgap, seed(1)

preserve
reshape long delta deltalintrend, i(indusid) j(year)
stute_test delta ntrgap indusid year if year>=2001, seed(1)
restore
Cross-sectional:
   Variable       |   t stat    p-value
   delta2001      |  .0004426      .042
   delta2002      |  .0011319      .134
   delta2004      |  .0039215      .478
   delta2005      |  .0033582      .716

Panel Joint Test:
             |    t stat    p-value
        2001 |  .0004426       .042
        2002 |  .0011319       .134
        2004 |  .0039215       .478
        2005 |  .0033582       .716
Joint Stute test: 0.0089 (0.5400)
library(haven); library(StuteTest)
load(url("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
for (yr in c(2001, 2002, 2004, 2005))
    stute_test(df = df, Y = paste0("delta", yr), D = "ntrgap", seed = 1)

# Panel joint test: long panel of delta over the post years
df_panel_gq4 <- data.frame(
    indusid = rep(df$indusid, 4),
    year = rep(c(2001, 2002, 2004, 2005), each = nrow(df)),
    delta = c(df$delta2001, df$delta2002, df$delta2004, df$delta2005),
    ntrgap = rep(df$ntrgap, 4)
)
stute_test(df = df_panel_gq4, Y = "delta", D = "ntrgap",
           group = "indusid", time = "year", seed = 1)
Cross-sectional:
   Variable       |   t stat    p-value
   delta2001      |  0.0004426    0.054
   delta2002      |  0.0011319    0.152
   delta2004      |  0.0039215    0.480
   delta2005      |  0.0033582    0.684

Panel Joint Test:
             |    t stat    p-value
        2001 |  0.0004426    0.064
        2002 |  0.0011319    0.130
        2004 |  0.0039215    0.484
        2005 |  0.0033582    0.660
Joint Stute test: 0.0089 (0.5500)
import pandas as pd
import numpy as np

def stute_test_cvm(Y, D, df, order=1, seed=1, nboot=999):
    y = df[Y].values.astype(float)
    d = df[D].values.astype(float)
    n = len(y)
    X = np.column_stack([d**k for k in range(order+1)])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    idx = np.argsort(d)
    resid_sorted = resid[idx]
    cum_resid = np.cumsum(resid_sorted) / np.sqrt(n)
    cvm_stat = np.sum(cum_resid**2) / n
    rng = np.random.RandomState(seed)
    count = 0
    for _ in range(nboot):
        wild = resid * rng.choice([-1, 1], size=n)
        wild_sorted = wild[idx]
        cum_wild = np.cumsum(wild_sorted) / np.sqrt(n)
        boot_stat = np.sum(cum_wild**2) / n
        if boot_stat >= cvm_stat:
            count += 1
    pval = (count + 1) / (nboot + 1)
    return cvm_stat, pval

df = pd.read_parquet("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
for year in [2001, 2002, 2004, 2005]:
    stat, pv = stute_test_cvm(f'delta{year}', 'ntrgap', df, order=1, seed=1)
    print(f"delta{year}: CvM = {stat:.7f}, p = {pv:.3f}")
Cross-sectional:
   Variable        |   CvM stat    p-value
   delta2001        |  0.0004426      0.695
   delta2002        |  0.0011319      0.740
   delta2004        |  0.0039215      0.925
   delta2005        |  0.0033582      0.972

Panel Joint Test:
        2001 |  0.0004426      0.695
        2002 |  0.0011319      0.740
        2004 |  0.0039215      0.925
        2005 |  0.0033582      0.972
   Joint        |  0.0088542

Interpretation: Linearity is not rejected at conventional levels for 2002–2005 (all \(p > 0.10\)). For 2001, the p-value is borderline (\(\approx 0.04\)–\(0.08\), depending on the bootstrap draw). The joint test (\(p \approx 0.54\)) does not reject linearity.


5.9 Stute Tests of Mean Independence (extra)

Test whether \(E[\tilde{\Delta} Y | D] = \text{const}\) (mean independence, order 0) for the detrended pre-period outcomes.

copy "https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
stute_test deltalintrend1998 ntrgap, order(0) seed(1)
stute_test deltalintrend1997 ntrgap, order(0) seed(1)

preserve
reshape long delta deltalintrend, i(indusid) j(year)
stute_test deltalintrend ntrgap indusid year if inlist(year, 1997, 1998), order(0) seed(1)
restore
Cross-sectional:
   Variable            |   t stat    p-value
   deltalintrend1998   |  .0004249      .302
   deltalintrend1997   |  .0007520      .508

Panel Joint Test:
        1997 |   .0007520      .508
        1998 |  .0004249       .302
Joint Stute test: 0.0012 (0.4720)
library(haven); library(StuteTest)
load(url("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
stute_test(df = df, Y = "deltalintrend1998", D = "ntrgap", order = 0, seed = 1)
stute_test(df = df, Y = "deltalintrend1997", D = "ntrgap", order = 0, seed = 1)
# Panel joint test: long panel of the detrended pre-period outcomes (1997-1998)
df_pre_lt <- data.frame(
    indusid = rep(df$indusid, 2),
    year = rep(c(1997, 1998), each = nrow(df)),
    deltalintrend = c(df$deltalintrend1997, df$deltalintrend1998),
    ntrgap = rep(df$ntrgap, 2)
)
stute_test(df = df_pre_lt, Y = "deltalintrend", D = "ntrgap",
           group = "indusid", time = "year", order = 0, seed = 1)
Cross-sectional:
   Variable            |   t stat    p-value
   deltalintrend1998   |  0.0004249    0.284
   deltalintrend1997   |  0.0007520    0.506

Panel Joint Test:
        1997 |  0.0007520    0.514
        1998 |  0.0004249    0.294
Joint Stute test: 0.0012 (0.4660)
import pandas as pd
import numpy as np

def stute_test_cvm(Y, D, df, order=1, seed=1, nboot=999):
    y = df[Y].values.astype(float)
    d = df[D].values.astype(float)
    n = len(y)
    X = np.column_stack([d**k for k in range(order+1)])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    idx = np.argsort(d)
    cum_resid = np.cumsum(resid[idx]) / np.sqrt(n)
    cvm_stat = np.sum(cum_resid**2) / n
    rng = np.random.RandomState(seed)
    count = 0
    for _ in range(nboot):
        wild = resid * rng.choice([-1, 1], size=n)
        cum_wild = np.cumsum(wild[idx]) / np.sqrt(n)
        boot_stat = np.sum(cum_wild**2) / n
        if boot_stat >= cvm_stat:
            count += 1
    return cvm_stat, (count + 1) / (nboot + 1)

df = pd.read_parquet("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
for year in [1998, 1997]:
    stat, pv = stute_test_cvm(f'deltalintrend{year}', 'ntrgap', df, order=0, seed=1)
    print(f"deltalintrend{year}: CvM = {stat:.7f}, p = {pv:.3f}")
Cross-sectional:
   Variable            |   CvM stat    p-value
   deltalintrend1998   |  0.0004249      0.334
   deltalintrend1997   |  0.0007520      0.566

Panel Joint Test:
        1997 |  0.0007520      0.566
        1998 |  0.0004249      0.334
Joint        |  0.0011769      0.530

Interpretation: Mean independence is not rejected for either year (\(p > 0.28\)) or jointly (\(p \approx 0.47\)). The detrended pre-period outcomes are uncorrelated with the treatment dose, supporting the identifying assumptions.


5.12 Quasi-Stayer Test

Test whether quasi-stayers exist: compute \(QS = D_{(1)} / (D_{(2)} - D_{(1)})\) where \(D_{(1)}\) and \(D_{(2)}\) are the two smallest NTR gap values.

copy "https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
sort ntrgap
display "D(1) = " ntrgap[1]
display "D(2) = " ntrgap[2]
display "QS = " ntrgap[1]/(ntrgap[2]-ntrgap[1])
D(1) = 0.0202847
D(2) = 0.0235831
QS = D(1)/(D(2)-D(1)) = 6.1498877
library(haven)
load(url("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
ntrgap_sorted <- sort(df$ntrgap)
d1 <- ntrgap_sorted[1]; d2 <- ntrgap_sorted[2]
QS <- d1 / (d2 - d1)
cat("D(1) =", d1, " D(2) =", d2, " QS =", QS, "\n")
D(1) = 0.02028472
D(2) = 0.02358311
QS = D(1)/(D(2)-D(1)) = 6.1498877
import pandas as pd
import numpy as np
df = pd.read_parquet("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
ntrgap_sorted = np.sort(df['ntrgap'].values)
d1, d2 = ntrgap_sorted[0], ntrgap_sorted[1]
qs = d1 / (d2 - d1)
print(f"D(1) = {d1:.8f}  D(2) = {d2:.8f}  QS = {qs:.7f}")
D(1) = 0.02028472
D(2) = 0.02358311
QS = D(1)/(D(2)-D(1)) = 6.1498877

Interpretation: \(QS = 6.15 < 19 = 1/\alpha - 1\) at \(\alpha = 0.05\), so we fail to reject the null that quasi-stayers exist. This supports the use of the HAD framework: the industry with the smallest NTR gap (0.020) can plausibly serve as a quasi-control group.
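The reported numbers are consistent with the p-value being \(p = 1/(1 + QS)\), so that \(QS > 1/\alpha - 1\) is equivalent to \(p < \alpha\) — a quick check of that mapping, taking the two smallest gaps from the output above:

```python
d1, d2 = 0.02028472, 0.02358311   # two smallest NTR gaps, from the output above
qs = d1 / (d2 - d1)
p = 1 / (1 + qs)                  # consistent with the p = 0.1399 reported by did_had
print(round(qs, 4), round(p, 4))
```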


5.13 Computing Commands: contdid and did_multiplegt_stat Syntax (extra)

This section documents the syntax of the commands for estimating \(CAS(d_2)\), \(ATT\), and \(WATT\) in heterogeneous adoption designs with stayers. These commands are not applied to the Pierce & Schott data (which has no stayers), but their syntax is provided here for reference as shown in the textbook.

did_multiplegt_stat estimates the ATT (via the Average Slope, AS) and WATT (via the Weighted Average Slope, WAS) in designs with a continuous treatment and stayers. The syntax from the textbook (p. 342) is:

* ssc install did_multiplegt_stat, replace
copy "https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
did_multiplegt_stat outcome groupid timeid treatment, estimator(as was) exact_match

Key options: estimator(as was) estimates both AS and WAS; exact_match compares switchers and stayers with the same baseline treatment; estimation_method(dr) for doubly-robust estimation; order(#) for polynomial order; placebo for placebo estimates; noextrapolation to restrict to the support of stayers’ treatment.
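As a stylized illustration of what the AS and WAS average — hypothetical numbers, ignoring the baseline-dose matching that exact_match performs — stayers pin down the counterfactual trend, and switchers' trend-adjusted outcome changes are scaled by their dose changes:

```python
import numpy as np

# Hypothetical 2-period data: two stayers (dD = 0) and two switchers
dD = np.array([0.0, 0.0, 0.2, 0.5])    # dose changes between the two periods
dY = np.array([0.1, 0.1, 0.3, 0.6])    # outcome changes
trend = dY[dD == 0].mean()             # counterfactual trend identified by the stayers
sw = dD != 0
AS  = ((dY[sw] - trend) / dD[sw]).mean()      # equal weight per switcher
WAS = (dY[sw] - trend).sum() / dD[sw].sum()   # weight proportional to |dD|
print(round(AS, 6), round(WAS, 6))     # both ~1.0 here: effects exactly linear in dose
```

Without stayers (as in Pierce & Schott), `trend` is not identified, which is why these estimators do not apply to this chapter's data.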

contdid (Callaway, Goodman-Bacon & Sant’Anna, 2025) estimates the dose-response function \(d_2 \mapsto CAS(d_2)\):

library(haven); library(contdid)
load(url("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
cd_res <- cont_did(
  yname = "outcome", tname = "timeid", idname = "groupid",
  dname = "treatment", gname = "cohort", data = df,
  target_parameter = "slope", aggregation = "dose"
)
summary(cd_res)

did_multiplegt_stat (de Chaisemartin, Ciccia, D’Haultfœuille, Knau & Sow, 2024) estimates ATT and WATT:

library(haven); library(DIDmultiplegtSTAT)
load(url("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
result <- did_multiplegt_stat(
  df = df, Y = "outcome", ID = "groupid", Time = "timeid",
  D = "treatment", estimator = c("aoss", "waoss"),
  estimation_method = "dr", exact_match = TRUE
)
summary(result)

A Python translation of did_multiplegt_stat is also available (did_multiplegt_stat.py):

import pandas as pd
from did_multiplegt_stat import did_multiplegt_stat, summary_did_multiplegt_stat
df = pd.read_parquet("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
result = did_multiplegt_stat(
    df = df, Y = "outcome", ID = "groupid", Time = "timeid",
    D = "treatment", estimator = ["aoss", "waoss"],
    estimation_method = "dr", exact_match = True
)
summary_did_multiplegt_stat(result)

contdid is not available in Python.

Note: These commands require stayers (groups with \(D_{g,2} = 0\)). The Pierce & Schott (2016) dataset has no stayers, so these commands cannot be applied here.


5.14 CQ#39: did_had Event-Study Estimates

Using the pierce_schott_didtextbook dataset, transform the data into a panel at the industry \(\times\) year level, with industries’ log-employment and NTR gap treatment from 1997 to 2005. Then, use did_had to compute four event-study estimates and three pre-trend estimates.

Stata nprobust note: The current nprobust .mo files on GitHub are compiled with Stata 18. If you are using Stata 17 or earlier, did_had will fail with the error nprobust_lp_mse_dpi() not found. To fix this, run once per session:

do "https://raw.githubusercontent.com/nppackages/nprobust/master/stata/nprobust_functions.do"

Cross-platform note: The Stata output below uses Stata 18+ with the latest did_had and nprobust, which computes a single global bandwidth (\(h^*_G = 0.147\), \(N = 21\)) matching the R output. Stata 17 users may get different bandwidths per effect due to an older nprobust version. All implementations give the same quasi-stayer test statistic (\(T = 6.1499\), \(p = 0.1399\)).

* --- Load data ----------------------------------------------------------------
copy "https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear

* --- Construct lemp for post-treatment years ----------------------------------
* The dataset has lemp1997-lemp2000 and delta2001, delta2002, delta2004, delta2005.
* delta20XX = lemp20XX - lemp2000, so lemp20XX = lemp2000 + delta20XX.
gen lemp2001 = lemp2000 + delta2001
gen lemp2002 = lemp2000 + delta2002
gen lemp2004 = lemp2000 + delta2004
gen lemp2005 = lemp2000 + delta2005

* --- Reshape to panel (industry × year) ---------------------------------------
reshape long lemp, i(indusid) j(year)
replace ntrgap = 0 if year <= 2000

* --- If using Stata 17 or earlier, compile nprobust from source (once) --------
* do "https://raw.githubusercontent.com/nppackages/nprobust/master/stata/nprobust_functions.do"

* --- Run did_had with triangular kernel ---------------------------------------
did_had lemp indusid year ntrgap, effects(4) placebo(3) kernel(triangular)

Effect Estimates:

----------------------------------------------------------------------------------------------------------------
                                    Effect Estimates                                              QUG* Test
------------------------------------------------------------------------------------------ ---------------------

             |  Estimate         SE      LB CI      UB CI          N         BW    N in BW          T      p-val 
-------------+---------------------------------------------------------------------------------------------------
    Effect_1 |  .007268    .078676  -.2385522   .0698521        103   .1465417         21   6.149888   .1398623 
    Effect_2 | -.133609   .1249707  -.5802821  -.0904061        103   .1465417         21   6.149888   .1398623 
    Effect_3 |  .0350698   .593448  -1.603513   .7227608        103   .1465417         21   6.149888   .1398623 
    Effect_4 | -.0217339   .6140438  -1.898538   .5084691        103   .1465417         21   6.149888   .1398623 
----------------------------------------------------------------------------------------------------------------
                                                                                          *Quasi-untreated group

Placebo Estimates:

------------------------------------------------------------------------------------------
                                    Placebo Estimates
------------------------------------------------------------------------------------------

             |  Estimate         SE      LB CI      UB CI          N         BW    N in BW 
-------------+-----------------------------------------------------------------------------
   Placebo_1 | -.0266300   .1328800  -.3772700   .1436200        103   .1506800         21 
   Placebo_2 | -.0474400   .2568500  -.4426400   .5641900        103   .1465400         21 
   Placebo_3 | -.1147700   .1851600  -.3988900   .3269400        103   .1465400         21 
------------------------------------------------------------------------------------------
library(haven)
library(tidyr)
library(dplyr)
library(DIDHAD)    # devtools::install_github("chaisemartinPackages/did_had/R")
library(nprobust)  # install.packages("nprobust")

load(url("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)

# --- Construct lemp for post-treatment years ----------------------------------
df$lemp2001 <- df$lemp2000 + df$delta2001
df$lemp2002 <- df$lemp2000 + df$delta2002
df$lemp2004 <- df$lemp2000 + df$delta2004
df$lemp2005 <- df$lemp2000 + df$delta2005

# --- Reshape to panel (industry × year) ---------------------------------------
lemp_cols <- grep("^lemp[0-9]+$", names(df), value = TRUE)

df_long <- df %>%
  select(indusid, ntrgap, all_of(lemp_cols)) %>%
  pivot_longer(
    cols = all_of(lemp_cols),
    names_to = "year", names_prefix = "lemp",
    values_to = "lemp"
  ) %>%
  mutate(
    year = as.integer(year),
    ntrgap = ifelse(year <= 2000, 0, ntrgap)
  ) %>%
  as.data.frame()

# --- Run did_had with triangular kernel ---------------------------------------
options(digits = 7)
result <- did_had(
  df         = df_long,
  outcome    = "lemp",
  group      = "indusid",
  time       = "year",
  treatment  = "ntrgap",
  effects    = 4,
  placebo    = 3,
  kernel     = "tri",
  graph_off  = TRUE
)
summary(result)

Effect Estimates:

-------------------------------------------------------------------------------------------------------
                                  Effect Estimates                                       QUG* Test
          -------------------------------------------------------------------------  ------------------
            Estimate         SE       LB.CI       UB.CI    N         BW  N.BW          T        p.val
Effect_1   0.0072700  0.0787500  -0.2387000   0.0700000  103  0.1465400    21  6.1498900  0.1398600
Effect_2  -0.1336100  0.1250000  -0.5803500  -0.0903400  103  0.1465400    21  6.1498900  0.1398600
Effect_3   0.0350700  0.5934600  -1.6035300   0.7227800  103  0.1465400    21  6.1498900  0.1398600
Effect_4  -0.0217300  0.6140600  -1.8985800   0.5085100  103  0.1465400    21  6.1498900  0.1398600
*Quasi-Untreated Group

Placebo Estimates:

------------------------------------------------------------
                          Placebo Estimates
          --------------------------------------------------
          Estimate    SE        LB.CI      UB.CI     N   BW        N.BW
Placebo_1 -0.0266300  0.1328800 -0.3772700  0.1436200 103 0.1506800 21  
Placebo_2 -0.0474400  0.2568500 -0.4426400  0.5641900 103 0.1465400 21  
Placebo_3 -0.1147700  0.1851600 -0.3988900  0.3269400 103 0.1465400 21  
# pip install did-had
import pandas as pd
from did_had import DidHad

df = pd.read_parquet("https://raw.githubusercontent.com/anzonyquispe/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")

# Build panel from wide to long
rows = []
for _, r in df.iterrows():
    base = r['lemp2000']
    ind = r['indusid']
    ntr = r['ntrgap']
    for yr in [1997, 1998, 1999]:
        rows.append({'indusid': ind, 'year': yr, 'lemp': r[f'lemp{yr}'], 'ntrgap': 0.0})
    rows.append({'indusid': ind, 'year': 2000, 'lemp': base, 'ntrgap': 0.0})
    for yr in [2001, 2002, 2004, 2005]:
        rows.append({'indusid': ind, 'year': yr, 'lemp': base + r[f'delta{yr}'], 'ntrgap': ntr})

panel = pd.DataFrame(rows)
model = DidHad(kernel='tri')
results = model.fit(
    df=panel, outcome='lemp', group='indusid',
    time='year', treatment='ntrgap', effects=4, placebo=3)
print(results)
===========================================================================
DID-HAD Estimation Results
===========================================================================
Number of groups: 103
Number of periods: 8
Adoption period (F): 2001
Kernel: tri
Confidence level: 95%

---------------------------------------------------------------------------
                          Effect Estimates                      QUG* Test
         --------------------------------------------------- ---------------
          Estimate       SE     LB.CI     UB.CI     N      BW    N.BW        T    p.val
 Effect_1   0.00727  0.07875  -0.23870   0.07000   103 0.14654     21  6.14989  0.13986
 Effect_2  -0.13361  0.12500  -0.58035  -0.09034   103 0.14654     21  6.14989  0.13986
 Effect_3   0.03507  0.59346  -1.60353   0.72278   103 0.14654     21  6.14989  0.13986
 Effect_4  -0.02173  0.61406  -1.89858   0.50851   103 0.14654     21  6.14989  0.13986
*Quasi-Untreated Group

--------------------------------------------------------------
                           Placebo Estimates
          ----------------------------------------------------
           Estimate       SE     LB.CI     UB.CI     N      BW    N.BW
Placebo_1 -0.02663  0.13288  -0.37727   0.14362   103 0.15068     21
Placebo_2 -0.04744  0.25685  -0.44264   0.56419   103 0.14654     21
Placebo_3 -0.11477  0.18516  -0.39889   0.32694   103 0.14654     21
===========================================================================

Interpretation: The quasi-stayer test is not rejected (\(T = 6.15\), \(p = 0.140\)): the 21 industries with an NTR gap below the optimal bandwidth \(h^*_G = 0.147\) serve as quasi-controls, matching the PDF textbook. The estimated event-study effect at 2002 (Effect_2) is negative and significant on every platform, while the remaining effects and all three pre-trends are insignificant. Confidence intervals are wide: with only 103 industries, the non-parametric estimator \(\widehat{WATT}^{qs}_{h^*_G}\) is imprecise. The pre-trends are insignificant but also imprecise: extrapolating a linear trend from the upper bound of the 1999 pre-trend confidence interval to the origin, differential trends alone could account for effects of the estimated magnitude.