5 Chapter 7: Designs with Variation in Treatment Dose

5.1 Overview

Dataset: Pierce & Schott (2016) — pierce_schott_didtextbook.dta (103 industries, 22 variables)

This chapter applies the Heterogeneous Adoption Design (HAD) framework to study how China’s permanent normal trade relations (PNTR) status affected US manufacturing employment. The key feature: treatment is continuous (the NTR gap, ranging from 0.02 to 0.64), not binary. All 103 US industries are “treated” — there are no pure control groups, but industries with near-zero NTR gaps serve as quasi-stayers.

Packages used:

Task	Stata	R	Python
OLS + HC2 SE	`reg, vce(hc2)`	`sandwich` + `lmtest`	`statsmodels` (HC2)
Weight decomposition	`twowayfeweights` (SSC)	`TwoWayFEWeights` (CRAN)	Not available for `fdTR`
Stute linearity test	`stute_test` (GitHub)	`StuteTest` (GitHub)	Manual implementation
Treatment visualization	`panelview` (SSC)	`panelView` (CRAN)	`matplotlib` (manual)
F-test	`test` (built-in)	`car::linearHypothesis`	`scipy` + `f_test`
HAD event-study	`did_had` (SSC) + `nprobust` (GitHub) + `gtools` (SSC)	`DIDHAD` (GitHub) + `nprobust` (CRAN)	`did-had` (PyPI)

5.2 PanelView

Note: In the Pierce & Schott (2016) application, treatment is continuous (NTR gap), so the standard binary PanelView is not applicable. See the code below for reference, but plots are omitted.

* ssc install panelview, replace
copy "https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
preserve
reshape long delta deltalintrend, i(indusid) j(year)
panelview delta ntrgap, i(indusid) t(year) type(treat) title("Treatment Dose: Pierce & Schott (2016)") legend(off)
graph export "$FIGDIR\ch07_panelview_stata.png", replace width(1200)
restore

library(panelView); library(tidyr)
load(url("https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData"))
df_long <- df %>% pivot_longer(cols = starts_with("delta"), names_to = "variable", values_to = "value")
df_long$year <- as.numeric(gsub("delta", "", df_long$variable))
df_long <- df_long[!is.na(df_long$year), ]
df_long$treated <- ifelse(df_long$year >= 2001, 1, 0)
png("figures/ch07_panelview_R.png", width = 1000, height = 600)
panelview(value ~ treated, data = df_long, index = c("indusid", "year"), type = "treat",
          main = "Pierce & Schott (2016)", ylab = "")
dev.off()

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
df = pd.read_parquet("https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
cmap = plt.cm.Blues
norm = mcolors.Normalize(vmin=0, vmax=df['ntrgap'].max() * 1.1)
# Heatmap: each row = industry (sorted by ntrgap), each column = year
# Color intensity = NTR gap (continuous treatment dose)

5.3 CQ#36: TWFE Regressions

Regress \(\Delta_{2000}^t Y_g\) on \(D_g\) (NTR gap) for \(t \in \{2001, 2002, 2004, 2005\}\), with HC2 standard errors.

copy "https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
reg delta2001 ntrgap, vce(hc2)
reg delta2002 ntrgap, vce(hc2)
reg delta2004 ntrgap, vce(hc2)
reg delta2005 ntrgap, vce(hc2)

--- 2001 ---  N = 103, R² = 0.0237
             |             Robust HC2
   delta2001 | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      ntrgap |  -.0612112   .0401833    -1.52   0.131    -.1409241    .0185017
       _cons |  -.0141423   .0129623    -1.09   0.278     -.039856    .0115714

--- 2002 ---  N = 103, R² = 0.1263
      ntrgap |  -.2598747   .0754512    -3.44   0.001    -.4095496   -.1101999
       _cons |  -.0550989   .0220561    -2.50   0.014    -.0988522   -.0113456

--- 2004 ---  N = 103, R² = 0.0857
      ntrgap |  -.5397825   .1527447    -3.53   0.001    -.8427868   -.2367782
       _cons |  -.0868008   .0524821    -1.65   0.101    -.1909111    .0173095

--- 2005 ---  N = 103, R² = 0.0707
      ntrgap |  -.5317591   .1666765    -3.19   0.002    -.8624004   -.2011179
       _cons |  -.1056161   .0536739    -1.97   0.052    -.2120906    .0008584

library(haven); library(sandwich); library(lmtest)
load(url("https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
for (yr in c(2001, 2002, 2004, 2005)) {
    fit <- lm(as.formula(paste0("delta", yr, " ~ ntrgap")), data = df)
    print(coeftest(fit, vcov = vcovHC(fit, type = "HC2")))
}

--- 2001 ---  N = 103, R² = 0.02366
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.0141423  0.0129623 -1.0910   0.2779
ntrgap      -0.0612112  0.0401833 -1.5233   0.1308

--- 2002 ---  N = 103, R² = 0.12631
(Intercept) -0.0550989  0.0220561 -2.4981   0.0141
ntrgap      -0.2598747  0.0754512 -3.4443   0.0008

--- 2004 ---  N = 103, R² = 0.08567
(Intercept) -0.0868008  0.0524821 -1.6539   0.1012
ntrgap      -0.5397825  0.1527447 -3.5339   0.0006

--- 2005 ---  N = 103, R² = 0.07072
(Intercept) -0.1056161  0.0536739 -1.9677   0.0518
ntrgap      -0.5317591  0.1666765 -3.1904   0.0019

import pandas as pd
import statsmodels.api as sm
df = pd.read_parquet("https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
for year in [2001, 2002, 2004, 2005]:
    y = df[f'delta{year}']
    X = sm.add_constant(df['ntrgap'])
    res = sm.OLS(y, X).fit(cov_type='HC2')

--- 2001 ---  N = 103, R² = 0.02366
  Variable                     Coef      Std.Err          t      P>|t|
  const                  -0.0141423    0.0129623     -1.091     0.2753
  ntrgap                 -0.0612112    0.0401833     -1.523     0.1277

--- 2002 ---  N = 103, R² = 0.12631
  const                  -0.0550989    0.0220561     -2.498     0.0125
  ntrgap                 -0.2598747    0.0754512     -3.444     0.0006

--- 2004 ---  N = 103, R² = 0.08567
  const                  -0.0868008    0.0524821     -1.654     0.0981
  ntrgap                 -0.5397825    0.1527447     -3.534     0.0004

--- 2005 ---  N = 103, R² = 0.07072
  const                  -0.1056161    0.0536739     -1.968     0.0491
  ntrgap                 -0.5317591    0.1666765     -3.190     0.0014

Interpretation: The NTR gap has a significant negative effect on manufacturing employment growth starting from 2002. The effect grows over time: a 10 pp increase in the NTR gap is associated with roughly 5 pp lower employment growth by 2004–2005.

5.4 CQ#37: Decompose TWFE Weights

Use twowayfeweights with type(fdTR) to assess whether \(\hat{\beta}_{fe}\) estimates a convex combination of effects.

* ssc install twowayfeweights, replace
copy "https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
twowayfeweights delta2001 indusid cons ntrgap ntrgap, type(fdTR)

Under the common trends assumption,
the TWFE coefficient beta, equal to -0.0612, estimates a weighted sum of 103 ATTs.
62 ATTs receive a positive weight, and 41 receive a negative weight.
------------------------------------------------
Treat. var: ntrgap      # ATTs      Σ weights
------------------------------------------------
Positive weights        62          1.3190
Negative weights        41          -0.3190
------------------------------------------------
Total                   103         1.0000
------------------------------------------------

library(haven); library(TwoWayFEWeights)
load(url("https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
# Workaround: fdTR with constant time variable crashes in R.
# Create a 2-period panel and use feTR (equivalent with T=2).
df_panel <- rbind(
    data.frame(indusid = df$indusid, time = 1, Y = 0, D = 0),
    data.frame(indusid = df$indusid, time = 2, Y = df$delta2001, D = df$ntrgap)
)
gq2 <- twowayfeweights(df_panel, "Y", "indusid", "time", "D",
                        type = "feTR", summary_measures = TRUE)

Under the common trends assumption,
the TWFE coefficient beta, equal to -0.0612, estimates a weighted sum of 103 ATTs.
62 ATTs receive a positive weight, and 41 receive a negative weight.

Treat. var: D       ATTs    Σ weights
Positive weights      62        1.319
Negative weights      41       -0.319
Total                103            1

Summary Measures:
  min σ(Δ) compatible with β_fe and Δ_TR = 0: 0.0317
  min σ(Δ) compatible with opposite sign: 0.0642

import pandas as pd
from twowayfeweights import twowayfeweights
df = pd.read_parquet("https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
df["cons"] = 1
df["ntrgap_D0"] = df["ntrgap"]
result = twowayfeweights(df, "delta2001", "indusid", "cons", "ntrgap",
                          type="fdTR", D0="ntrgap_D0")

Under the common trends assumption,
beta estimates a weighted sum of 103 ATTs.
62 ATTs receive a positive weight, and 41 receive a negative weight.
------------------------------------------------
Treat. var: ntrgap      # ATTs      Σ weights
------------------------------------------------
Positive weights        62          1.3190
Negative weights        41          -0.3190
------------------------------------------------
Total                   103         1.0000
------------------------------------------------

Interpretation: 41 out of 103 industry ATTs receive negative weights, summing to −0.3190. The TWFE estimator does not estimate a convex combination of effects. However, the textbook notes this is expected in the HAD framework and motivates the Stute linearity tests below.

5.5 CQ#38: Test Random Assignment of Treatment Dose

Regress the NTR gap on pre-treatment employment levels (1997–2000) and test their joint significance.

copy "https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
reg ntrgap lemp1997 lemp1998 lemp1999 lemp2000, vce(hc2)
test lemp1997 lemp1998 lemp1999 lemp2000

             |             Robust HC2
      ntrgap | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
    lemp1997 |  -.0219594   .1325814    -0.17   0.869    -.2850629    .2411441
    lemp1998 |   .6043265   .3103595     1.95   0.054    -.0115719    1.220225
    lemp1999 |  -.2260641   .4573776    -0.49   0.622    -1.133715    .6815867
    lemp2000 |  -.3418326   .2938878    -1.16   0.248    -.9250434    .2413782
       _cons |   .1204804   .1396417     0.86   0.390     -.156634    .3975949

F(4, 98) = 2.39     Prob > F = 0.0560

library(haven); library(car)
load(url("https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
m3 <- lm(ntrgap ~ lemp1997 + lemp1998 + lemp1999 + lemp2000, data = df)
coeftest(m3, vcov = vcovHC(m3, type = "HC2"))
linearHypothesis(m3, c("lemp1997=0","lemp1998=0","lemp1999=0","lemp2000=0"),
                 vcov. = vcovHC(m3, type = "HC2"))

             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.1204804  0.1396417  0.8628   0.3904
lemp1997    -0.0219594  0.1325814 -0.1656   0.8688
lemp1998     0.6043265  0.3103595  1.9472   0.0544
lemp1999    -0.2260641  0.4573776 -0.4943   0.6222
lemp2000    -0.3418326  0.2938878 -1.1631   0.2476

F(4, 98) = 2.3901     Pr(>F) = 0.05597

import pandas as pd
df = pd.read_parquet("https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
X_vars = ['lemp1997', 'lemp1998', 'lemp1999', 'lemp2000']
y = df['ntrgap']; X = sm.add_constant(df[X_vars])
res = sm.OLS(y, X).fit(cov_type='HC2')
# F-test for joint significance

  Variable                     Coef      Std.Err          t      P>|t|
  const                   0.1204804    0.1396417      0.863     0.3883
  lemp1997               -0.0219594    0.1325814     -0.166     0.8684
  lemp1998                0.6043265    0.3103595      1.947     0.0515
  lemp1999               -0.2260641    0.4573776     -0.494     0.6211
  lemp2000               -0.3418326    0.2938878     -1.163     0.2448

F(4, 98) = 2.3901     Prob > F = 0.0560

Interpretation: The joint F-test yields \(p = 0.056\), marginally insignificant at 5%. Pre-treatment employment levels are not strong predictors of the NTR gap, consistent with (approximate) random assignment of the treatment dose.

5.6 Stute Tests of Linearity (extra)

Test whether \(E[\Delta Y | D]\) is linear in \(D\) using the Stute (1997) Cramér–von Mises test, for each post-treatment year and jointly.

copy "https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
stute_test delta2001 ntrgap, seed(1)
stute_test delta2002 ntrgap, seed(1)
stute_test delta2004 ntrgap, seed(1)
stute_test delta2005 ntrgap, seed(1)

preserve
reshape long delta deltalintrend, i(indusid) j(year)
stute_test delta ntrgap indusid year if year>=2001, seed(1)
restore

Cross-sectional:
   Variable       |   t stat    p-value
   delta2001      |  .0004426      .042
   delta2002      |  .0011319      .134
   delta2004      |  .0039215      .478
   delta2005      |  .0033582      .716

Panel Joint Test:
             |    t stat    p-value
        2001 |  .0004426       .042
        2002 |  .0011319       .134
        2004 |  .0039215       .478
        2005 |  .0033582       .716
Joint Stute test: 0.0089 (0.5400)

library(haven); library(StuteTest)
load(url("https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
for (yr in c(2001, 2002, 2004, 2005))
    stute_test(df = df, Y = paste0("delta", yr), D = "ntrgap", seed = 1)

# Panel joint test
df_panel_gq4 <- df_long %>% filter(year >= 2001)
stute_test(df = df_panel_gq4, Y = "delta", D = "ntrgap",
           group = "indusid", time = "year", seed = 1)

Cross-sectional:
   Variable       |   t stat    p-value
   delta2001      |  0.0004426    0.054
   delta2002      |  0.0011319    0.152
   delta2004      |  0.0039215    0.480
   delta2005      |  0.0033582    0.684

Panel Joint Test:
             |    t stat    p-value
        2001 |  0.0004426    0.064
        2002 |  0.0011319    0.130
        2004 |  0.0039215    0.484
        2005 |  0.0033582    0.660
Joint Stute test: 0.0089 (0.5500)

import pandas as pd
import numpy as np

def stute_test_cvm(Y, D, df, order=1, seed=1, nboot=999):
    y = df[Y].values.astype(float)
    d = df[D].values.astype(float)
    n = len(y)
    X = np.column_stack([d**k for k in range(order+1)])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    idx = np.argsort(d)
    resid_sorted = resid[idx]
    cum_resid = np.cumsum(resid_sorted) / np.sqrt(n)
    cvm_stat = np.sum(cum_resid**2) / n
    rng = np.random.RandomState(seed)
    count = 0
    for _ in range(nboot):
        wild = resid * rng.choice([-1, 1], size=n)
        wild_sorted = wild[idx]
        cum_wild = np.cumsum(wild_sorted) / np.sqrt(n)
        boot_stat = np.sum(cum_wild**2) / n
        if boot_stat >= cvm_stat:
            count += 1
    pval = (count + 1) / (nboot + 1)
    return cvm_stat, pval

df = pd.read_parquet("https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
for year in [2001, 2002, 2004, 2005]:
    stat, pv = stute_test_cvm(f'delta{year}', 'ntrgap', df, order=1, seed=1)
    print(f"delta{year}: CvM = {stat:.7f}, p = {pv:.3f}")

Cross-sectional:
   Variable        |   CvM stat    p-value
   delta2001        |  0.0004426      0.695
   delta2002        |  0.0011319      0.740
   delta2004        |  0.0039215      0.925
   delta2005        |  0.0033582      0.972

Panel Joint Test:
        2001 |  0.0004426      0.695
        2002 |  0.0011319      0.740
        2004 |  0.0039215      0.925
        2005 |  0.0033582      0.972
   Joint        |  0.0088542

Interpretation: Linearity is not rejected at conventional levels for 2002–2005 (all \(p > 0.10\)). For 2001, the p-value is borderline (\(\approx 0.04\)–\(0.08\) depending on bootstrap draw). The joint test (\(p \approx 0.54\)) does not reject linearity.

5.7 Pre-Trend Tests — Without Linear Trends (extra)

Test for pre-trends by regressing \(\Delta_{2000}^t Y_g\) on \(D_g\) for \(t \in \{1999, 1998, 1997\}\).

copy "https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
reg delta1999 ntrgap, vce(hc2)
reg delta1998 ntrgap, vce(hc2)
reg delta1997 ntrgap, vce(hc2)

--- delta1999 ---  N = 103, R² = 0.0325
      ntrgap |   .0580708   .0279591     2.08   0.040     .0026074    .1135341
       _cons |  -.0051607   .0086782    -0.59   0.553    -.0223759    .0120544

--- delta1998 ---  N = 103, R² = 0.0616
      ntrgap |   .1407052   .0547983     2.57   0.012     .0320002    .2494102
       _cons |  -.0107783   .0162652    -0.66   0.509    -.0430442    .0214875

--- delta1997 ---  N = 103, R² = 0.0307
      ntrgap |   .1625066   .0712845     2.28   0.025     .0210973    .3039160
       _cons |  -.0353406   .0211379    -1.67   0.098    -.0772726    .0065913

library(haven)
load(url("https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
for (yr in c(1999, 1998, 1997)) {
    fit <- lm(as.formula(paste0("delta", yr, " ~ ntrgap")), data = df)
    print(coeftest(fit, vcov = vcovHC(fit, type = "HC2")))
}

--- delta1999 ---  N = 103, R² = 0.03251
(Intercept) -0.0051607  0.0086782 -0.5947   0.5534
ntrgap       0.0580708  0.0279591  2.0770   0.0403

--- delta1998 ---  N = 103, R² = 0.06156
(Intercept) -0.0107783  0.0162652 -0.6627   0.5091
ntrgap       0.1407052  0.0547983  2.5677   0.0117

--- delta1997 ---  N = 103, R² = 0.03068
(Intercept) -0.0353406  0.0211379 -1.6719   0.0976
ntrgap       0.1625066  0.0712845  2.2797   0.0247

import pandas as pd
df = pd.read_parquet("https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
for year in [1999, 1998, 1997]:
    y = df[f'delta{year}']
    X = sm.add_constant(df['ntrgap'])
    res = sm.OLS(y, X).fit(cov_type='HC2')

--- delta1999 ---  N = 103, R² = 0.03251
  const                  -0.0051607    0.0086782     -0.595     0.5521
  ntrgap                  0.0580708    0.0279591      2.077     0.0378

--- delta1998 ---  N = 103, R² = 0.06156
  const                  -0.0107783    0.0162652     -0.663     0.5075
  ntrgap                  0.1407052    0.0547983      2.568     0.0102

--- delta1997 ---  N = 103, R² = 0.03068
  const                  -0.0353406    0.0211379     -1.672     0.0945
  ntrgap                  0.1625066    0.0712845      2.280     0.0226

Interpretation: All three pre-treatment coefficients are significant and positive, indicating pre-trends: industries with higher NTR gaps had faster employment growth before PNTR. This motivates adding industry-specific linear trends (GQ6–GQ9).

5.8 Pre-Trend Tests — With Linear Trends (extra)

Repeat GQ5 using the detrended outcome \(\tilde{\Delta}_{2000}^t Y_g\) (netting out industry-specific linear trends).

copy "https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
reg deltalintrend1998 ntrgap, vce(hc2)
reg deltalintrend1997 ntrgap, vce(hc2)

--- deltalintrend1998 ---  N = 103, R² = 0.0046
      ntrgap |  -.0245637   .0339253    -0.72   0.471    -.0918623    .0427349
       _cons |   .0004569   .0119279     0.04   0.970    -.0232047    .0241186

--- deltalintrend1997 ---  N = 103, R² = 0.0050
      ntrgap |  -.0463651   .0444009    -1.04   0.299    -.1344446    .0417144
       _cons |   .0250192   .0153636     1.63   0.107    -.0054579    .0554964

library(haven)
load(url("https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
for (yr in c(1998, 1997)) {
    fit <- lm(as.formula(paste0("deltalintrend", yr, " ~ ntrgap")), data = df)
    print(coeftest(fit, vcov = vcovHC(fit, type = "HC2")))
}

--- deltalintrend1998 ---  N = 103, R² = 0.00455
(Intercept)  0.0004569  0.0119279  0.0383   0.9695
ntrgap      -0.0245637  0.0339253 -0.7241   0.4707

--- deltalintrend1997 ---  N = 103, R² = 0.00503
(Intercept)  0.0250192  0.0153636  1.6285   0.1065
ntrgap      -0.0463651  0.0444009 -1.0442   0.2989

import pandas as pd
df = pd.read_parquet("https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
for year in [1998, 1997]:
    y = df[f'deltalintrend{year}']
    X = sm.add_constant(df['ntrgap'])
    res = sm.OLS(y, X).fit(cov_type='HC2')

--- deltalintrend1998 ---  N = 103, R² = 0.00455
  const                   0.0004569    0.0119279      0.038     0.9694
  ntrgap                 -0.0245637    0.0339253     -0.724     0.4690

--- deltalintrend1997 ---  N = 103, R² = 0.00503
  const                   0.0250192    0.0153636      1.628     0.1034
  ntrgap                 -0.0463651    0.0444009     -1.044     0.2964

Interpretation: After including industry-specific linear trends, all pre-trend coefficients become small and insignificant (\(p > 0.29\)). The trends successfully absorb the pre-existing differential growth.

5.9 Stute Tests of Mean Independence (extra)

Test whether \(E[\tilde{\Delta} Y | D] = \text{const}\) (mean independence, order 0) for the detrended pre-period outcomes.

copy "https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
stute_test deltalintrend1998 ntrgap, order(0) seed(1)
stute_test deltalintrend1997 ntrgap, order(0) seed(1)

preserve
reshape long delta deltalintrend, i(indusid) j(year)
stute_test deltalintrend ntrgap indusid year if inlist(year, 1997, 1998), order(0) seed(1)
restore

Cross-sectional:
   Variable            |   t stat    p-value
   deltalintrend1998   |  .0004249      .302
   deltalintrend1997   |  .0007520      .508

Panel Joint Test:
        1997 |   .0007520      .508
        1998 |  .0004249       .302
Joint Stute test: 0.0012 (0.4720)

library(haven)
load(url("https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
stute_test(df = df, Y = "deltalintrend1998", D = "ntrgap", order = 0, seed = 1)
stute_test(df = df, Y = "deltalintrend1997", D = "ntrgap", order = 0, seed = 1)
# Panel joint test
stute_test(df = df_pre_lt, Y = "deltalintrend", D = "ntrgap",
           group = "indusid", time = "year", order = 0, seed = 1)

Cross-sectional:
   Variable            |   t stat    p-value
   deltalintrend1998   |  0.0004249    0.284
   deltalintrend1997   |  0.0007520    0.506

Panel Joint Test:
        1997 |  0.0007520    0.514
        1998 |  0.0004249    0.294
Joint Stute test: 0.0012 (0.4660)

import pandas as pd
import numpy as np

def stute_test_cvm(Y, D, df, order=1, seed=1, nboot=999):
    y = df[Y].values.astype(float)
    d = df[D].values.astype(float)
    n = len(y)
    X = np.column_stack([d**k for k in range(order+1)])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    idx = np.argsort(d)
    cum_resid = np.cumsum(resid[idx]) / np.sqrt(n)
    cvm_stat = np.sum(cum_resid**2) / n
    rng = np.random.RandomState(seed)
    count = 0
    for _ in range(nboot):
        wild = resid * rng.choice([-1, 1], size=n)
        cum_wild = np.cumsum(wild[idx]) / np.sqrt(n)
        boot_stat = np.sum(cum_wild**2) / n
        if boot_stat >= cvm_stat:
            count += 1
    return cvm_stat, (count + 1) / (nboot + 1)

df = pd.read_parquet("https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
for year in [1998, 1997]:
    stat, pv = stute_test_cvm(f'deltalintrend{year}', 'ntrgap', df, order=0, seed=1)
    print(f"deltalintrend{year}: CvM = {stat:.7f}, p = {pv:.3f}")

Cross-sectional:
   Variable            |   CvM stat    p-value
   deltalintrend1998   |  0.0004249      0.334
   deltalintrend1997   |  0.0007520      0.566

Panel Joint Test:
        1997 |  0.0007520      0.566
        1998 |  0.0004249      0.334
Joint        |  0.0011769      0.530

Interpretation: Mean independence is not rejected for either year (\(p > 0.28\)) or jointly (\(p \approx 0.47\)). The detrended pre-period outcomes are uncorrelated with the treatment dose, supporting the identifying assumptions.

5.10 TWFE With Industry-Specific Linear Trends (extra)

Regress the detrended outcome \(\tilde{\Delta}_{2000}^t Y_g\) on \(D_g\) for post-treatment years.

copy "https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
reg deltalintrend2001 ntrgap, vce(hc2)
reg deltalintrend2002 ntrgap, vce(hc2)
reg deltalintrend2004 ntrgap, vce(hc2)
reg deltalintrend2005 ntrgap, vce(hc2)

--- deltalintrend2001 ---  N = 103, R² = 0.0000
      ntrgap |  -.0031404   .0390586    -0.08   0.936    -.0806222    .0743413
       _cons |  -.0193030   .0139040    -1.39   0.168    -.0468848    .0082788

--- deltalintrend2002 ---  N = 103, R² = 0.0335
      ntrgap |  -.1437332   .0754461    -1.91   0.060    -.2933979    .0059315
       _cons |  -.0654203   .0242514    -2.70   0.008    -.1135286   -.0173120

--- deltalintrend2004 ---  N = 103, R² = 0.0262
      ntrgap |  -.3074995   .1518755    -2.02   0.046    -.6087796   -.0062194
       _cons |  -.1074437   .0551099    -1.95   0.054    -.2167668    .0018795

--- deltalintrend2005 ---  N = 103, R² = 0.0133
      ntrgap |  -.2414053   .1675180    -1.44   0.153    -.5737160    .0909053
       _cons |  -.1314196   .0581374    -2.26   0.026    -.2467485   -.0160907

library(haven)
load(url("https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
for (yr in c(2001, 2002, 2004, 2005)) {
    fit <- lm(as.formula(paste0("deltalintrend", yr, " ~ ntrgap")), data = df)
    print(coeftest(fit, vcov = vcovHC(fit, type = "HC2")))
}

--- deltalintrend2001 ---  N = 103, R² = 0.00005
(Intercept) -0.0193030  0.0139040 -1.3883   0.1681
ntrgap      -0.0031404  0.0390586 -0.0804   0.9361

--- deltalintrend2002 ---  N = 103, R² = 0.03349
(Intercept) -0.0654203  0.0242514 -2.6976   0.0082
ntrgap      -0.1437332  0.0754461 -1.9051   0.0596

--- deltalintrend2004 ---  N = 103, R² = 0.02615
(Intercept) -0.1074437  0.0551099 -1.9496   0.0540
ntrgap      -0.3074995  0.1518755 -2.0247   0.0455

--- deltalintrend2005 ---  N = 103, R² = 0.01326
(Intercept) -0.1314196  0.0581374 -2.2605   0.0259
ntrgap      -0.2414053  0.1675180 -1.4411   0.1527

import pandas as pd
import statsmodels.api as sm

df = pd.read_parquet("https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
for year in [2001, 2002, 2004, 2005]:
    y = df[f'deltalintrend{year}']
    X = sm.add_constant(df['ntrgap'])
    res = sm.OLS(y, X).fit(cov_type='HC2')
    print(f"--- deltalintrend{year} ---  N = {int(res.nobs)}, R² = {res.rsquared:.5f}")
    for name in res.params.index:
        print(f"  {name:<20s} {res.params[name]:>12.7f}  {res.bse[name]:>12.7f}  {res.tvalues[name]:>8.3f}   {res.pvalues[name]:>8.4f}")
    print()

--- deltalintrend2001 ---  N = 103, R² = 0.00005
  const                  -0.0193030    0.0139040     -1.388     0.1650
  ntrgap                 -0.0031404    0.0390586     -0.080     0.9359

--- deltalintrend2002 ---  N = 103, R² = 0.03349
  const                  -0.0654203    0.0242514     -2.698     0.0070
  ntrgap                 -0.1437332    0.0754461     -1.905     0.0568

--- deltalintrend2004 ---  N = 103, R² = 0.02615
  const                  -0.1074437    0.0551099     -1.950     0.0512
  ntrgap                 -0.3074995    0.1518755     -2.025     0.0429

--- deltalintrend2005 ---  N = 103, R² = 0.01326
  const                  -0.1314196    0.0581374     -2.261     0.0238
  ntrgap                 -0.2414053    0.1675180     -1.441     0.1496

Interpretation: After including linear trends, the effects are substantially attenuated compared to GQ1. Only 2004 is significant at 5% (\(\hat{\beta} = -0.3075\), \(p = 0.046\)). The 2005 effect is no longer significant (\(p = 0.153\)). The linear trends absorb part of the treatment effect, consistent with the treatment effect being partially confounded by pre-existing differential trends.

5.11 Stute Tests of Linearity — With Linear Trends (extra)

Test linearity of \(E[\tilde{\Delta} Y | D]\) for the detrended post-treatment outcomes.

copy "https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
stute_test deltalintrend2001 ntrgap, seed(1)
stute_test deltalintrend2002 ntrgap, seed(1)
stute_test deltalintrend2004 ntrgap, seed(1)
stute_test deltalintrend2005 ntrgap, seed(1)

preserve
reshape long delta deltalintrend, i(indusid) j(year)
stute_test deltalintrend ntrgap indusid year if year>=2001, seed(1)
restore

Cross-sectional:
   Variable            |   t stat    p-value
   deltalintrend2001   |  .0002961      .378
   deltalintrend2002   |  .0018016      .062
   deltalintrend2004   |  .0050583      .384
   deltalintrend2005   |  .0047902      .530

Panel Joint Test:
        2001 |  .0002961       .378
        2002 |  .0018016       .062
        2004 |  .0050583       .384
        2005 |  .0047902       .530
Joint Stute test: 0.0119 (0.4020)

library(haven)
load(url("https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
for (yr in c(2001, 2002, 2004, 2005))
    stute_test(df = df, Y = paste0("deltalintrend", yr), D = "ntrgap", seed = 1)
# Panel joint test
stute_test(df = df_post_lt, Y = "deltalintrend", D = "ntrgap",
           group = "indusid", time = "year", seed = 1)

Cross-sectional:
   Variable            |   t stat    p-value
   deltalintrend2001   |  0.0002961    0.388
   deltalintrend2002   |  0.0018016    0.050
   deltalintrend2004   |  0.0050583    0.318
   deltalintrend2005   |  0.0047902    0.580

Panel Joint Test:
        2001 |  0.0002961    0.386
        2002 |  0.0018016    0.050
        2004 |  0.0050583    0.330
        2005 |  0.0047902    0.496
Joint Stute test: 0.0119 (0.3640)

import pandas as pd
import numpy as np

def stute_test_cvm(Y, D, df, order=1, seed=1, nboot=999):
    y = df[Y].values.astype(float)
    d = df[D].values.astype(float)
    n = len(y)
    X = np.column_stack([d**k for k in range(order+1)])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    idx = np.argsort(d)
    cum_resid = np.cumsum(resid[idx]) / np.sqrt(n)
    cvm_stat = np.sum(cum_resid**2) / n
    rng = np.random.RandomState(seed)
    count = 0
    for _ in range(nboot):
        wild = resid * rng.choice([-1, 1], size=n)
        cum_wild = np.cumsum(wild[idx]) / np.sqrt(n)
        boot_stat = np.sum(cum_wild**2) / n
        if boot_stat >= cvm_stat:
            count += 1
    return cvm_stat, (count + 1) / (nboot + 1)

df = pd.read_parquet("https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
for year in [2001, 2002, 2004, 2005]:
    stat, pv = stute_test_cvm(f'deltalintrend{year}', 'ntrgap', df, order=1, seed=1)
    print(f"deltalintrend{year}: CvM = {stat:.7f}, p = {pv:.3f}")

Cross-sectional:
   Variable            |   CvM stat    p-value
   deltalintrend2001   |  0.0002961      0.409
   deltalintrend2002   |  0.0018016      0.055
   deltalintrend2004   |  0.0050583      0.373
   deltalintrend2005   |  0.0047902      0.559

Panel Joint Test:
        2001 |  0.0002961      0.409
        2002 |  0.0018016      0.055
        2004 |  0.0050583      0.373
        2005 |  0.0047902      0.559
Joint        |  0.0119463      0.431

Interpretation: Linearity is generally not rejected. The 2002 result is borderline (\(p \approx 0.05\)–\(0.06\)), but the joint test does not reject (\(p \approx 0.40\)). Overall, the linear specification is adequate for the detrended outcomes.

5.12 Quasi-Stayer Test

Test whether quasi-stayers exist: compute \(QS = D_{(1)} / (D_{(2)} - D_{(1)})\) where \(D_{(1)}\) and \(D_{(2)}\) are the two smallest NTR gap values.

copy "https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
sort ntrgap
display "D(1) = " ntrgap[1]
display "D(2) = " ntrgap[2]
display "QS = " ntrgap[1]/(ntrgap[2]-ntrgap[1])

D(1) = 0.0202847
D(2) = 0.0235831
QS = D(1)/(D(2)-D(1)) = 6.1498877

library(haven)
load(url("https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
ntrgap_sorted <- sort(df$ntrgap)
d1 <- ntrgap_sorted[1]; d2 <- ntrgap_sorted[2]
QS <- d1 / (d2 - d1)

D(1) = 0.02028472
D(2) = 0.02358311
QS = D(1)/(D(2)-D(1)) = 6.1498877

import pandas as pd
df = pd.read_parquet("https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
ntrgap_sorted = np.sort(df['ntrgap'].values)
d1, d2 = ntrgap_sorted[0], ntrgap_sorted[1]
qs = d1 / (d2 - d1)

D(1) = 0.02028472
D(2) = 0.02358311
QS = D(1)/(D(2)-D(1)) = 6.1498877

Interpretation: \(QS = 6.15 < 19 = 1/\alpha - 1\) at \(\alpha = 0.05\), so we fail to reject the null that quasi-stayers exist. This supports the use of the HAD framework: the industry with the smallest NTR gap (0.020) can plausibly serve as a quasi-control group.

5.13 Computing Commands: `contdid` and `did_multiplegt_stat` Syntax (extra)

This section documents the syntax of the commands for estimating \(CAS(d_2)\), \(ATT\), and \(WATT\) in heterogeneous adoption designs with stayers. These commands are not applied to the Pierce & Schott data (which has no stayers), but their syntax is provided here for reference as shown in the textbook.

did_multiplegt_stat estimates the ATT (via the Average Slope, AS) and WATT (via the Weighted Average Slope, WAS) in designs with a continuous treatment and stayers. The syntax from the textbook (p. 342) is:

* ssc install did_multiplegt_stat, replace
copy "https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
did_multiplegt_stat outcome groupid timeid treatment, estimator(as was) exact_match

Key options: estimator(as was) estimates both AS and WAS; exact_match compares switchers and stayers with the same baseline treatment; estimation_method(dr) for doubly-robust estimation; order(#) for polynomial order; placebo for placebo estimates; noextrapolation to restrict to the support of stayers’ treatment.

contdid (Callaway, Goodman-Bacon & Sant’Anna, 2025) estimates the dose-response function \(d_2 \mapsto CAS(d_2)\):

library(haven); library(contdid)
load(url("https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
cd_res <- cont_did(
  yname = "outcome", tname = "timeid", idname = "groupid",
  dname = "treatment", gname = "cohort", data = df,
  target_parameter = "slope", aggregation = "dose"
)
summary(cd_res)

did_multiplegt_stat (de Chaisemartin, Ciccia, D’Haultfœuille, Knau & Sow, 2024) estimates ATT and WATT:

library(haven); library(DIDmultiplegtSTAT)
load(url("https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)
result <- did_multiplegt_stat(
  df = df, Y = "outcome", ID = "groupid", Time = "timeid",
  D = "treatment", estimator = c("aoss", "waoss"),
  estimation_method = "dr", exact_match = TRUE
)
summary(result)

did_multiplegt_stat is available as a Python translation (did_multiplegt_stat.py):

import pandas as pd
from did_multiplegt_stat import did_multiplegt_stat, summary_did_multiplegt_stat
df = pd.read_parquet("https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")
result = did_multiplegt_stat(
    df = df, Y = "outcome", ID = "groupid", Time = "timeid",
    D = "treatment", estimator = ["aoss", "waoss"],
    estimation_method = "dr", exact_match = True
)
summary_did_multiplegt_stat(result)

contdid is not available in Python.

Note: These commands require stayers (groups with \(D_{g,2} = 0\)). The Pierce & Schott (2016) dataset has no stayers, so these commands cannot be applied here.

5.14 CQ#39: `did_had` Event-Study Estimates

Using the pierce_schott_didtextbook dataset, transform the data into a panel at the industry \(\times\) year level, with industries’ log-employment and NTR gap treatment from 1997 to 2005. Then, use did_had to compute four event-study estimates and three pre-trend estimates.

Stata nprobust note: The current nprobust .mo files on GitHub are compiled with Stata 18. If you are using Stata 17 or earlier, did_had will fail with the error nprobust_lp_mse_dpi() not found. To fix this, run once per session:

copy "https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear
do "https://raw.githubusercontent.com/nppackages/nprobust/master/stata/nprobust_functions.do"

Cross-platform note: The Stata output below uses Stata 18+ with the latest did_had and nprobust, which computes a single global bandwidth (\(h^*_G = 0.147\), \(N = 21\)) matching the R output. Stata 17 users may get different bandwidths per effect due to an older nprobust version. All implementations give the same quasi-stayer test statistic (\(T = 6.1499\), \(p = 0.1399\)).

* --- Load data ----------------------------------------------------------------
copy "https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.dta" "pierce_schott_didtextbook.dta", replace
use "pierce_schott_didtextbook.dta", clear

* --- Construct lemp for post-treatment years ----------------------------------
* The dataset has lemp1997-lemp2000 and delta2001, delta2002, delta2004, delta2005.
* delta20XX = lemp20XX - lemp2000, so lemp20XX = lemp2000 + delta20XX.
gen lemp2001 = lemp2000 + delta2001
gen lemp2002 = lemp2000 + delta2002
gen lemp2004 = lemp2000 + delta2004
gen lemp2005 = lemp2000 + delta2005

* --- Reshape to panel (industry × year) ---------------------------------------
reshape long lemp, i(indusid) j(year)
replace ntrgap = 0 if year <= 2000

* --- If using Stata 17 or earlier, compile nprobust from source (once) --------
* do "https://raw.githubusercontent.com/nppackages/nprobust/master/stata/nprobust_functions.do"

* --- Run did_had with triangular kernel ---------------------------------------
did_had lemp indusid year ntrgap, effects(4) placebo(3) kernel(triangular)

Effect Estimates:

----------------------------------------------------------------------------------------------------------------
                                    Effect Estimates                                              QUG* Test
------------------------------------------------------------------------------------------ ---------------------

             |  Estimate         SE      LB CI      UB CI          N         BW    N in BW          T      p-val 
-------------+---------------------------------------------------------------------------------------------------
    Effect_1 |  .007268    .078676  -.2385522   .0698521        103   .1465417         21   6.149888   .1398623 
    Effect_2 | -.133609   .1249707  -.5802821  -.0904061        103   .1465417         21   6.149888   .1398623 
    Effect_3 |  .0350698   .593448  -1.603513   .7227608        103   .1465417         21   6.149888   .1398623 
    Effect_4 | -.0217339   .6140438  -1.898538   .5084691        103   .1465417         21   6.149888   .1398623 
----------------------------------------------------------------------------------------------------------------
                                                                                          *Quasi-untreated group

Placebo Estimates:

------------------------------------------------------------------------------------------
                                    Placebo Estimates
------------------------------------------------------------------------------------------

             |  Estimate         SE      LB CI      UB CI          N         BW    N in BW 
-------------+-----------------------------------------------------------------------------
   Placebo_1 | -.0266300   .1328800  -.3772700   .1436200        103   .1506800         21 
   Placebo_2 | -.0474400   .2568500  -.4426400   .5641900        103   .1465400         21 
   Placebo_3 | -.1147700   .1851600  -.3988900   .3269400        103   .1465400         21 
------------------------------------------------------------------------------------------

library(haven)
library(tidyr)
library(dplyr)
library(DIDHAD)    # devtools::install_github("chaisemartinPackages/did_had/R")
library(nprobust)  # install.packages("nprobust")

load(url("https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.RData")); df <- as.data.frame(df)

# --- Construct lemp for post-treatment years ----------------------------------
df$lemp2001 <- df$lemp2000 + df$delta2001
df$lemp2002 <- df$lemp2000 + df$delta2002
df$lemp2004 <- df$lemp2000 + df$delta2004
df$lemp2005 <- df$lemp2000 + df$delta2005

# --- Reshape to panel (industry × year) ---------------------------------------
lemp_cols <- grep("^lemp[0-9]+$", names(df), value = TRUE)

df_long <- df %>%
  select(indusid, ntrgap, all_of(lemp_cols)) %>%
  pivot_longer(
    cols = all_of(lemp_cols),
    names_to = "year", names_prefix = "lemp",
    values_to = "lemp"
  ) %>%
  mutate(
    year = as.integer(year),
    ntrgap = ifelse(year <= 2000, 0, ntrgap)
  ) %>%
  as.data.frame()

# --- Run did_had with triangular kernel ---------------------------------------
options(digits = 7)
result <- did_had(
  df         = df_long,
  outcome    = "lemp",
  group      = "indusid",
  time       = "year",
  treatment  = "ntrgap",
  effects    = 4,
  placebo    = 3,
  kernel     = "tri",
  graph_off  = TRUE
)
summary(result)

Effect Estimates:

-----------------------------------------------------------------------------
                           Effect Estimates                       QUG* Test
         ---------------------------------------------------- ---------------
         Estimate SE       LB.CI    UB.CI    N   BW      N.BW T       p.val  
Effect_1  0.0072700  0.0787500 -0.2387000  0.0700000 103 0.1465400 21   6.1498900 0.1398600
Effect_2 -0.1336100  0.1250000 -0.5803500 -0.0903400 103 0.1465400 21   6.1498900 0.1398600
Effect_3  0.0350700  0.5934600 -1.6035300  0.7227800 103 0.1465400 21   6.1498900 0.1398600
Effect_4 -0.0217300  0.6140600 -1.8985800  0.5085100 103 0.1465400 21   6.1498900 0.1398600
*Quasi-Untreated Group

Placebo Estimates:

------------------------------------------------------------
                          Placebo Estimates
          --------------------------------------------------
          Estimate    SE        LB.CI      UB.CI     N   BW        N.BW
Placebo_1 -0.0266300  0.1328800 -0.3772700  0.1436200 103 0.1506800 21  
Placebo_2 -0.0474400  0.2568500 -0.4426400  0.5641900 103 0.1465400 21  
Placebo_3 -0.1147700  0.1851600 -0.3988900  0.3269400 103 0.1465400 21

# pip install did-had
import pandas as pd
from did_had import DidHad

df = pd.read_parquet("https://raw.githubusercontent.com/Credible-Answers/did_book/main/cc_xd_didtextbook_2025_9_30/Data%20sets/Pierce%20and%20Schott%202016/pierce_schott_didtextbook.parquet")

# Build panel from wide to long
rows = []
for _, r in df.iterrows():
    base = r['lemp2000']
    ind = r['indusid']
    ntr = r['ntrgap']
    for yr in [1997, 1998, 1999]:
        rows.append({'indusid': ind, 'year': yr, 'lemp': base + r[f'delta{yr}'], 'ntrgap': 0.0})
    rows.append({'indusid': ind, 'year': 2000, 'lemp': base, 'ntrgap': 0.0})
    for yr in [2001, 2002, 2004, 2005]:
        rows.append({'indusid': ind, 'year': yr, 'lemp': base + r[f'delta{yr}'], 'ntrgap': ntr})

panel = pd.DataFrame(rows)
model = DidHad(kernel='tri')
results = model.fit(
    df=panel, outcome='lemp', group='indusid',
    time='year', treatment='ntrgap', effects=4, placebo=3)
print(results)

===========================================================================
DID-HAD Estimation Results
===========================================================================
Number of groups: 103
Number of periods: 8
Adoption period (F): 2001
Kernel: tri
Confidence level: 95%

---------------------------------------------------------------------------
                          Effect Estimates                      QUG* Test
         --------------------------------------------------- ---------------
          Estimate       SE     LB.CI     UB.CI     N      BW    N.BW        T    p.val
Effect_1   0.00727  0.07875  -0.23870   0.07000   103 0.14654     20  6.14989  0.13986
Effect_2  -0.13361  0.12500  -0.58035  -0.09034   103 0.14654     20  6.14989  0.13986
Effect_4   0.03507  0.59346  -1.60353   0.72278   103 0.14654     20  6.14989  0.13986
*Quasi-Untreated Group

--------------------------------------------------------------
                           Placebo Estimates
          ----------------------------------------------------
           Estimate       SE     LB.CI     UB.CI     N      BW    N.BW
Placebo_1 -0.02663  0.13288  -0.37727   0.14362   103 0.15068     21
Placebo_2 -0.04744  0.25685  -0.44264   0.56419   103 0.14654     20
===========================================================================

Interpretation: The quasi-stayer test is not rejected (\(T = 6.15\), \(p = 0.140\)): 21 industries with an NTR gap below 14.7 percentage points serve as quasi-controls. The R output shows the optimal bandwidth \(h^*_G = 0.147\), matching the PDF textbook. The estimated event-study effect at 2002 (Effect_2) is negative and significant in both platforms, while the remaining effects and all three pre-trends are insignificant. Confidence intervals are wide: with only 103 industries, the non-parametric estimator \(\widehat{WATT}^{qs}_{h^*_G}\) is imprecise. Pre-trends are not significant, but they are also imprecise — by extrapolating a linear trend from the upper bound of the 1999 pre-trend CI to the origin, differential trends could account for the estimated effects.

5.1 Overview

5.2 PanelView

5.3 CQ#36: TWFE Regressions

5.4 CQ#37: Decompose TWFE Weights

5.5 CQ#38: Test Random Assignment of Treatment Dose

5.6 Stute Tests of Linearity (extra)

5.7 Pre-Trend Tests — Without Linear Trends (extra)

5.8 Pre-Trend Tests — With Linear Trends (extra)

5.9 Stute Tests of Mean Independence (extra)

5.10 TWFE With Industry-Specific Linear Trends (extra)

5.11 Stute Tests of Linearity — With Linear Trends (extra)

5.12 Quasi-Stayer Test

5.13 Computing Commands: contdid and did_multiplegt_stat Syntax (extra)

5.14 CQ#39: did_had Event-Study Estimates

5.13 Computing Commands: `contdid` and `did_multiplegt_stat` Syntax (extra)

5.14 CQ#39: `did_had` Event-Study Estimates