4  Python Package Analysis

5 Python Packages for Staggered DiD

This chapter benchmarks Python packages for difference-in-differences with staggered treatment adoption. Python has fewer native implementations than R, but several community packages exist.

5.1 Package Availability Summary

Method                              R Package          Python Package       Status
Callaway & Sant’Anna                did                csdid, diff_diff     Available
de Chaisemartin & D’Haultfoeuille   DIDmultiplegtDYN   did_multiplegt_dyn   Available
Sun & Abraham                       fixest::sunab      pyfixest             Available (partial)
Borusyak, Jaravel & Spiess          didimputation      None                 Not available in Python

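
Before running anything, it can help to confirm which of these packages are importable in the current environment. A minimal check (note that the PyPI name and the import name can differ, e.g. diff-diff installs as diff_diff):

```python
import importlib.util

# Import names (not PyPI names) of the candidate packages
candidates = ["csdid", "diff_diff", "pyfixest", "did_multiplegt_dyn"]

for name in candidates:
    status = "available" if importlib.util.find_spec(name) else "missing"
    print(f"{name:<20} {status}")
```
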
import numpy as np
import pandas as pd
import time
import warnings
warnings.filterwarnings('ignore')

# Load the data
df = pd.read_csv("sim_data.csv")

print("Data loaded:")
print(f"Observations: {len(df):,}")
print(f"Units: {df['id'].nunique():,}")
print(f"Periods: {df['year'].nunique()}")

# Calculate true ATT (from treated observations' true effects)
# We need to regenerate true effects since CSV doesn't have them
treat_cohorts = [0, 2012, 2014, 2016, 2018]
cohort_effects = {0: 0, 2012: 0.5, 2014: 0.3, 2016: 0.1, 2018: 0.0}
tau_0 = 1.0
gamma = 0.1

df['tau_gt'] = 0.0
mask = df['treated'].astype(bool)
df.loc[mask, 'tau_gt'] = (
    tau_0 +
    df.loc[mask, 'first_treat'].map(cohort_effects) +
    gamma * (df.loc[mask, 'year'] - df.loc[mask, 'first_treat'])
)

true_overall_att = df.loc[df['treated'], 'tau_gt'].mean()
print(f"\nTrue overall ATT: {true_overall_att:.4f}")

# Store results
results = {}
Data loaded:
Observations: 100,000
Units: 10,000
Periods: 10

True overall ATT: 1.5861
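
As a quick sanity check on the effect construction above: a unit first treated in 2012 and observed in 2015 should carry an effect of 1.0 + 0.5 + 0.1 × 3 = 1.8:

```python
tau_0, gamma = 1.0, 0.1
cohort_effects = {0: 0, 2012: 0.5, 2014: 0.3, 2016: 0.1, 2018: 0.0}

def true_effect(first_treat, year):
    """True effect tau_gt = tau_0 + cohort effect + gamma * (event time)."""
    return tau_0 + cohort_effects[first_treat] + gamma * (year - first_treat)

print(round(true_effect(2012, 2015), 4))  # -> 1.8
```

The overall ATT of 1.5861 is just the average of these cell-level effects over all treated observations.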

5.2 1. Callaway & Sant’Anna (csdid)

The csdid package is a Python port of the original R did package.

try:
    from csdid.att_gt import ATTgt

    print("Running csdid ATTgt()...")
    print("This may take several minutes on very large panels.\n")

    start_time = time.perf_counter()

    att_gt = ATTgt(
        yname='y',
        gname='first_treat',
        idname='id',
        tname='year',
        data=df
    )

    out_csdid = att_gt.fit(est_method='dr')  # doubly robust

    csdid_time = time.perf_counter() - start_time

    print(f"\nExecution time: {csdid_time:.2f} seconds")

    # Aggregate to dynamic effects
    agg_dynamic = out_csdid.aggte(typec='dynamic', na_rm=True)

    # Extract overall ATT from the aggregation
    csdid_att = agg_dynamic.summ_attgt().atte['overall_att']

    print(f"\nEstimated ATT: {csdid_att:.4f}" if csdid_att is not None else "ATT not extracted")
    print(f"True ATT: {true_overall_att:.4f}")

    results['csdid'] = {
        'package': 'csdid',
        'method': 'Callaway & Sant\'Anna',
        'time': csdid_time,
        'att': csdid_att,
        'output': out_csdid
    }

except ImportError as e:
    print("Package csdid not installed.")
    print("Install with: pip install csdid")
    results['csdid'] = {
        'package': 'csdid',
        'method': 'Callaway & Sant\'Anna',
        'time': None,
        'error': str(e)
    }
except Exception as e:
    print(f"Error running csdid: {e}")
    results['csdid'] = {
        'package': 'csdid',
        'method': 'Callaway & Sant\'Anna',
        'time': None,
        'error': str(e)
    }
Running csdid ATTgt()...
This may take several minutes on very large panels.


Execution time: 0.61 seconds


Overall summary of ATT's based on event-study/dynamic aggregation:
   ATT Std. Error [95.0%  Conf. Int.]  
1.7041     0.0201 1.6647       1.7435 *


Dynamic Effects:
    Event time  Estimate  Std. Error  [95.0% Simult.   Conf. Band   
0           -7    0.0760      0.0493          -0.0207      0.1727   
1           -6    0.0036      0.0497          -0.0939      0.1010   
2           -5    0.0034      0.0327          -0.0607      0.0674   
3           -4   -0.0378      0.0318          -0.1002      0.0245   
4           -3    0.0141      0.0250          -0.0350      0.0631   
5           -2   -0.0178      0.0240          -0.0648      0.0292   
6           -1    0.0108      0.0218          -0.0318      0.0534   
7            0    1.2330      0.0225           1.1889      1.2771  *
8            1    1.3592      0.0175           1.3250      1.3934  *
9            2    1.5132      0.0258           1.4627      1.5637  *
10           3    1.5721      0.0212           1.5305      1.6137  *
11           4    1.7740      0.0300           1.7152      1.8329  *
12           5    1.8917      0.0295           1.8339      1.9496  *
13           6    2.1220      0.0402           2.0432      2.2008  *
14           7    2.1674      0.0397           2.0896      2.2453  *
---
Signif. codes: `*' confidence band does not cover 0
Control Group:  Never Treated , 
Anticipation Periods:  0
Estimation Method:  Doubly Robust



Estimated ATT: 1.7041
True ATT: 1.5861
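
A detail worth noting when comparing this estimate to the truth: 1.7041 is (up to rounding) the unweighted mean of the eight post-treatment event-time estimates above. Dynamic aggregation weights event times equally, and because effects here grow with event time (gamma > 0) while later event times have fewer observations, it lands above the observation-weighted true ATT of 1.5861:

```python
import numpy as np

# Event time >= 0 estimates from the csdid output above
post_estimates = [1.2330, 1.3592, 1.5132, 1.5721,
                  1.7740, 1.8917, 2.1220, 2.1674]

print(round(float(np.mean(post_estimates)), 4))  # -> 1.7041
```
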

5.3 2. Alternative CS Implementation (diff_diff)

The diff_diff package provides another implementation of Callaway & Sant’Anna.

try:
    from diff_diff import CallawaySantAnna

    print("Running diff_diff CallawaySantAnna()...")
    print()

    cs = CallawaySantAnna()

    start_time = time.perf_counter()

    cs_results = cs.fit(
        df,
        outcome='y',
        unit='id',
        time='year',
        first_treat='first_treat',
        aggregate='event_study'
    )

    diff_diff_time = time.perf_counter() - start_time

    print(f"Execution time: {diff_diff_time:.2f} seconds")

    # Extract ATT from event study
    if hasattr(cs_results, 'event_study_effects'):
        post_effects = [v['effect'] for k, v in cs_results.event_study_effects.items()
                       if k >= 0]
        diff_diff_att = np.mean(post_effects) if post_effects else None
    else:
        diff_diff_att = None

    print(f"\nEstimated ATT (post-treatment avg): {diff_diff_att:.4f}" if diff_diff_att is not None else "ATT not extracted")
    print(f"True ATT: {true_overall_att:.4f}")

    results['diff_diff'] = {
        'package': 'diff_diff',
        'method': 'Callaway & Sant\'Anna',
        'time': diff_diff_time,
        'att': diff_diff_att,
        'output': cs_results
    }

except ImportError as e:
    print("Package diff_diff not installed.")
    print("Install with: pip install diff-diff")
    results['diff_diff'] = {
        'package': 'diff_diff',
        'method': 'Callaway & Sant\'Anna',
        'time': None,
        'error': str(e)
    }
except Exception as e:
    print(f"Error running diff_diff: {e}")
    results['diff_diff'] = {
        'package': 'diff_diff',
        'method': 'Callaway & Sant\'Anna',
        'time': None,
        'error': str(e)
    }
Running diff_diff CallawaySantAnna()...

Execution time: 0.05 seconds

Estimated ATT (post-treatment avg): 1.7041
True ATT: 1.5861

5.4 3. Sun & Abraham via pyfixest

The pyfixest package is a Python port of the R fixest package. Support for the Sun & Abraham estimator varies by pyfixest version, so the code below falls back to plain TWFE when a sunab()-style formula term is unavailable.
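
Whether or not a given pyfixest version exposes a sunab()-style formula term, the raw ingredients of the Sun & Abraham interaction-weighted estimator are cohort × relative-event-time indicators. A sketch of building them with pandas on a toy panel (column names mirror the simulated data):

```python
import numpy as np
import pandas as pd

# Toy panel: cohorts first treated in 2012 and 2014, plus never-treated (0)
toy = pd.DataFrame({
    "id":          [1, 1, 2, 2, 3, 3],
    "year":        [2012, 2013, 2012, 2013, 2012, 2013],
    "first_treat": [2012, 2012, 2014, 2014, 0, 0],
})

# Relative event time; NaN for never-treated units so they drop out
toy["rel_time"] = np.where(toy["first_treat"] > 0,
                           toy["year"] - toy["first_treat"], np.nan)

# Cohort x relative-time labels, then saturated dummies; Sun & Abraham
# aggregate the coefficients on these with cohort-share weights
m = toy["first_treat"] > 0
toy.loc[m, "cohort_rel"] = (toy.loc[m, "first_treat"].astype(int).astype(str)
                            + ":" + toy.loc[m, "rel_time"].astype(int).astype(str))
interactions = pd.get_dummies(toy["cohort_rel"], prefix="g_rel")
print(interactions.shape)  # -> (6, 4)
```
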

try:
    import pyfixest as pf

    print("Running pyfixest for Sun & Abraham...")
    print()

    # Prepare data for pyfixest
    df_pf = df.copy()
    df_pf['cohort'] = df_pf['first_treat'].replace(0, np.inf)  # Never-treated = Inf

    start_time = time.perf_counter()

    # Sun & Abraham estimation: try an R-fixest-style sunab() formula term
    # (availability varies by pyfixest version)
    try:
        out_pf = pf.feols(
            "y ~ sunab(cohort, year) | id + year",
            data=df_pf,
            vcov={'CRV1': 'id'}
        )
        pf_time = time.perf_counter() - start_time

        print(f"Execution time: {pf_time:.2f} seconds")
        out_pf.summary()  # prints the table itself; returns None, so don't wrap in print()

        # Extract ATT
        pf_att = out_pf.coef().mean()  # crude: averages all event-study coefficients, including pre-treatment

        results['pyfixest'] = {
            'package': 'pyfixest',
            'method': 'Sun & Abraham',
            'time': pf_time,
            'att': pf_att,
            'output': out_pf
        }

    except Exception as e:
        # Fallback to simple TWFE
        print(f"sunab not available in pyfixest, running TWFE: {e}")

        out_pf = pf.feols(
            "y ~ treated | id + year",
            data=df_pf,
            vcov={'CRV1': 'id'}
        )
        pf_time = time.perf_counter() - start_time

        print(f"\nExecution time (TWFE): {pf_time:.2f} seconds")
        out_pf.summary()  # prints the table itself; returns None, so don't wrap in print()

        results['pyfixest'] = {
            'package': 'pyfixest',
            'method': 'TWFE (sunab not available)',
            'time': pf_time,
            'att': out_pf.coef()['treated'],
            'output': out_pf
        }

except ImportError as e:
    print("Package pyfixest not installed.")
    print("Install with: pip install pyfixest")
    results['pyfixest'] = {
        'package': 'pyfixest',
        'method': 'Sun & Abraham',
        'time': None,
        'error': str(e)
    }
except Exception as e:
    print(f"Error running pyfixest: {e}")
    results['pyfixest'] = {
        'package': 'pyfixest',
        'method': 'Sun & Abraham',
        'time': None,
        'error': str(e)
    }
Running pyfixest for Sun & Abraham...

sunab not available in pyfixest, running TWFE: Unable to evaluate factor `sunab(cohort, year)`. [NameError: name 'sunab' is not defined]

Execution time (TWFE): 5.17 seconds
###

Estimation:  OLS
Dep. var.: y, Fixed effects: id+year
Inference:  CRV1
Observations:  100000

| Coefficient   |   Estimate |   Std. Error |   t value |   Pr(>|t|) |   2.5% |   97.5% |
|:--------------|-----------:|-------------:|----------:|-----------:|-------:|--------:|
| treated       |      1.324 |        0.012 |   114.788 |      0.000 |  1.302 |   1.347 |
---
RMSE: 0.958 R2: 0.668 R2 Within: 0.13 

5.5 4. de Chaisemartin & D’Haultfoeuille (did_multiplegt_dyn)

The did_multiplegt_dyn package is a Python port of the original Stata/R command by de Chaisemartin & D’Haultfoeuille.

Method: Compares switchers to non-switchers at each period, robust to heterogeneous treatment effects.
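
The switcher logic can be hand-rolled on a toy two-period panel to fix ideas (this is an illustration of the comparison, not the package's actual estimator, which also handles multiple periods, weights, and inference):

```python
import pandas as pd

# Unit A switches into treatment in period 2; B and C never switch
toy = pd.DataFrame({
    "id": ["A", "A", "B", "B", "C", "C"],
    "t":  [1, 2, 1, 2, 1, 2],
    "d":  [0, 1, 0, 0, 0, 0],
    "y":  [1.0, 3.0, 1.0, 1.5, 2.0, 2.5],
})

y_wide = toy.pivot(index="id", columns="t", values="y")
d_wide = toy.pivot(index="id", columns="t", values="d")

delta = y_wide[2] - y_wide[1]          # outcome change per unit
is_switcher = d_wide[2] != d_wide[1]   # treatment changed between periods

# DiD at period 2: switchers' mean change minus non-switchers' mean change
effect = delta[is_switcher].mean() - delta[~is_switcher].mean()
print(effect)  # -> 1.5
```
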

import polars as pl
from did_multiplegt_dyn import DidMultiplegtDyn

print("Running DidMultiplegtDyn()...")
print("Note: This estimator can be computationally intensive.\n")

# Convert pandas DataFrame to polars and prepare data
df_dcdh = pl.from_pandas(df)
df_dcdh = df_dcdh.with_columns([
    pl.col('treated').cast(pl.Int32).alias('D')
])
start_time = time.perf_counter()

try:
    # Create model instance
    model_dcdh = DidMultiplegtDyn(
        df=df_dcdh,
        outcome='y',
        group='id',
        time='year',
        treatment='D',
        effects=5,
        placebo=3,
        cluster='id'
    )

    # Fit the model
    model_dcdh.fit()

    dcdh_time = time.perf_counter() - start_time

    # Get summary table
    dcdh_summary = model_dcdh.summary()

    print(f"\nExecution time: {dcdh_time:.2f} seconds")
    print()
    print(dcdh_summary)

    # Extract Average_Total_Effect from summary table
    dcdh_att = model_dcdh.result['did_multiplegt_dyn']['ATE']['Estimate'].values[0]

    results['did_multiplegt_dyn'] = {
        'package': 'did_multiplegt_dyn',
        'method': 'de Chaisemartin & D\'Haultfoeuille',
        'time': dcdh_time,
        'att': dcdh_att,
        'output': dcdh_summary
    }

except Exception as e:
    print(f"Error during estimation: {e}")
    import traceback
    traceback.print_exc()
    results['did_multiplegt_dyn'] = {
        'package': 'did_multiplegt_dyn',
        'method': 'de Chaisemartin & D\'Haultfoeuille',
        'time': None,
        'att': None,
        'error': str(e)
    }
Running DidMultiplegtDyn()...
Note: This estimator can be computationally intensive.

               Block  Estimate       SE     LB CI    UB CI       N  Switchers     N.w  Switchers.w
            Effect_1  1.242962 0.019474  1.204793 1.281131 28044.0     7009.0 28044.0       7009.0
            Effect_2  1.361619 0.019661  1.323084 1.400153 28044.0     7009.0 28044.0       7009.0
            Effect_3  1.524482 0.022447  1.480487 1.568477 19045.0     5992.0 19045.0       5992.0
            Effect_4  1.574469 0.022517  1.530335 1.618602 19045.0     5992.0 19045.0       5992.0
            Effect_5  1.781573 0.028124  1.726450 1.836696 10945.0     3946.0 10945.0       3946.0
Average_Total_Effect  1.464355 0.016578  1.431863 1.496847 72018.0    29948.0 72018.0      29948.0
           Placebo_1 -0.003321 0.019518 -0.041577 0.034934 28044.0     7009.0 28044.0       7009.0
           Placebo_2 -0.011834 0.023465 -0.057825 0.034156 18044.0     4991.0 18044.0       4991.0
           Placebo_3 -0.018693 0.028306 -0.074171 0.036786 10973.0     3974.0 10973.0       3974.0

Execution time: 0.65 seconds


5.6 5. Borusyak, Jaravel & Spiess (Imputation)

Not Available in Python

There is no native Python package implementing the imputation estimator from Borusyak, Jaravel & Spiess. This method is only available in:

  • R: didimputation package
  • Stata: did_imputation command
print("=" * 60)
print("Borusyak, Jaravel & Spiess (did_imputation)")
print("=" * 60)
print()
print("STATUS: NOT AVAILABLE IN PYTHON")
print()
print("This estimator is only available in:")
print("  - R: didimputation package")
print("  - Stata: did_imputation command")
print()

results['didimputation'] = {
    'package': 'N/A',
    'method': 'Borusyak, Jaravel & Spiess',
    'time': None,
    'att': None,
    'error': 'No Python implementation available'
}
============================================================
Borusyak, Jaravel & Spiess (did_imputation)
============================================================

STATUS: NOT AVAILABLE IN PYTHON

This estimator is only available in:
  - R: didimputation package
  - Stata: did_imputation command

5.7 6. Traditional TWFE (Biased Baseline)

For comparison, let’s run traditional TWFE using linearmodels or statsmodels:

try:
    from linearmodels.panel import PanelOLS

    print("Running traditional TWFE with linearmodels...")
    print()

    # Prepare panel data structure
    df_panel = df.set_index(['id', 'year'])

    start_time = time.perf_counter()

    model = PanelOLS(
        df_panel['y'],
        df_panel[['treated']].astype(float),
        entity_effects=True,
        time_effects=True
    )
    out_twfe = model.fit(cov_type='clustered', cluster_entity=True)

    twfe_time = time.perf_counter() - start_time

    print(f"Execution time: {twfe_time:.2f} seconds")
    print(out_twfe.summary.tables[1])

    twfe_att = out_twfe.params['treated']
    print(f"\nTWFE ATT estimate: {twfe_att:.4f}")
    print(f"True ATT: {true_overall_att:.4f}")
    print(f"Bias: {twfe_att - true_overall_att:.4f}")

    results['twfe_linearmodels'] = {
        'package': 'linearmodels',
        'method': 'Traditional TWFE (biased)',
        'time': twfe_time,
        'att': twfe_att,
        'output': out_twfe
    }

except ImportError:
    print("Package linearmodels not installed.")
    print("Install with: pip install linearmodels")

    # Fallback to statsmodels
    try:
        import statsmodels.api as sm
        from statsmodels.regression.linear_model import OLS

        print("\nRunning TWFE with statsmodels (demeaned)...")

        # Demean for two-way fixed effects (sequential demeaning is exact for a balanced panel)
        df_fe = df.copy()
        df_fe['y_demeaned'] = df_fe.groupby('id')['y'].transform(lambda x: x - x.mean())
        df_fe['y_demeaned'] = df_fe.groupby('year')['y_demeaned'].transform(lambda x: x - x.mean())
        df_fe['treated_demeaned'] = df_fe.groupby('id')['treated'].transform(lambda x: x - x.mean())
        df_fe['treated_demeaned'] = df_fe.groupby('year')['treated_demeaned'].transform(lambda x: x - x.mean())

        start_time = time.perf_counter()
        model = OLS(df_fe['y_demeaned'], df_fe['treated_demeaned']).fit()
        twfe_time = time.perf_counter() - start_time

        print(f"Execution time: {twfe_time:.2f} seconds")
        print(f"TWFE ATT: {model.params.iloc[0]:.4f}")

        results['twfe_statsmodels'] = {
            'package': 'statsmodels',
            'method': 'Traditional TWFE (biased)',
            'time': twfe_time,
            'att': model.params.iloc[0]
        }

    except Exception as e:
        print(f"Error: {e}")

except Exception as e:
    print(f"Error running TWFE: {e}")
Running traditional TWFE with linearmodels...

Execution time: 0.09 seconds
                             Parameter Estimates                              
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
treated        1.3242     0.0122     108.90     0.0000      1.3003      1.3480
==============================================================================

TWFE ATT estimate: 1.3242
True ATT: 1.5861
Bias: -0.2620
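
The negative bias above is exactly the staggered-adoption problem: with dynamic effects, TWFE uses already-treated units as controls and can badly understate the ATT. A constructed, noiseless example (not the simulated dataset) where the effect grows by 1 per period after treatment:

```python
import numpy as np
import pandas as pd

# Early cohort E treated from t=2, late cohort L from t=4; the effect at
# event times 0, 1, 2 is 1, 2, 3; no unit/time effects, no noise
toy = pd.DataFrame({
    "id": ["E"] * 4 + ["L"] * 4,
    "t":  [1, 2, 3, 4] * 2,
    "d":  [0, 1, 1, 1,  0, 0, 0, 1],
    "y":  [0, 1, 2, 3,  0, 0, 0, 1],
})

def demean_twoway(s, df):
    """Within transform for a balanced panel: x - x_bar_i - x_bar_t + x_bar."""
    return (s - s.groupby(df["id"]).transform("mean")
              - s.groupby(df["t"]).transform("mean") + s.mean())

y_dm = demean_twoway(toy["y"].astype(float), toy)
d_dm = demean_twoway(toy["d"].astype(float), toy)
beta = (d_dm @ y_dm) / (d_dm @ d_dm)

true_att = toy.loc[toy["d"] == 1, "y"].mean()  # baseline is 0, so y = effect
print(f"TWFE: {beta:.2f}, true ATT: {true_att:.2f}")  # -> TWFE: 0.50, true ATT: 1.75
```

The late cohort's switch at t=4 is measured against the early cohort, whose own effect is still growing, which drags the pooled coefficient far below the true average effect.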

5.8 Python Results Summary

import pandas as pd

# Create summary table
summary_data = []
for name, r in results.items():
    summary_data.append({
        'Package': r.get('package', 'N/A'),
        'Method': r.get('method', 'N/A'),
        'Time (s)': r.get('time'),
        'ATT': r.get('att'),
        'True ATT': true_overall_att if r.get('time') is not None else None,
        'Bias': (r.get('att') - true_overall_att) if r.get('att') is not None else None,
        'Status': 'Error: ' + r.get('error', '') if r.get('error') else 'OK'
    })

summary_df = pd.DataFrame(summary_data)
print("\n" + "=" * 80)
print("PYTHON PACKAGE COMPARISON SUMMARY")
print("=" * 80)
print(summary_df.to_string(index=False))

# Save for comparison chapter
summary_df.to_csv("python_results.csv", index=False)

================================================================================
PYTHON PACKAGE COMPARISON SUMMARY
================================================================================
           Package                            Method  Time (s)      ATT  True ATT      Bias                                    Status
             csdid              Callaway & Sant'Anna  0.607066 1.704085  1.586146  0.117939                                        OK
         diff_diff              Callaway & Sant'Anna  0.049023 1.704085  1.586146  0.117939                                        OK
          pyfixest        TWFE (sunab not available)  5.165600 1.324159  1.586146 -0.261987                                        OK
did_multiplegt_dyn de Chaisemartin & D'Haultfoeuille  0.645394 1.464355  1.586146 -0.121791                                        OK
               N/A        Borusyak, Jaravel & Spiess       NaN      NaN       NaN       NaN Error: No Python implementation available
      linearmodels         Traditional TWFE (biased)  0.087213 1.324159  1.586146 -0.261987                                        OK
import matplotlib.pyplot as plt

# Filter to packages with valid times
timing_data = [(r['package'], r['time']) for r in results.values()
               if r.get('time') is not None]

if timing_data:
    packages, times = zip(*timing_data)

    fig, ax = plt.subplots(figsize=(10, 6))
    bars = ax.barh(packages, times, color='steelblue')
    ax.set_xlabel('Execution Time (seconds)')
    ax.set_title(f'Python Package Execution Times\nDataset: {len(df):,} observations')

    # Add time labels
    for bar, t in zip(bars, times):
        ax.text(bar.get_width() + max(times) * 0.01, bar.get_y() + bar.get_height()/2,
                f'{t:.1f}s', va='center')

    plt.tight_layout()
    plt.show()
else:
    print("No timing data available to plot.")

Execution time comparison (Python packages)

5.9 Python Package Availability Notes

5.9.1 Available Packages

  1. csdid: Full implementation of Callaway & Sant’Anna
    • Install: pip install csdid
    • Supports doubly-robust estimation
    • Can be slow with very large datasets
  2. diff_diff: Alternative CS implementation
    • Install: pip install diff-diff
    • Generally faster than csdid
    • Good for event study aggregation
  3. pyfixest: Port of R’s fixest
    • Install: pip install pyfixest
    • Fast fixed effects estimation
    • Sun & Abraham support varies by version
  4. did_multiplegt_dyn: de Chaisemartin & D’Haultfoeuille
    • Install: pip install did-multiplegt-dyn
    • Python port of the R/Stata command
    • Computationally intensive for large datasets

5.9.2 Not Available in Python

Missing Python Implementation

The following estimator does not have a native Python package:

Borusyak, Jaravel & Spiess (didimputation)

  • Only available in R and Stata
  • No community Python port exists

For this method, users must either:

  • Use R (recommended; e.g. via an R code chunk in Quarto, or via rpy2)
  • Use Stata
  • Implement the method from scratch
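
For the last option, the core of the imputation idea is short: fit unit and time effects on untreated observations only, impute Y(0) for treated observations, and average y − Ŷ(0). A from-scratch sketch on a noiseless toy panel (illustrative only; the validated R/Stata implementations add inference, weighting, and diagnostics this skips):

```python
import numpy as np
import pandas as pd

# Noiseless toy panel: y = unit effect + time effect + 2.0 * d
toy = pd.DataFrame({"id": np.repeat([1, 2, 3], 4), "t": np.tile([1, 2, 3, 4], 3)})
unit_fe = toy["id"].map({1: 0.0, 2: 1.0, 3: 2.0})
time_fe = toy["t"].map({1: 0.0, 2: 0.5, 3: 1.0, 4: 1.5})
toy["d"] = ((toy["id"] == 1) & (toy["t"] >= 3)).astype(int)  # unit 1 treated from t=3
toy["y"] = unit_fe + time_fe + 2.0 * toy["d"]

# Step 1: fit unit and time dummies on untreated observations only
untreated = toy[toy["d"] == 0]
X0 = pd.get_dummies(untreated[["id", "t"]].astype(str))
coef, *_ = np.linalg.lstsq(X0.astype(float).to_numpy(),
                           untreated["y"].to_numpy(), rcond=None)

# Step 2: impute the untreated outcome Y(0) for every observation
X_all = pd.get_dummies(toy[["id", "t"]].astype(str)).reindex(
    columns=X0.columns, fill_value=0)
y0_hat = X_all.astype(float).to_numpy() @ coef

# Step 3: ATT is the mean of y - Y(0)-hat over treated observations
att = (toy["y"] - y0_hat)[toy["d"] == 1].mean()
print(round(att, 6))  # -> 2.0
```
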