diff --git a/diff_diff/guides/llms-full.txt b/diff_diff/guides/llms-full.txt
index d5863bc0..281c2ef3 100644
--- a/diff_diff/guides/llms-full.txt
+++ b/diff_diff/guides/llms-full.txt
@@ -2075,7 +2075,7 @@ clear_cache()
 
 ## Survey Support
 
-All estimators accept an optional `survey_design` parameter in `fit()`. Pass a `SurveyDesign` object to get design-based variance estimation.
+Most estimators accept an optional `survey_design` parameter in `fit()` (`SyntheticControl` rejects it as not yet supported); depth of support varies by estimator - see the compatibility matrix in `docs/choosing_estimator.rst` (Survey Design Support). Pass a `SurveyDesign` object to get design-based variance estimation.
 
 ```python
 from diff_diff import SurveyDesign, CallawaySantAnna
@@ -2120,7 +2120,7 @@ sd_female, data_female = sd.subpopulation(data, mask=lambda df: df['sex'] == 'F'
 
 **Key features:**
 - Taylor Series Linearization (TSL) variance with strata + PSU + FPC
-- Replicate weight variance: BRR, Fay's BRR, JK1, JKn, SDR (13 of 16 estimators, including dCDH)
+- Replicate weight variance: BRR, Fay's BRR, JK1, JKn, SDR (13 of 20 estimators, including dCDH)
 - Survey-aware bootstrap: multiplier at PSU (Hall-Mammen wild; dCDH, staggered) or Rao-Wu rescaled (SunAbraham, SyntheticDiD, TROP). SyntheticDiD bootstrap composes Rao-Wu rescaled per-draw weights with the weighted Frank-Wolfe variant of `_sc_weight_fw` (PR #355): each draw solves `min ||A·diag(rw)·ω - b||² + ζ²·Σ rw_i ω_i²` and composes `ω_eff = rw·ω/Σ(rw·ω)` for the SDID estimator. Pweight-only fits use constant `rw = w_control`; full designs use Rao-Wu. SDID's placebo (stratified permutation + weighted FW) and jackknife (PSU-level LOO with stratum aggregation, Rust & Rao 1996) paths also support pweight-only and full strata/PSU/FPC designs
 - DEFF diagnostics, subpopulation analysis, weight trimming (`trim_weights`)
 - Repeated cross-sections: `CallawaySantAnna(panel=False)`
diff --git a/diff_diff/guides/llms.txt b/diff_diff/guides/llms.txt
index f61f5f3d..5268ce18 100644
--- a/diff_diff/guides/llms.txt
+++ b/diff_diff/guides/llms.txt
@@ -2,7 +2,7 @@
 
 > A Python library for Difference-in-Differences (DiD) causal inference analysis. Provides sklearn-like estimators with statsmodels-style summary output for econometric analysis.
 
-diff-diff offers 19 estimators covering basic 2x2 DiD, modern staggered adoption methods, reversible (non-absorbing) treatments, advanced panel estimators, nonlinear models, and diagnostic tools. It supports robust and cluster-robust standard errors, wild cluster bootstrap, formula and column-name interfaces, fixed effects (dummy and absorbed), complex survey designs (strata/PSU/FPC, replicate weights, design-based variance), and publication-ready output. The optional Rust backend accelerates compute-intensive estimators like Synthetic DiD and TROP.
+diff-diff offers 20 estimators covering basic 2x2 DiD, modern staggered adoption methods, reversible (non-absorbing) treatments, advanced panel estimators, nonlinear models, and diagnostic tools. It supports robust and cluster-robust standard errors, wild cluster bootstrap, formula and column-name interfaces, fixed effects (dummy and absorbed), complex survey designs (strata/PSU/FPC, replicate weights, design-based variance), and publication-ready output. The optional Rust backend accelerates compute-intensive estimators like Synthetic DiD and TROP.
 
 - Install: `pip install diff-diff`
 - License: MIT
@@ -104,7 +104,7 @@ Full practitioner guide: call `diff_diff.get_llm_guide("practitioner")`
 
 ## Survey Support
 
-All estimators accept an optional `survey_design` parameter. Pass a `SurveyDesign` object to get design-based variance estimation:
+Most estimators accept an optional `survey_design` parameter (`SyntheticControl` does not yet support it); coverage and weight types vary by estimator - see the [Survey Design Support matrix](https://diff-diff.readthedocs.io/en/stable/choosing_estimator.html#survey-design-support). Pass a `SurveyDesign` object to get design-based variance estimation:
 
 - **Design elements**: strata, PSU, FPC, weight types (pweight/fweight/aweight), lonely PSU handling, nest
 - **Variance methods**: Taylor Series Linearization (TSL), replicate weights (BRR/Fay/JK1/JKn/SDR), survey-aware bootstrap
diff --git a/docs/choosing_estimator.rst b/docs/choosing_estimator.rst
index c3edcc31..b9abcd04 100644
--- a/docs/choosing_estimator.rst
+++ b/docs/choosing_estimator.rst
@@ -771,9 +771,10 @@ If you're unsure which estimator to use:
 Survey Design Support
 ---------------------
 
-All estimators accept an optional ``survey_design`` parameter in ``fit()``.
+Most estimators support an optional ``survey_design`` parameter in ``fit()``
+(``SyntheticControl`` accepts the parameter but raises ``NotImplementedError``).
 Pass a :class:`~diff_diff.SurveyDesign` object to get design-based variance
-estimation. The depth of support varies by estimator:
+estimation. The depth of support varies by estimator and variance method:
 
 .. note::
 
@@ -820,7 +821,7 @@ estimation. The depth of support varies by estimator:
    * - ``ChaisemartinDHaultfoeuille``
      - pweight only
      - Full (TSL)
-     - --
+     - Full (analytical)
      - Group-level (warning)
    * - ``TripleDifference``
      - pweight only
@@ -869,9 +870,14 @@ estimation. The depth of support varies by estimator:
      - Multiplier at PSU
    * - ``SyntheticDiD``
      - pweight only
-     - Via bootstrap
+     - Full (method-specific)
      - --
      - Hybrid pairs-bootstrap + Rao-Wu rescaled (bootstrap only)
+   * - ``SyntheticControl``
+     - --
+     - --
+     - --
+     - --
    * - ``TROP``
      - pweight only
      - Via bootstrap
@@ -887,6 +893,11 @@ estimation. The depth of support varies by estimator:
      - Full (Binder TSL)
      - --
      - --
+   * - ``SpilloverDiD``
+     - pweight only
+     - Full (Binder TSL + Conley)
+     - --
+     - --
    * - ``BaconDecomposition``
      - Diagnostic
      - Diagnostic
@@ -897,24 +908,26 @@ estimation. The depth of support varies by estimator:
 
 - **Full**: All weight types (pweight/fweight/aweight) + strata/PSU/FPC + Taylor Series Linearization variance
 - **Full (pweight only)**: Full TSL with strata/PSU/FPC, but only ``pweight`` accepted (``fweight``/``aweight`` rejected because composition changes weight semantics)
-- **Via bootstrap**: Strata/PSU/FPC supported only with bootstrap variance. ``TROP`` uses bootstrap by default. ``SyntheticDiD`` supports strata/PSU/FPC on ``variance_method='bootstrap'`` via a hybrid pairs-bootstrap + Rao-Wu rescaling composition (see the ``Note (survey + bootstrap composition)`` in REGISTRY.md §SyntheticDiD); ``placebo`` and ``jackknife`` remain pweight-only.
+- **Via bootstrap**: Strata/PSU/FPC supported only with bootstrap variance (``TROP``, which uses bootstrap by default)
+- **Full (method-specific)**: ``SyntheticDiD`` supports strata/PSU/FPC on all three variance methods via method-specific survey paths — see the note below and the ``Note (survey support matrix)`` in REGISTRY.md §SyntheticDiD
 - **pweight only** (Weights column): Only ``pweight`` accepted; ``fweight``/``aweight`` raise an error
 - **Diagnostic**: Weighted descriptive statistics only (no inference)
 - **--**: Not supported
 
 .. note::
 
-   ``SyntheticDiD`` supports survey designs on ``variance_method='bootstrap'``
-   — both pweight-only and full strata/PSU/FPC — via a hybrid pairs-bootstrap
-   composed with per-draw Rao-Wu rescaled weights fed into a weighted
-   Frank-Wolfe re-estimation of ω and λ. See the
-   ``Note (survey + bootstrap composition)`` in REGISTRY.md §SyntheticDiD
-   for the objective form and argmin-set caveat.
-
-   ``variance_method='placebo'`` and ``variance_method='jackknife'`` remain
-   pweight-only — composing placebo permutations / leave-one-out with
-   Rao-Wu rescaling under the weighted objective is a separate derivation
-   (tracked in ``TODO.md``).
+   ``SyntheticDiD`` supports survey designs — both pweight-only and full
+   strata/PSU/FPC — on all three variance methods, each via a
+   method-specific path: ``bootstrap`` composes a hybrid pairs-bootstrap
+   with per-draw Rao-Wu rescaled weights fed into a weighted Frank-Wolfe
+   re-estimation of ω and λ; ``placebo`` switches to stratified
+   permutation (pseudo-treated draws within strata containing treated
+   units) with weighted-FW re-estimation, and FPC is a documented no-op
+   for the permutation test; ``jackknife`` switches to PSU-level
+   leave-one-out with stratum aggregation (Rust & Rao 1996).
+   Replicate-weight designs are rejected. See the
+   ``Note (survey support matrix)`` and the per-method composition notes
+   in REGISTRY.md §SyntheticDiD.
 
 For the full walkthrough with code examples, see the
 `survey tutorial `_.
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index 33ab7fd3..07ad7e07 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -4576,7 +4576,7 @@ variance from the distribution of replicate estimates.
   design structure is fixed and dropped replicates contribute zero to the sum
   without changing the scale. Survey df uses `n_valid - 1` for t-based
   inference.
-- **Note:** Replicate-weight support matrix (12 of 15 public estimators):
+- **Note:** Replicate-weight support matrix (13 of 20 public estimators):
   - **Supported**: CallawaySantAnna (reg/ipw/dr with or without covariates,
     no bootstrap; IF-based replicate variance is covariate-agnostic),
     ContinuousDiD (no bootstrap), EfficientDiD (no bootstrap),
@@ -4587,9 +4587,17 @@ variance from the distribution of replicate estimates.
     TwoWayFixedEffects (estimator-level refit with within-transformation),
     SunAbraham (estimator-level refit, replaces `vcov_cohort`),
     StackedDiD (estimator-level refit with Q-weight composition),
-    ImputationDiD (two-stage refit), TwoStageDiD (two-stage refit)
+    ImputationDiD (two-stage refit), TwoStageDiD (two-stage refit),
+    ChaisemartinDHaultfoeuille (closed-form cell-collapse replicate ATT,
+    multi-horizon and placebo paths; replicate + `n_bootstrap > 0` rejected
+    — see the ChaisemartinDHaultfoeuille Notes for the allocator contract)
   - **Rejected with NotImplementedError**: SyntheticDiD, TROP
-    (bootstrap-based variance), BaconDecomposition (diagnostic only)
+    (bootstrap-based variance), WooldridgeDiD, LPDiD, SpilloverDiD,
+    HeterogeneousAdoptionDiD (TSL-only survey paths; replicate designs
+    rejected at `fit()`), SyntheticControl (rejects `survey_design`
+    entirely)
+  - **BaconDecomposition** is diagnostic-only — outside the 20-estimator
+    count — and likewise rejects replicate designs
   - Estimators with replicate support reject replicate + bootstrap
     (replicate weights provide analytical variance)
 - **Note:** When invalid replicates are dropped in `compute_replicate_vcov`
diff --git a/docs/practitioner_decision_tree.rst b/docs/practitioner_decision_tree.rst
index 1dd6e5b5..8aeed5cc 100644
--- a/docs/practitioner_decision_tree.rst
+++ b/docs/practitioner_decision_tree.rst
@@ -463,9 +463,10 @@ At a Glance
 What About the Other Estimators?
 --------------------------------
 
-diff-diff has 17 estimators covering advanced scenarios: Sun-Abraham for
+diff-diff has 20 estimators covering advanced scenarios: Sun-Abraham for
 interaction-weighted estimation, Imputation DiD and Two-Stage DiD for alternative
-staggered approaches, Stacked DiD, Efficient DiD, Triple Difference, TROP, and more.
+staggered approaches, Local Projections DiD, Stacked DiD, Efficient DiD,
+Triple Difference, TROP, and more.
 
 The six scenarios above cover the most common business use cases. For the full
 academic decision tree with all estimators, see :doc:`choosing_estimator`.
diff --git a/paper.bib b/paper.bib
index 6aa53f0d..0e8f3498 100644
--- a/paper.bib
+++ b/paper.bib
@@ -220,7 +220,7 @@ @misc{Gerber2026
 
 @article{Abadie2010,
   author = {Abadie, Alberto and Diamond, Alexis and Hainmueller, Jens},
-  title = {Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program},
+  title = {Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of {California's} Tobacco Control Program},
   journal = {Journal of the American Statistical Association},
   volume = {105},
   number = {490},
@@ -249,3 +249,32 @@ @misc{deChaisemartin2026
   primaryclass = {econ.EM},
   doi = {10.48550/arXiv.2405.04465}
 }
+
+@article{Dube2025,
+  author = {Dube, Arindrajit and Girardi, Daniele and Jord{\`a}, {\`O}scar and Taylor, Alan M.},
+  title = {A Local Projections Approach to Difference-in-Differences},
+  journal = {Journal of Applied Econometrics},
+  volume = {40},
+  number = {5},
+  pages = {741--758},
+  year = {2025},
+  doi = {10.1002/jae.70000}
+}
+
+@article{Binder1983,
+  author = {Binder, David A.},
+  title = {On the Variances of Asymptotically Normal Estimators from Complex Surveys},
+  journal = {International Statistical Review},
+  volume = {51},
+  number = {3},
+  pages = {279--292},
+  year = {1983},
+  doi = {10.2307/1402588}
+}
+
+@misc{pyfixest,
+  author = {{The PyFixest Authors}},
+  title = {{pyfixest}: Fast High-Dimensional Fixed Effect Estimation in {Python}},
+  year = {2025},
+  url = {https://github.com/py-econometrics/pyfixest}
+}
diff --git a/paper.md b/paper.md
index a20d75de..95ac021a 100644
--- a/paper.md
+++ b/paper.md
@@ -21,7 +21,7 @@ bibliography: paper.bib
 # Summary
 
 `diff-diff` is a Python library for Difference-in-Differences (DiD) causal inference
-analysis. It provides 19 estimators covering the full modern DiD toolkit - from classic
+analysis. It provides 20 estimators covering the full modern DiD toolkit - from classic
 two-group/two-period designs through heterogeneity-robust staggered adoption methods,
 synthetic control hybrids, and sensitivity analysis - under a consistent scikit-learn-style
 API. Most estimators accept an optional `SurveyDesign` object for design-based variance
@@ -41,15 +41,13 @@ modern methods - including Callaway and Sant'Anna [-@Callaway2021], Sun and Abra
 [-@Sun2021], Borusyak, Jaravel, and Spiess [-@Borusyak2024], and others - are now standard
 practice in applied work.
 
-The R ecosystem provides mature implementations across several packages: `did`
-[@Callaway2021], `fixest` [@Berge2018], `synthdid` [@Arkhangelsky2021], and `HonestDiD`
-[@Rambachan2023]. Stata offers `csdid` and `didregress`. Python, however, lacks a unified
-DiD library. Practitioners working in Python-based data science workflows - increasingly
-common in industry settings for marketing measurement, product experimentation, and policy
-evaluation - must either context-switch to R, reimplement methods from scratch, or rely on
-partial implementations scattered across unrelated packages.
+These methods are well served in R and Stata, but Python lacks a unified DiD library.
+Practitioners working in Python-based data science workflows - increasingly common in
+industry settings for marketing measurement, product experimentation, and policy
+evaluation - must either context-switch to another language, reimplement methods from
+scratch, or rely on partial implementations scattered across unrelated packages.
 
-`diff-diff` fills this gap by providing a single-import library that covers 19 estimators
+`diff-diff` fills this gap by providing a single-import library that covers 20 estimators
 with a consistent API, survey-weighted inference, and numerical validation against R. It is
 also the companion software for the design-based variance framework of @Gerber2026, which
 establishes design-consistent standard errors for modern DiD estimators under
@@ -57,43 +55,56 @@ complex survey designs. It targets both applied researchers who need rigorous ec
 methods and data science practitioners who need accessible causal inference tools integrated
 into Python workflows.
 
+# State of the Field
+
+The R ecosystem provides mature implementations across several packages: `did`
+[@Callaway2021], `fixest` [@Berge2018], `synthdid` [@Arkhangelsky2021], and `HonestDiD`
+[@Rambachan2023]; Stata offers `csdid` and `didregress`. Python coverage is partial and
+fragmented. `pyfixest` [@pyfixest] brings `fixest`-style high-dimensional fixed-effects
+regression to Python, including Sun-Abraham, two-stage, and local-projections estimators,
+but is organized around its regression engine rather than the wider DiD taxonomy;
+`differences` implements Callaway-Sant'Anna group-time estimation; `CausalPy` offers
+Bayesian analysis of quasi-experiments, including synthetic control, without
+staggered-adoption support. General-purpose causal inference toolkits such as `DoWhy` and
+`EconML` target other identification strategies.
+
+`diff-diff` was built as a new library, rather than as contributions to these packages,
+because its central contribution is cross-cutting: one estimator contract, one shared
+inference core, and an influence-function architecture that composes design-based survey
+variance across the estimator taxonomy, with per-estimator support documented in a
+compatibility matrix and unsupported combinations failing closed. To our knowledge, no existing DiD software
+in any language provides design-based variance estimation for complex survey data, and no
+Python package covers the modern estimator taxonomy end-to-end; `diff-diff` provides
+both, validated against the R reference implementations where they exist.
+
 # Key Features
 
-**Breadth of methods.** `diff-diff` implements 19 estimators organized across the modern
+**Breadth of methods.** `diff-diff` implements 20 estimators organized across the modern
 DiD taxonomy. Classic designs include two-group/two-period DiD, two-way fixed effects, and
 event-study estimation with period-specific effects. Heterogeneity-robust staggered-adoption
 estimators include Callaway-Sant'Anna [@Callaway2021], Sun-Abraham [@Sun2021], imputation
-[@Borusyak2024], two-stage [@Gardner2022], stacked [@Wing2024], and efficient [@Chen2025]
-approaches, together with reversible-treatment DiD for non-absorbing interventions
-[@deChaisemartin2020] and a ring-indicator estimator for spatial spillovers [@Butts2021].
-Synthetic-control hybrids include synthetic DiD [@Arkhangelsky2021] and the classic
-synthetic control method [@Abadie2010]. Extended designs include triple-difference and
-staggered triple-difference estimators [@OrtizVillavicencio2025], continuous-treatment DiD
-with dose-response curves [@Callaway2024], heterogeneous-adoption designs where no unit
-remains untreated [@deChaisemartin2026], nonlinear ETWFE [@Wooldridge2025; @Wooldridge2023],
-and triply robust panel estimation [@Athey2025]. Separate diagnostic and sensitivity tools -
-outside the 19 estimators - include Goodman-Bacon decomposition [@GoodmanBacon2021], Honest
-DiD sensitivity analysis [@Rambachan2023], placebo tests, and pre-trends power analysis
-[@Roth2022]. All estimators share a consistent `fit()` interface with
-`get_params()`/`set_params()` for configuration, R-style formula support, and rich results
-objects with `summary()` output. An optional Rust backend via PyO3 accelerates
-compute-intensive operations.
+[@Borusyak2024], two-stage [@Gardner2022], stacked [@Wing2024], efficient [@Chen2025], and
+local-projections [@Dube2025] approaches, together with reversible-treatment DiD for
+non-absorbing interventions [@deChaisemartin2020] and a ring-indicator estimator for
+spatial spillovers [@Butts2021]. Synthetic-control hybrids include synthetic DiD
+[@Arkhangelsky2021] and the classic synthetic control method [@Abadie2010]. Extended
+designs include triple-difference and staggered triple-difference estimators
+[@OrtizVillavicencio2025], continuous-treatment DiD with dose-response curves
+[@Callaway2024], heterogeneous-adoption designs where no unit remains untreated
+[@deChaisemartin2026], nonlinear ETWFE [@Wooldridge2025; @Wooldridge2023], and triply
+robust panel estimation [@Athey2025]. Separate diagnostic and sensitivity tools - outside
+the 20 estimators - include Goodman-Bacon decomposition [@GoodmanBacon2021], Honest DiD
+sensitivity analysis [@Rambachan2023], placebo tests, and pre-trends power analysis
+[@Roth2022].
 
 **Survey-weighted inference.** A `SurveyDesign` class supports stratification, primary
 sampling units, finite population corrections, and probability weights. Variance estimation
 includes Taylor series linearization, five replicate weight methods (BRR, Fay's BRR, JK1,
-JKn, SDR), and survey-aware bootstrap. Survey variance is validated against R's `survey`
-package [@Lumley2004] on three real complex-survey datasets - NHANES, RECS 2020, and the
-California API school dataset - to a tight tolerance (test gaps < 1e-8, typically below
-1e-10). The design-based variance result - that the influence functions of modern DiD
-estimators satisfy Binder's (1983) smoothness conditions, so stratified-cluster
-linearization yields design-consistent standard errors - is derived in @Gerber2026. No
-other DiD package in any language provides integrated survey support.
-
-**Validation against R.** Point estimates match the R `did`, `synthdid`, and `fixest`
-packages to machine precision (differences < 1e-10). Standard errors match exactly for
-core estimators including Callaway-Sant'Anna and basic DiD. Validation includes the
-canonical MPDTA minimum-wage dataset from Callaway and Sant'Anna [-@Callaway2021].
+JKn, SDR), and survey-aware bootstrap. The design-based variance result - that the
+influence functions of modern DiD estimators satisfy the smoothness conditions of
+@Binder1983, so stratified-cluster linearization yields design-consistent standard
+errors - is derived in @Gerber2026. No other DiD package in any language provides
+integrated survey support.
 
 **Practitioner tooling.** Beyond estimation, `diff-diff` includes a practitioner decision
 tree for estimator selection, an 8-step diagnostic workflow based on Baker et al.
@@ -101,16 +112,69 @@ tree for estimator selection, an 8-step diagnostic workflow based on Baker et al
 aggregation utilities for converting individual-level survey responses into
 geographic-period panels suitable for DiD analysis.
 
+# Software Design
+
+Every estimator implements a common contract: a scikit-learn-style `fit()` with
+`get_params()`/`set_params()` for configuration and rich results dataclasses with
+`summary()`, `to_dict()`, and `to_dataframe()`; the classic regression estimators
+additionally accept R-style formulas. Numerical work is deliberately centralized:
+estimators solve their least-squares problems through a single shared linear-algebra
+core, and analytical robust, cluster-robust, and survey variances route through one
+shared sandwich-estimator path, so numerical hardening - rank-deficiency guards,
+degrees-of-freedom corrections, small-cluster behavior - lands in one place. Estimators
+whose inference is inherently resampling-based - synthetic DiD's placebo and jackknife
+variance, for example - use method-specific variance paths. All estimators share one
+joint-inference contract: inference fields (standard error, t-statistic, p-value,
+confidence interval) are always computed together and become NaN together when inference
+is not identified, rather than silently reporting partial results.
+
+Two design choices carry the survey capability and the deployment story. First, the
+regression- and influence-function-based estimators compute influence functions for their
+target parameters, so design-based variance - Taylor series linearization over strata and
+clusters, and replicate weights - routes through one shared survey-variance core rather
+than requiring per-estimator derivations; resampling-based estimators such as synthetic
+DiD and TROP compose survey designs through documented method-specific bootstrap,
+placebo, and jackknife paths. Supported design-estimator combinations are listed in a
+per-estimator compatibility matrix, and unsupported ones are rejected explicitly rather
+than silently approximated. Second, the runtime
+dependency footprint is minimal by policy - numpy, pandas, and scipy only - keeping the
+library easy to install in restricted industry environments; high-dimensional fixed
+effects are handled by within-transformation rather than by delegating to a heavier
+econometrics stack. An optional Rust backend (via PyO3) accelerates compute-intensive
+kernels such as synthetic-control weight solving and fixed-effects absorption; the Python
+implementation remains canonical, equivalence between backends is enforced by the test
+suite, and the library falls back to pure Python automatically when the extension is
+unavailable.
+
+# Research Impact Statement
+
+`diff-diff` is the companion software of the design-based variance preprint [@Gerber2026]:
+the framework derived there is implemented here, and the preprint's numerical results are
+produced with the library. Correctness evidence ships with the repository as reproducible
+material. Golden-file benchmarks pin point estimates against R's `did`, `synthdid`, and
+`fixest` to machine precision (differences < 1e-10), including the canonical MPDTA
+minimum-wage application of Callaway and Sant'Anna [-@Callaway2021], with standard errors
+matching exactly for core estimators such as Callaway-Sant'Anna and basic DiD. Survey
+variance is validated against R's `survey` package [@Lumley2004] on three real
+complex-survey datasets - NHANES, RECS 2020, and the California API school data - with
+test gaps below 1e-8 and typically below 1e-10. The library is distributed on PyPI with
+tagged releases, has six months of continuous public development history (3,000+
+commits), and is exercised by a CI test suite of more than 7,600 tests; 26 tutorial
+notebooks and full API documentation are published on Read the Docs, and machine-readable
+guides bundled in the wheel (`llms.txt`) make the library directly usable by AI-assisted
+analysis workflows.
+
 # AI Usage Disclosure
 
 Generative AI tools were used in developing this software and manuscript. Anthropic's
-Claude models (the Opus and Sonnet families, via the Claude Code CLI) assisted with code
-generation and refactoring, test scaffolding, documentation, and drafting and editing of
-this manuscript. The author reviewed, modified, and validated all AI-generated code and
-text and made all primary architectural and methodological decisions. Numerical results
-were independently verified against established R reference packages (`did`, `synthdid`,
-`fixest`, `survey`) for every estimator with an R equivalent, and against the author's
-reference derivations or simulation otherwise. The author takes full responsibility for the
-accuracy and integrity of the software and this paper.
+Claude models (the Opus, Sonnet, and Fable model families, via the Claude Code CLI)
+assisted with code generation and refactoring, test scaffolding, documentation, and
+drafting and editing of this manuscript. The author reviewed, modified, and validated all
+AI-generated code and text and made all primary architectural and methodological
+decisions. Numerical results were independently verified against established R reference
+packages (`did`, `synthdid`, `fixest`, `survey`) for every estimator with an R
+equivalent, and against the author's reference derivations or simulation otherwise. The
+author takes full responsibility for the accuracy and integrity of the software and this
+paper.
 
 # References