From 6c841d24ae29ec6b550fd985b2e395d833f57059 Mon Sep 17 00:00:00 2001
From: igerber
Date: Fri, 3 Jul 2026 07:57:32 -0400
Subject: [PATCH 1/6] docs(paper): finalize JOSS submission - 20 estimators, required sections, draft-pdf CI

- paper.md: update estimator count 19 -> 20 (LPDiD merged after #564); add
  LPDiD to the staggered-adoption method list with Dube2025 citation
- paper.md: restructure to current JOSS required sections - add State of the
  Field (Python/R/Stata landscape with build-vs-contribute justification),
  Software Design (shared inference core, influence-function survey
  architecture, minimal-dependency policy, Rust backend), and Research Impact
  Statement (companion preprint, golden-file R validation, real-data survey
  validation, community-readiness signals); 1,269 words, within the 750-1750
  target
- paper.bib: add Dube2025 (LP-DiD, J. Applied Econometrics), Binder1983
  (previously cited in-text without a bib entry), pyfixest (software citation
  for State of the Field)
- paper.md: AI disclosure updated to name the Opus, Sonnet, and Fable model
  families
- llms.txt / llms-full.txt: stale counts 19 -> 20 and "13 of 16" -> "13 of 20"
  replicate-weight support
- practitioner_decision_tree.rst: stale "17 estimators" -> 20, mention Local
  Projections DiD
- choosing_estimator.rst survey matrix: dCDH replicate weights "--" ->
  "Full (analytical)" (support landed with test_survey_dcdh_replicate_psu
  coverage); add missing SpilloverDiD and SyntheticControl rows; note the
  SyntheticControl NotImplementedError in the intro
- .github/workflows/draft-pdf.yml: SHA-pinned openjournals draft action
  compiles the paper on paper.md/paper.bib changes and uploads the PDF
  artifact

Co-Authored-By: Claude Fable 5
---
 .github/workflows/draft-pdf.yml     |  36 +++++++
 diff_diff/guides/llms-full.txt      |   2 +-
 diff_diff/guides/llms.txt           |   2 +-
 docs/choosing_estimator.rst         |  15 ++-
 docs/practitioner_decision_tree.rst |   5 +-
 paper.bib                           |  29 ++++++
 paper.md                            | 146 +++++++++++++++++++---------
 7 files changed, 184 insertions(+), 51 deletions(-)
 create mode 100644 .github/workflows/draft-pdf.yml

diff --git a/.github/workflows/draft-pdf.yml b/.github/workflows/draft-pdf.yml
new file mode 100644
index 00000000..9911561b
--- /dev/null
+++ b/.github/workflows/draft-pdf.yml
@@ -0,0 +1,36 @@
+name: Draft PDF
+
+on:
+  push:
+    branches: [main]
+    paths:
+      - paper.md
+      - paper.bib
+      - .github/workflows/draft-pdf.yml
+  pull_request:
+    paths:
+      - paper.md
+      - paper.bib
+      - .github/workflows/draft-pdf.yml
+  workflow_dispatch:
+
+permissions:
+  contents: read
+
+jobs:
+  paper:
+    runs-on: ubuntu-latest
+    name: Compile JOSS paper draft
+    steps:
+      - name: Checkout
+        uses: actions/checkout@9c091bb21b7c1c1d1991bb908d89e4e9dddfe3e0 # v7
+      - name: Build draft PDF
+        uses: openjournals/openjournals-draft-action@85a18372e48f551d8af9ddb7a747de685fbbb01c # v1.0
+        with:
+          journal: joss
+          paper-path: paper.md
+      - name: Upload paper artifact
+        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7
+        with:
+          name: joss-paper
+          path: paper.pdf
diff --git a/diff_diff/guides/llms-full.txt b/diff_diff/guides/llms-full.txt
index d5863bc0..58d9f6a9 100644
--- a/diff_diff/guides/llms-full.txt
+++ b/diff_diff/guides/llms-full.txt
@@ -2120,7 +2120,7 @@ sd_female, data_female = sd.subpopulation(data, mask=lambda df: df['sex'] == 'F'
 **Key features:**
 
 - Taylor Series Linearization (TSL) variance with strata + PSU + FPC
-- Replicate weight variance: BRR, Fay's BRR, JK1, JKn, SDR (13 of 16 estimators, including dCDH)
+- Replicate weight variance: BRR, Fay's BRR, JK1, JKn, SDR (13 of 20 estimators, including dCDH)
 - Survey-aware bootstrap: multiplier at PSU (Hall-Mammen wild; dCDH, staggered) or Rao-Wu rescaled (SunAbraham, SyntheticDiD, TROP).
SyntheticDiD bootstrap composes Rao-Wu rescaled per-draw weights with the weighted Frank-Wolfe variant of `_sc_weight_fw` (PR #355): each draw solves `min ||A·diag(rw)·ω - b||² + ζ²·Σ rw_i ω_i²` and composes `ω_eff = rw·ω/Σ(rw·ω)` for the SDID estimator. Pweight-only fits use constant `rw = w_control`; full designs use Rao-Wu. SDID's placebo (stratified permutation + weighted FW) and jackknife (PSU-level LOO with stratum aggregation, Rust & Rao 1996) paths also support pweight-only and full strata/PSU/FPC designs - DEFF diagnostics, subpopulation analysis, weight trimming (`trim_weights`) - Repeated cross-sections: `CallawaySantAnna(panel=False)` diff --git a/diff_diff/guides/llms.txt b/diff_diff/guides/llms.txt index f61f5f3d..d584eeba 100644 --- a/diff_diff/guides/llms.txt +++ b/diff_diff/guides/llms.txt @@ -2,7 +2,7 @@ > A Python library for Difference-in-Differences (DiD) causal inference analysis. Provides sklearn-like estimators with statsmodels-style summary output for econometric analysis. -diff-diff offers 19 estimators covering basic 2x2 DiD, modern staggered adoption methods, reversible (non-absorbing) treatments, advanced panel estimators, nonlinear models, and diagnostic tools. It supports robust and cluster-robust standard errors, wild cluster bootstrap, formula and column-name interfaces, fixed effects (dummy and absorbed), complex survey designs (strata/PSU/FPC, replicate weights, design-based variance), and publication-ready output. The optional Rust backend accelerates compute-intensive estimators like Synthetic DiD and TROP. +diff-diff offers 20 estimators covering basic 2x2 DiD, modern staggered adoption methods, reversible (non-absorbing) treatments, advanced panel estimators, nonlinear models, and diagnostic tools. 
It supports robust and cluster-robust standard errors, wild cluster bootstrap, formula and column-name interfaces, fixed effects (dummy and absorbed), complex survey designs (strata/PSU/FPC, replicate weights, design-based variance), and publication-ready output. The optional Rust backend accelerates compute-intensive estimators like Synthetic DiD and TROP. - Install: `pip install diff-diff` - License: MIT diff --git a/docs/choosing_estimator.rst b/docs/choosing_estimator.rst index c3edcc31..2f7bdeef 100644 --- a/docs/choosing_estimator.rst +++ b/docs/choosing_estimator.rst @@ -771,7 +771,8 @@ If you're unsure which estimator to use: Survey Design Support --------------------- -All estimators accept an optional ``survey_design`` parameter in ``fit()``. +All estimators accept an optional ``survey_design`` parameter in ``fit()`` +(``SyntheticControl`` does not yet support it and raises ``NotImplementedError``). Pass a :class:`~diff_diff.SurveyDesign` object to get design-based variance estimation. The depth of support varies by estimator: @@ -820,7 +821,7 @@ estimation. The depth of support varies by estimator: * - ``ChaisemartinDHaultfoeuille`` - pweight only - Full (TSL) - - -- + - Full (analytical) - Group-level (warning) * - ``TripleDifference`` - pweight only @@ -872,6 +873,11 @@ estimation. The depth of support varies by estimator: - Via bootstrap - -- - Hybrid pairs-bootstrap + Rao-Wu rescaled (bootstrap only) + * - ``SyntheticControl`` + - -- + - -- + - -- + - -- * - ``TROP`` - pweight only - Via bootstrap @@ -887,6 +893,11 @@ estimation. 
The depth of support varies by estimator: - Full (Binder TSL) - -- - -- + * - ``SpilloverDiD`` + - pweight only + - Full (Binder TSL + Conley) + - -- + - -- * - ``BaconDecomposition`` - Diagnostic - Diagnostic diff --git a/docs/practitioner_decision_tree.rst b/docs/practitioner_decision_tree.rst index 1dd6e5b5..8aeed5cc 100644 --- a/docs/practitioner_decision_tree.rst +++ b/docs/practitioner_decision_tree.rst @@ -463,9 +463,10 @@ At a Glance What About the Other Estimators? -------------------------------- -diff-diff has 17 estimators covering advanced scenarios: Sun-Abraham for +diff-diff has 20 estimators covering advanced scenarios: Sun-Abraham for interaction-weighted estimation, Imputation DiD and Two-Stage DiD for alternative -staggered approaches, Stacked DiD, Efficient DiD, Triple Difference, TROP, and more. +staggered approaches, Local Projections DiD, Stacked DiD, Efficient DiD, +Triple Difference, TROP, and more. The six scenarios above cover the most common business use cases. For the full academic decision tree with all estimators, see :doc:`choosing_estimator`. 
diff --git a/paper.bib b/paper.bib index 6aa53f0d..3c7405c1 100644 --- a/paper.bib +++ b/paper.bib @@ -249,3 +249,32 @@ @misc{deChaisemartin2026 primaryclass = {econ.EM}, doi = {10.48550/arXiv.2405.04465} } + +@article{Dube2025, + author = {Dube, Arindrajit and Girardi, Daniele and Jord{\`a}, {\`O}scar and Taylor, Alan M.}, + title = {A Local Projections Approach to Difference-in-Differences}, + journal = {Journal of Applied Econometrics}, + volume = {40}, + number = {5}, + pages = {741--758}, + year = {2025}, + doi = {10.1002/jae.70000} +} + +@article{Binder1983, + author = {Binder, David A.}, + title = {On the Variances of Asymptotically Normal Estimators from Complex Surveys}, + journal = {International Statistical Review}, + volume = {51}, + number = {3}, + pages = {279--292}, + year = {1983}, + doi = {10.2307/1402588} +} + +@misc{pyfixest, + author = {{The PyFixest Authors}}, + title = {pyfixest: Fast High-Dimensional Fixed Effect Estimation in Python}, + year = {2025}, + url = {https://github.com/py-econometrics/pyfixest} +} diff --git a/paper.md b/paper.md index a20d75de..2b204824 100644 --- a/paper.md +++ b/paper.md @@ -21,7 +21,7 @@ bibliography: paper.bib # Summary `diff-diff` is a Python library for Difference-in-Differences (DiD) causal inference -analysis. It provides 19 estimators covering the full modern DiD toolkit - from classic +analysis. It provides 20 estimators covering the full modern DiD toolkit - from classic two-group/two-period designs through heterogeneity-robust staggered adoption methods, synthetic control hybrids, and sensitivity analysis - under a consistent scikit-learn-style API. Most estimators accept an optional `SurveyDesign` object for design-based variance @@ -41,15 +41,13 @@ modern methods - including Callaway and Sant'Anna [-@Callaway2021], Sun and Abra [-@Sun2021], Borusyak, Jaravel, and Spiess [-@Borusyak2024], and others - are now standard practice in applied work. 
-The R ecosystem provides mature implementations across several packages: `did` -[@Callaway2021], `fixest` [@Berge2018], `synthdid` [@Arkhangelsky2021], and `HonestDiD` -[@Rambachan2023]. Stata offers `csdid` and `didregress`. Python, however, lacks a unified -DiD library. Practitioners working in Python-based data science workflows - increasingly -common in industry settings for marketing measurement, product experimentation, and policy -evaluation - must either context-switch to R, reimplement methods from scratch, or rely on -partial implementations scattered across unrelated packages. +These methods are well served in R and Stata, but Python lacks a unified DiD library. +Practitioners working in Python-based data science workflows - increasingly common in +industry settings for marketing measurement, product experimentation, and policy +evaluation - must either context-switch to another language, reimplement methods from +scratch, or rely on partial implementations scattered across unrelated packages. -`diff-diff` fills this gap by providing a single-import library that covers 19 estimators +`diff-diff` fills this gap by providing a single-import library that covers 20 estimators with a consistent API, survey-weighted inference, and numerical validation against R. It is also the companion software for the design-based variance framework of @Gerber2026, which establishes design-consistent standard errors for modern DiD estimators under @@ -57,43 +55,55 @@ complex survey designs. It targets both applied researchers who need rigorous ec methods and data science practitioners who need accessible causal inference tools integrated into Python workflows. +# State of the Field + +The R ecosystem provides mature implementations across several packages: `did` +[@Callaway2021], `fixest` [@Berge2018], `synthdid` [@Arkhangelsky2021], and `HonestDiD` +[@Rambachan2023]; Stata offers `csdid` and `didregress`. Python coverage is partial and +fragmented. 
`pyfixest` [@pyfixest] brings `fixest`-style high-dimensional fixed-effects +regression to Python, including Sun-Abraham, two-stage, and local-projections estimators, +but is organized around its regression engine rather than the wider DiD taxonomy; +`differences` implements Callaway-Sant'Anna group-time estimation; `CausalPy` offers +Bayesian analysis of quasi-experiments, including synthetic control, without +staggered-adoption support. General-purpose causal inference toolkits such as `DoWhy` and +`EconML` target other identification strategies. + +`diff-diff` was built as a new library, rather than as contributions to these packages, +because its central contribution is cross-cutting: one estimator contract, one shared +inference core, and an influence-function architecture that composes design-based survey +variance with every estimator in the taxonomy. To our knowledge, no existing DiD software +in any language provides design-based variance estimation for complex survey data, and no +Python package covers the modern estimator taxonomy end-to-end; `diff-diff` provides +both, validated against the R reference implementations where they exist. + # Key Features -**Breadth of methods.** `diff-diff` implements 19 estimators organized across the modern +**Breadth of methods.** `diff-diff` implements 20 estimators organized across the modern DiD taxonomy. Classic designs include two-group/two-period DiD, two-way fixed effects, and event-study estimation with period-specific effects. Heterogeneity-robust staggered-adoption estimators include Callaway-Sant'Anna [@Callaway2021], Sun-Abraham [@Sun2021], imputation -[@Borusyak2024], two-stage [@Gardner2022], stacked [@Wing2024], and efficient [@Chen2025] -approaches, together with reversible-treatment DiD for non-absorbing interventions -[@deChaisemartin2020] and a ring-indicator estimator for spatial spillovers [@Butts2021]. 
-Synthetic-control hybrids include synthetic DiD [@Arkhangelsky2021] and the classic -synthetic control method [@Abadie2010]. Extended designs include triple-difference and -staggered triple-difference estimators [@OrtizVillavicencio2025], continuous-treatment DiD -with dose-response curves [@Callaway2024], heterogeneous-adoption designs where no unit -remains untreated [@deChaisemartin2026], nonlinear ETWFE [@Wooldridge2025; @Wooldridge2023], -and triply robust panel estimation [@Athey2025]. Separate diagnostic and sensitivity tools - -outside the 19 estimators - include Goodman-Bacon decomposition [@GoodmanBacon2021], Honest -DiD sensitivity analysis [@Rambachan2023], placebo tests, and pre-trends power analysis -[@Roth2022]. All estimators share a consistent `fit()` interface with -`get_params()`/`set_params()` for configuration, R-style formula support, and rich results -objects with `summary()` output. An optional Rust backend via PyO3 accelerates -compute-intensive operations. +[@Borusyak2024], two-stage [@Gardner2022], stacked [@Wing2024], efficient [@Chen2025], and +local-projections [@Dube2025] approaches, together with reversible-treatment DiD for +non-absorbing interventions [@deChaisemartin2020] and a ring-indicator estimator for +spatial spillovers [@Butts2021]. Synthetic-control hybrids include synthetic DiD +[@Arkhangelsky2021] and the classic synthetic control method [@Abadie2010]. Extended +designs include triple-difference and staggered triple-difference estimators +[@OrtizVillavicencio2025], continuous-treatment DiD with dose-response curves +[@Callaway2024], heterogeneous-adoption designs where no unit remains untreated +[@deChaisemartin2026], nonlinear ETWFE [@Wooldridge2025; @Wooldridge2023], and triply +robust panel estimation [@Athey2025]. 
Separate diagnostic and sensitivity tools - outside +the 20 estimators - include Goodman-Bacon decomposition [@GoodmanBacon2021], Honest DiD +sensitivity analysis [@Rambachan2023], placebo tests, and pre-trends power analysis +[@Roth2022]. **Survey-weighted inference.** A `SurveyDesign` class supports stratification, primary sampling units, finite population corrections, and probability weights. Variance estimation includes Taylor series linearization, five replicate weight methods (BRR, Fay's BRR, JK1, -JKn, SDR), and survey-aware bootstrap. Survey variance is validated against R's `survey` -package [@Lumley2004] on three real complex-survey datasets - NHANES, RECS 2020, and the -California API school dataset - to a tight tolerance (test gaps < 1e-8, typically below -1e-10). The design-based variance result - that the influence functions of modern DiD -estimators satisfy Binder's (1983) smoothness conditions, so stratified-cluster -linearization yields design-consistent standard errors - is derived in @Gerber2026. No -other DiD package in any language provides integrated survey support. - -**Validation against R.** Point estimates match the R `did`, `synthdid`, and `fixest` -packages to machine precision (differences < 1e-10). Standard errors match exactly for -core estimators including Callaway-Sant'Anna and basic DiD. Validation includes the -canonical MPDTA minimum-wage dataset from Callaway and Sant'Anna [-@Callaway2021]. +JKn, SDR), and survey-aware bootstrap. The design-based variance result - that the +influence functions of modern DiD estimators satisfy the smoothness conditions of +@Binder1983, so stratified-cluster linearization yields design-consistent standard +errors - is derived in @Gerber2026. No other DiD package in any language provides +integrated survey support. **Practitioner tooling.** Beyond estimation, `diff-diff` includes a practitioner decision tree for estimator selection, an 8-step diagnostic workflow based on Baker et al. 
@@ -101,16 +111,62 @@ tree for estimator selection, an 8-step diagnostic workflow based on Baker et al aggregation utilities for converting individual-level survey responses into geographic-period panels suitable for DiD analysis. +# Software Design + +Every estimator implements a common contract: a scikit-learn-style `fit()` with +`get_params()`/`set_params()` for configuration, R-style formula support, and rich results +dataclasses with `summary()`, `to_dict()`, and `to_dataframe()`. Numerical work is +deliberately centralized: all estimators solve their least-squares problems and their +robust, cluster-robust, and survey variances through a single shared linear-algebra core, +so numerical hardening - rank-deficiency guards, degrees-of-freedom corrections, +small-cluster behavior - lands in one place and propagates to every estimator. Inference +fields (standard error, t-statistic, p-value, confidence interval) are always computed +together and become NaN together when inference is not identified, rather than silently +reporting partial results. + +Two design choices carry the survey capability and the deployment story. First, estimators +compute influence functions for their target parameters, so design-based variance - Taylor +series linearization over strata and clusters, replicate weights, survey-aware +bootstrap - composes uniformly with estimators as different as Callaway-Sant'Anna and +synthetic DiD instead of requiring per-estimator derivations. Second, the runtime +dependency footprint is minimal by policy - numpy, pandas, and scipy only - keeping the +library easy to install in restricted industry environments; high-dimensional fixed +effects are handled by within-transformation rather than by delegating to a heavier +econometrics stack. 
An optional Rust backend (via PyO3) accelerates compute-intensive +kernels such as synthetic-control weight solving and fixed-effects absorption; the Python +implementation remains canonical, equivalence between backends is enforced by the test +suite, and the library falls back to pure Python automatically when the extension is +unavailable. + +# Research Impact Statement + +`diff-diff` is the companion software of the design-based variance preprint [@Gerber2026]: +the framework derived there is implemented here, and the preprint's numerical results are +produced with the library. Correctness evidence ships with the repository as reproducible +material. Golden-file benchmarks pin point estimates against R's `did`, `synthdid`, and +`fixest` to machine precision (differences < 1e-10), including the canonical MPDTA +minimum-wage application of Callaway and Sant'Anna [-@Callaway2021], with standard errors +matching exactly for core estimators such as Callaway-Sant'Anna and basic DiD. Survey +variance is validated against R's `survey` package [@Lumley2004] on three real +complex-survey datasets - NHANES, RECS 2020, and the California API school data - with +test gaps below 1e-8 and typically below 1e-10. The library is distributed on PyPI with +tagged releases, has six months of continuous public development history (3,000+ +commits), and is exercised by a CI test suite of more than 7,600 tests; 26 tutorial +notebooks and full API documentation are published on Read the Docs, and machine-readable +guides bundled in the wheel (`llms.txt`) make the library directly usable by AI-assisted +analysis workflows. + # AI Usage Disclosure Generative AI tools were used in developing this software and manuscript. Anthropic's -Claude models (the Opus and Sonnet families, via the Claude Code CLI) assisted with code -generation and refactoring, test scaffolding, documentation, and drafting and editing of -this manuscript. 
The author reviewed, modified, and validated all AI-generated code and -text and made all primary architectural and methodological decisions. Numerical results -were independently verified against established R reference packages (`did`, `synthdid`, -`fixest`, `survey`) for every estimator with an R equivalent, and against the author's -reference derivations or simulation otherwise. The author takes full responsibility for the -accuracy and integrity of the software and this paper. +Claude models (the Opus, Sonnet, and Fable model families, via the Claude Code CLI) +assisted with code generation and refactoring, test scaffolding, documentation, and +drafting and editing of this manuscript. The author reviewed, modified, and validated all +AI-generated code and text and made all primary architectural and methodological +decisions. Numerical results were independently verified against established R reference +packages (`did`, `synthdid`, `fixest`, `survey`) for every estimator with an R +equivalent, and against the author's reference derivations or simulation otherwise. The +author takes full responsibility for the accuracy and integrity of the software and this +paper. # References From e60e813f96b727d5c34f3e8fb18654e81e00a9f0 Mon Sep 17 00:00:00 2001 From: igerber Date: Fri, 3 Jul 2026 08:00:16 -0400 Subject: [PATCH 2/6] docs(paper): brace-protect proper nouns in paper.bib titles California's (Abadie2010) and pyfixest/Python (pyfixest entry) were being lowercased by CSL sentence-casing in the compiled PDF. 

Co-Authored-By: Claude Fable 5
---
 paper.bib | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/paper.bib b/paper.bib
index 3c7405c1..0e8f3498 100644
--- a/paper.bib
+++ b/paper.bib
@@ -220,7 +220,7 @@ @misc{Gerber2026
 
 @article{Abadie2010,
   author = {Abadie, Alberto and Diamond, Alexis and Hainmueller, Jens},
-  title = {Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program},
+  title = {Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of {California's} Tobacco Control Program},
   journal = {Journal of the American Statistical Association},
   volume = {105},
   number = {490},
@@ -274,7 +274,7 @@ @article{Binder1983
 
 @misc{pyfixest,
   author = {{The PyFixest Authors}},
-  title = {pyfixest: Fast High-Dimensional Fixed Effect Estimation in Python},
+  title = {{pyfixest}: Fast High-Dimensional Fixed Effect Estimation in {Python}},
   year = {2025},
   url = {https://github.com/py-econometrics/pyfixest}
 }

From c793a563ca3f78c17727d653dcea75aca8ac10ab Mon Sep 17 00:00:00 2001
From: igerber
Date: Fri, 3 Jul 2026 08:21:05 -0400
Subject: [PATCH 3/6] docs: address CI review P2s - precise Software Design claims, REGISTRY replicate matrix 13/20

- paper.md Software Design: formula support scoped to the classic regression
  estimators; least-squares solves centralized but analytical vs
  resampling-based variance paths distinguished (synthetic DiD placebo/
  jackknife noted); joint-inference NaN contract stated as the invariant all
  estimators share
- REGISTRY.md replicate-weight support matrix: 12 of 15 -> 13 of 20; adds
  ChaisemartinDHaultfoeuille to Supported (closed-form cell-collapse replicate
  ATT, replicate + n_bootstrap > 0 rejected); Rejected list now enumerates
  WooldridgeDiD, LPDiD, SpilloverDiD, HeterogeneousAdoptionDiD (TSL-only,
  NotImplementedError at fit) and SyntheticControl (rejects survey_design
  entirely), keeping BaconDecomposition diagnostic-only

Co-Authored-By: Claude Fable 5
--- docs/methodology/REGISTRY.md | 12 +++++++++--- paper.md | 21 ++++++++++++--------- 2 files changed, 21 insertions(+), 12 deletions(-) diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index 33ab7fd3..77a39a12 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -4576,7 +4576,7 @@ variance from the distribution of replicate estimates. design structure is fixed and dropped replicates contribute zero to the sum without changing the scale. Survey df uses `n_valid - 1` for t-based inference. -- **Note:** Replicate-weight support matrix (12 of 15 public estimators): +- **Note:** Replicate-weight support matrix (13 of 20 public estimators): - **Supported**: CallawaySantAnna (reg/ipw/dr with or without covariates, no bootstrap; IF-based replicate variance is covariate-agnostic), ContinuousDiD (no bootstrap), EfficientDiD (no bootstrap), @@ -4587,9 +4587,15 @@ variance from the distribution of replicate estimates. TwoWayFixedEffects (estimator-level refit with within-transformation), SunAbraham (estimator-level refit, replaces `vcov_cohort`), StackedDiD (estimator-level refit with Q-weight composition), - ImputationDiD (two-stage refit), TwoStageDiD (two-stage refit) + ImputationDiD (two-stage refit), TwoStageDiD (two-stage refit), + ChaisemartinDHaultfoeuille (closed-form cell-collapse replicate ATT, + multi-horizon and placebo paths; replicate + `n_bootstrap > 0` rejected + — see the ChaisemartinDHaultfoeuille Notes for the allocator contract) - **Rejected with NotImplementedError**: SyntheticDiD, TROP - (bootstrap-based variance), BaconDecomposition (diagnostic only) + (bootstrap-based variance), WooldridgeDiD, LPDiD, SpilloverDiD, + HeterogeneousAdoptionDiD (TSL-only survey paths; replicate designs + rejected at `fit()`), SyntheticControl (rejects `survey_design` + entirely), BaconDecomposition (diagnostic only) - Estimators with replicate support reject replicate + bootstrap (replicate weights provide analytical 
variance) - **Note:** When invalid replicates are dropped in `compute_replicate_vcov` diff --git a/paper.md b/paper.md index 2b204824..dc7691ba 100644 --- a/paper.md +++ b/paper.md @@ -114,15 +114,18 @@ geographic-period panels suitable for DiD analysis. # Software Design Every estimator implements a common contract: a scikit-learn-style `fit()` with -`get_params()`/`set_params()` for configuration, R-style formula support, and rich results -dataclasses with `summary()`, `to_dict()`, and `to_dataframe()`. Numerical work is -deliberately centralized: all estimators solve their least-squares problems and their -robust, cluster-robust, and survey variances through a single shared linear-algebra core, -so numerical hardening - rank-deficiency guards, degrees-of-freedom corrections, -small-cluster behavior - lands in one place and propagates to every estimator. Inference -fields (standard error, t-statistic, p-value, confidence interval) are always computed -together and become NaN together when inference is not identified, rather than silently -reporting partial results. +`get_params()`/`set_params()` for configuration and rich results dataclasses with +`summary()`, `to_dict()`, and `to_dataframe()`; the classic regression estimators +additionally accept R-style formulas. Numerical work is deliberately centralized: +estimators solve their least-squares problems through a single shared linear-algebra +core, and analytical robust, cluster-robust, and survey variances route through one +shared sandwich-estimator path, so numerical hardening - rank-deficiency guards, +degrees-of-freedom corrections, small-cluster behavior - lands in one place. Estimators +whose inference is inherently resampling-based - synthetic DiD's placebo and jackknife +variance, for example - use method-specific variance paths. 
All estimators share one +joint-inference contract: inference fields (standard error, t-statistic, p-value, +confidence interval) are always computed together and become NaN together when inference +is not identified, rather than silently reporting partial results. Two design choices carry the survey capability and the deployment story. First, estimators compute influence functions for their target parameters, so design-based variance - Taylor From 78b00ca202e6da165d7ed7cd85cbf4e5f3ea70b2 Mon Sep 17 00:00:00 2001 From: igerber Date: Fri, 3 Jul 2026 08:25:18 -0400 Subject: [PATCH 4/6] docs: drop draft-pdf CI workflow; scope survey-composition claims (review round 2) - Remove .github/workflows/draft-pdf.yml: a durable CI job for the paper is overkill (user call). It served its purpose as a one-time compile check - the PDF built successfully and was visually verified; JOSS's editorialbot compiles the paper on demand during review. - paper.md: survey variance no longer claimed to compose with "every estimator" / "uniformly" - now states per-estimator support documented in a compatibility matrix with unsupported combinations failing closed - llms.txt / llms-full.txt / choosing_estimator.rst: "All estimators" -> "Most estimators" with SyntheticControl called out and a pointer to the Survey Design Support matrix (aligns with existing README wording) Co-Authored-By: Claude Fable 5 --- .github/workflows/draft-pdf.yml | 36 --------------------------------- diff_diff/guides/llms-full.txt | 2 +- diff_diff/guides/llms.txt | 2 +- docs/choosing_estimator.rst | 6 +++--- paper.md | 9 ++++++--- 5 files changed, 11 insertions(+), 44 deletions(-) delete mode 100644 .github/workflows/draft-pdf.yml diff --git a/.github/workflows/draft-pdf.yml b/.github/workflows/draft-pdf.yml deleted file mode 100644 index 9911561b..00000000 --- a/.github/workflows/draft-pdf.yml +++ /dev/null @@ -1,36 +0,0 @@ -name: Draft PDF - -on: - push: - branches: [main] - paths: - - paper.md - - paper.bib - - 
.github/workflows/draft-pdf.yml - pull_request: - paths: - - paper.md - - paper.bib - - .github/workflows/draft-pdf.yml - workflow_dispatch: - -permissions: - contents: read - -jobs: - paper: - runs-on: ubuntu-latest - name: Compile JOSS paper draft - steps: - - name: Checkout - uses: actions/checkout@9c091bb21b7c1c1d1991bb908d89e4e9dddfe3e0 # v7 - - name: Build draft PDF - uses: openjournals/openjournals-draft-action@85a18372e48f551d8af9ddb7a747de685fbbb01c # v1.0 - with: - journal: joss - paper-path: paper.md - - name: Upload paper artifact - uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7 - with: - name: joss-paper - path: paper.pdf diff --git a/diff_diff/guides/llms-full.txt b/diff_diff/guides/llms-full.txt index 58d9f6a9..281c2ef3 100644 --- a/diff_diff/guides/llms-full.txt +++ b/diff_diff/guides/llms-full.txt @@ -2075,7 +2075,7 @@ clear_cache() ## Survey Support -All estimators accept an optional `survey_design` parameter in `fit()`. Pass a `SurveyDesign` object to get design-based variance estimation. +Most estimators accept an optional `survey_design` parameter in `fit()` (`SyntheticControl` rejects it as not yet supported); depth of support varies by estimator - see the compatibility matrix in `docs/choosing_estimator.rst` (Survey Design Support). Pass a `SurveyDesign` object to get design-based variance estimation. ```python from diff_diff import SurveyDesign, CallawaySantAnna diff --git a/diff_diff/guides/llms.txt b/diff_diff/guides/llms.txt index d584eeba..5268ce18 100644 --- a/diff_diff/guides/llms.txt +++ b/diff_diff/guides/llms.txt @@ -104,7 +104,7 @@ Full practitioner guide: call `diff_diff.get_llm_guide("practitioner")` ## Survey Support -All estimators accept an optional `survey_design` parameter. 
Pass a `SurveyDesign` object to get design-based variance estimation:
+Most estimators accept an optional `survey_design` parameter (`SyntheticControl` does not yet support it); coverage and weight types vary by estimator - see the [Survey Design Support matrix](https://diff-diff.readthedocs.io/en/stable/choosing_estimator.html#survey-design-support). Pass a `SurveyDesign` object to get design-based variance estimation:
 
 - **Design elements**: strata, PSU, FPC, weight types (pweight/fweight/aweight), lonely PSU handling, nest
 - **Variance methods**: Taylor Series Linearization (TSL), replicate weights (BRR/Fay/JK1/JKn/SDR), survey-aware bootstrap
diff --git a/docs/choosing_estimator.rst b/docs/choosing_estimator.rst
index 2f7bdeef..93fa402d 100644
--- a/docs/choosing_estimator.rst
+++ b/docs/choosing_estimator.rst
@@ -771,10 +771,10 @@ If you're unsure which estimator to use:
 Survey Design Support
 ---------------------
 
-All estimators accept an optional ``survey_design`` parameter in ``fit()``
-(``SyntheticControl`` does not yet support it and raises ``NotImplementedError``).
+Most estimators support an optional ``survey_design`` parameter in ``fit()``
+(``SyntheticControl`` accepts the parameter but raises ``NotImplementedError``).
 Pass a :class:`~diff_diff.SurveyDesign` object to get design-based variance
-estimation. The depth of support varies by estimator:
+estimation. The depth of support varies by estimator and variance method:
 
 .. note::
 
diff --git a/paper.md b/paper.md
index dc7691ba..ca02347b 100644
--- a/paper.md
+++ b/paper.md
@@ -71,7 +71,8 @@ staggered-adoption support. General-purpose causal inference toolkits such as `D
 `diff-diff` was built as a new library, rather than as contributions to these packages,
 because its central contribution is cross-cutting: one estimator contract, one shared
 inference core, and an influence-function architecture that composes design-based survey
-variance with every estimator in the taxonomy. 
To our knowledge, no existing DiD software
+variance across the estimator taxonomy, with per-estimator support documented in a
+compatibility matrix and unsupported combinations failing closed. To our knowledge, no existing DiD software
 in any language provides design-based variance estimation for complex survey data, and no
 Python package covers the modern estimator taxonomy end-to-end; `diff-diff` provides both,
 validated against the R reference implementations where they exist.
@@ -130,8 +131,10 @@ is not identified, rather than silently reporting partial results.
 Two design choices carry the survey capability and the deployment story. First, estimators
 compute influence functions for their target parameters, so design-based variance - Taylor
 series linearization over strata and clusters, replicate weights, survey-aware
-bootstrap - composes uniformly with estimators as different as Callaway-Sant'Anna and
-synthetic DiD instead of requiring per-estimator derivations. Second, the runtime
+bootstrap - composes through one shared mechanism with estimators as different as
+Callaway-Sant'Anna and synthetic DiD; supported design-estimator combinations are
+documented in a per-estimator matrix, and unsupported ones are rejected explicitly rather
+than silently approximated. 
Second, the runtime
 dependency footprint is minimal by policy - numpy, pandas, and scipy only - keeping the
 library easy to install in restricted industry environments; high-dimensional fixed
 effects are handled by within-transformation rather than by delegating to a heavier
From c366f10ffc5d51a440684f744dc622ee631ed709 Mon Sep 17 00:00:00 2001
From: igerber
Date: Fri, 3 Jul 2026 08:34:34 -0400
Subject: [PATCH 5/6] docs: SDID survey matrix accuracy + split shared-core vs resampling survey claim (review round 3)

- choosing_estimator.rst: SyntheticDiD Strata/PSU/FPC cell "Via bootstrap"
  -> "Full (method-specific)" - full-design placebo (stratified
  permutation + weighted FW, FPC no-op) and jackknife (PSU-level LOO with
  stratum aggregation, Rust & Rao 1996) shipped alongside the Rao-Wu
  bootstrap; the "placebo/jackknife remain pweight-only ... tracked in
  TODO.md" note was stale (no TODO row exists - the work landed). Legend
  "Via bootstrap" now describes only TROP; new "Full (method-specific)"
  legend entry points at REGISTRY's survey support matrix.
- paper.md Software Design: survey-variance claim split - IF/regression
  estimators route through the shared survey-variance core (TSL +
  replicate weights); resampling estimators (SyntheticDiD, TROP) use
  documented method-specific bootstrap/placebo/jackknife paths.

Co-Authored-By: Claude Fable 5
---
 docs/choosing_estimator.rst | 28 +++++++++++++++-------------
 paper.md                    | 14 ++++++++------
 2 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/docs/choosing_estimator.rst b/docs/choosing_estimator.rst
index 93fa402d..b9abcd04 100644
--- a/docs/choosing_estimator.rst
+++ b/docs/choosing_estimator.rst
@@ -870,7 +870,7 @@ estimation. The depth of support varies by estimator and variance method:
      - Multiplier at PSU
    * - ``SyntheticDiD``
      - pweight only
-     - Via bootstrap
+     - Full (method-specific)
      - --
      - Hybrid pairs-bootstrap + Rao-Wu rescaled (bootstrap only)
    * - ``SyntheticControl``
@@ -908,24 +908,26 @@ estimation. 
The depth of support varies by estimator and variance method:
 
 - **Full**: All weight types (pweight/fweight/aweight) + strata/PSU/FPC + Taylor Series Linearization variance
 - **Full (pweight only)**: Full TSL with strata/PSU/FPC, but only ``pweight`` accepted (``fweight``/``aweight`` rejected because composition changes weight semantics)
-- **Via bootstrap**: Strata/PSU/FPC supported only with bootstrap variance. ``TROP`` uses bootstrap by default. ``SyntheticDiD`` supports strata/PSU/FPC on ``variance_method='bootstrap'`` via a hybrid pairs-bootstrap + Rao-Wu rescaling composition (see the ``Note (survey + bootstrap composition)`` in REGISTRY.md §SyntheticDiD); ``placebo`` and ``jackknife`` remain pweight-only.
+- **Via bootstrap**: Strata/PSU/FPC supported only with bootstrap variance (``TROP``, which uses bootstrap by default)
+- **Full (method-specific)**: ``SyntheticDiD`` supports strata/PSU/FPC on all three variance methods via method-specific survey paths — see the note below and the ``Note (survey support matrix)`` in REGISTRY.md §SyntheticDiD
 - **pweight only** (Weights column): Only ``pweight`` accepted; ``fweight``/``aweight`` raise an error
 - **Diagnostic**: Weighted descriptive statistics only (no inference)
 - **--**: Not supported
 
 .. note::
 
-   ``SyntheticDiD`` supports survey designs on ``variance_method='bootstrap'``
-   — both pweight-only and full strata/PSU/FPC — via a hybrid pairs-bootstrap
-   composed with per-draw Rao-Wu rescaled weights fed into a weighted
-   Frank-Wolfe re-estimation of ω and λ. See the
-   ``Note (survey + bootstrap composition)`` in REGISTRY.md §SyntheticDiD
-   for the objective form and argmin-set caveat.
-
-   ``variance_method='placebo'`` and ``variance_method='jackknife'`` remain
-   pweight-only — composing placebo permutations / leave-one-out with
-   Rao-Wu rescaling under the weighted objective is a separate derivation
-   (tracked in ``TODO.md``).
+   ``SyntheticDiD`` supports survey designs — both pweight-only and full
+   strata/PSU/FPC — on all three variance methods, each via a
+   method-specific path: ``bootstrap`` composes a hybrid pairs-bootstrap
+   with per-draw Rao-Wu rescaled weights fed into a weighted Frank-Wolfe
+   re-estimation of ω and λ; ``placebo`` switches to stratified
+   permutation (pseudo-treated draws within strata containing treated
+   units) with weighted-FW re-estimation, and FPC is a documented no-op
+   for the permutation test; ``jackknife`` switches to PSU-level
+   leave-one-out with stratum aggregation (Rust & Rao 1996).
+   Replicate-weight designs are rejected. See the
+   ``Note (survey support matrix)`` and the per-method composition notes
+   in REGISTRY.md §SyntheticDiD.
 
 For the full walkthrough with code examples, see the
 `survey tutorial `_.
diff --git a/paper.md b/paper.md
index ca02347b..95ac021a 100644
--- a/paper.md
+++ b/paper.md
@@ -128,12 +128,14 @@ joint-inference contract: inference fields (standard error, t-statistic, p-value
 confidence interval) are always computed together and become NaN together when inference
 is not identified, rather than silently reporting partial results.
 
-Two design choices carry the survey capability and the deployment story. First, estimators
-compute influence functions for their target parameters, so design-based variance - Taylor
-series linearization over strata and clusters, replicate weights, survey-aware
-bootstrap - composes through one shared mechanism with estimators as different as
-Callaway-Sant'Anna and synthetic DiD; supported design-estimator combinations are
-documented in a per-estimator matrix, and unsupported ones are rejected explicitly rather
+Two design choices carry the survey capability and the deployment story. 
First, the
+regression- and influence-function-based estimators compute influence functions for their
+target parameters, so design-based variance - Taylor series linearization over strata and
+clusters, and replicate weights - routes through one shared survey-variance core rather
+than requiring per-estimator derivations; resampling-based estimators such as synthetic
+DiD and TROP compose survey designs through documented method-specific bootstrap,
+placebo, and jackknife paths. Supported design-estimator combinations are listed in a
+per-estimator compatibility matrix, and unsupported ones are rejected explicitly rather
 than silently approximated. Second, the runtime
 dependency footprint is minimal by policy - numpy, pandas, and scipy only - keeping the
 library easy to install in restricted industry environments; high-dimensional fixed
From 8331fd5c66342aee1cf9f8e28bf4ab10504f568d Mon Sep 17 00:00:00 2001
From: igerber
Date: Fri, 3 Jul 2026 08:44:54 -0400
Subject: [PATCH 6/6] docs(registry): Bacon out of the replicate-rejected list - diagnostic-only, outside the 20 count (review round 4 P3)

13 supported + 7 rejected = 20 estimators; BaconDecomposition gets its own
diagnostic-only line so the matrix cannot be read as 13-of-21.

Co-Authored-By: Claude Fable 5
---
 docs/methodology/REGISTRY.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index 77a39a12..07ad7e07 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -4595,7 +4595,9 @@ variance from the distribution of replicate estimates. 
     (bootstrap-based variance), WooldridgeDiD, LPDiD, SpilloverDiD,
     HeterogeneousAdoptionDiD (TSL-only survey paths; replicate designs
     rejected at `fit()`), SyntheticControl (rejects `survey_design`
-    entirely), BaconDecomposition (diagnostic only)
+    entirely)
+  - **BaconDecomposition** is diagnostic-only — outside the 20-estimator
+    count — and likewise rejects replicate designs
   - Estimators with replicate support reject replicate + bootstrap
     (replicate weights provide analytical variance)
   - **Note:** When invalid replicates are dropped in `compute_replicate_vcov`