From 6c841d24ae29ec6b550fd985b2e395d833f57059 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Fri, 3 Jul 2026 07:57:32 -0400
Subject: [PATCH 1/6] docs(paper): finalize JOSS submission - 20 estimators,
 required sections, draft-pdf CI

- paper.md: update estimator count 19 -> 20 (LPDiD merged after #564);
  add LPDiD to the staggered-adoption method list with Dube2025 citation
- paper.md: restructure to current JOSS required sections - add State of
  the Field (Python/R/Stata landscape with build-vs-contribute
  justification), Software Design (shared inference core,
  influence-function survey architecture, minimal-dependency policy, Rust
  backend), and Research Impact Statement (companion preprint, golden-file
  R validation, real-data survey validation, community-readiness signals);
  1,269 words, within the 750-1750 target
- paper.bib: add Dube2025 (LP-DiD, J. Applied Econometrics), Binder1983
  (previously cited in-text without a bib entry), pyfixest (software
  citation for State of the Field)
- paper.md: AI disclosure updated to name the Opus, Sonnet, and Fable
  model families
- llms.txt: stale estimator count 19 -> 20; llms-full.txt:
  replicate-weight support "13 of 16" -> "13 of 20"
- practitioner_decision_tree.rst: stale "17 estimators" -> 20, mention
  Local Projections DiD
- choosing_estimator.rst survey matrix: dCDH replicate weights "--" ->
  "Full (analytical)" (support landed with test_survey_dcdh_replicate_psu
  coverage); add missing SpilloverDiD and SyntheticControl rows; note the
  SyntheticControl NotImplementedError in the intro
- .github/workflows/draft-pdf.yml: SHA-pinned openjournals draft action
  compiles the paper when paper.md, paper.bib, or the workflow itself
  changes (or on manual dispatch) and uploads the PDF artifact

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 .github/workflows/draft-pdf.yml     |  36 +++++++
 diff_diff/guides/llms-full.txt      |   2 +-
 diff_diff/guides/llms.txt           |   2 +-
 docs/choosing_estimator.rst         |  15 ++-
 docs/practitioner_decision_tree.rst |   5 +-
 paper.bib                           |  29 ++++++
 paper.md                            | 146 +++++++++++++++++++---------
 7 files changed, 184 insertions(+), 51 deletions(-)
 create mode 100644 .github/workflows/draft-pdf.yml

diff --git a/.github/workflows/draft-pdf.yml b/.github/workflows/draft-pdf.yml
new file mode 100644
index 00000000..9911561b
--- /dev/null
+++ b/.github/workflows/draft-pdf.yml
@@ -0,0 +1,36 @@
+name: Draft PDF
+
+on:
+  push:
+    branches: [main]
+    paths:
+      - paper.md
+      - paper.bib
+      - .github/workflows/draft-pdf.yml
+  pull_request:
+    paths:
+      - paper.md
+      - paper.bib
+      - .github/workflows/draft-pdf.yml
+  workflow_dispatch:
+
+permissions:
+  contents: read
+
+jobs:
+  paper:
+    runs-on: ubuntu-latest
+    name: Compile JOSS paper draft
+    steps:
+      - name: Checkout
+        uses: actions/checkout@9c091bb21b7c1c1d1991bb908d89e4e9dddfe3e0 # v7
+      - name: Build draft PDF
+        uses: openjournals/openjournals-draft-action@85a18372e48f551d8af9ddb7a747de685fbbb01c # v1.0
+        with:
+          journal: joss
+          paper-path: paper.md
+      - name: Upload paper artifact
+        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7
+        with:
+          name: joss-paper
+          path: paper.pdf
diff --git a/diff_diff/guides/llms-full.txt b/diff_diff/guides/llms-full.txt
index d5863bc0..58d9f6a9 100644
--- a/diff_diff/guides/llms-full.txt
+++ b/diff_diff/guides/llms-full.txt
@@ -2120,7 +2120,7 @@ sd_female, data_female = sd.subpopulation(data, mask=lambda df: df['sex'] == 'F'
 
 **Key features:**
 - Taylor Series Linearization (TSL) variance with strata + PSU + FPC
-- Replicate weight variance: BRR, Fay's BRR, JK1, JKn, SDR (13 of 16 estimators, including dCDH)
+- Replicate weight variance: BRR, Fay's BRR, JK1, JKn, SDR (13 of 20 estimators, including dCDH)
 - Survey-aware bootstrap: multiplier at PSU (Hall-Mammen wild; dCDH, staggered) or Rao-Wu rescaled (SunAbraham, SyntheticDiD, TROP). SyntheticDiD bootstrap composes Rao-Wu rescaled per-draw weights with the weighted Frank-Wolfe variant of `_sc_weight_fw` (PR #355): each draw solves `min ||A·diag(rw)·ω - b||² + ζ²·Σ rw_i ω_i²` and composes `ω_eff = rw·ω/Σ(rw·ω)` for the SDID estimator. Pweight-only fits use constant `rw = w_control`; full designs use Rao-Wu. SDID's placebo (stratified permutation + weighted FW) and jackknife (PSU-level LOO with stratum aggregation, Rust & Rao 1996) paths also support pweight-only and full strata/PSU/FPC designs
 - DEFF diagnostics, subpopulation analysis, weight trimming (`trim_weights`)
 - Repeated cross-sections: `CallawaySantAnna(panel=False)`
diff --git a/diff_diff/guides/llms.txt b/diff_diff/guides/llms.txt
index f61f5f3d..d584eeba 100644
--- a/diff_diff/guides/llms.txt
+++ b/diff_diff/guides/llms.txt
@@ -2,7 +2,7 @@
 
 > A Python library for Difference-in-Differences (DiD) causal inference analysis. Provides sklearn-like estimators with statsmodels-style summary output for econometric analysis.
 
-diff-diff offers 19 estimators covering basic 2x2 DiD, modern staggered adoption methods, reversible (non-absorbing) treatments, advanced panel estimators, nonlinear models, and diagnostic tools. It supports robust and cluster-robust standard errors, wild cluster bootstrap, formula and column-name interfaces, fixed effects (dummy and absorbed), complex survey designs (strata/PSU/FPC, replicate weights, design-based variance), and publication-ready output. The optional Rust backend accelerates compute-intensive estimators like Synthetic DiD and TROP.
+diff-diff offers 20 estimators covering basic 2x2 DiD, modern staggered adoption methods, reversible (non-absorbing) treatments, advanced panel estimators, nonlinear models, and diagnostic tools. It supports robust and cluster-robust standard errors, wild cluster bootstrap, formula and column-name interfaces, fixed effects (dummy and absorbed), complex survey designs (strata/PSU/FPC, replicate weights, design-based variance), and publication-ready output. The optional Rust backend accelerates compute-intensive estimators like Synthetic DiD and TROP.
 
 - Install: `pip install diff-diff`
 - License: MIT
diff --git a/docs/choosing_estimator.rst b/docs/choosing_estimator.rst
index c3edcc31..2f7bdeef 100644
--- a/docs/choosing_estimator.rst
+++ b/docs/choosing_estimator.rst
@@ -771,7 +771,8 @@ If you're unsure which estimator to use:
 Survey Design Support
 ---------------------
 
-All estimators accept an optional ``survey_design`` parameter in ``fit()``.
+All estimators accept an optional ``survey_design`` parameter in ``fit()``
+(``SyntheticControl`` does not yet support it and raises ``NotImplementedError``).
 Pass a :class:`~diff_diff.SurveyDesign` object to get design-based variance
 estimation. The depth of support varies by estimator:
 
@@ -820,7 +821,7 @@ estimation. The depth of support varies by estimator:
    * - ``ChaisemartinDHaultfoeuille``
      - pweight only
      - Full (TSL)
-     - --
+     - Full (analytical)
      - Group-level (warning)
    * - ``TripleDifference``
      - pweight only
@@ -872,6 +873,11 @@ estimation. The depth of support varies by estimator:
      - Via bootstrap
      - --
      - Hybrid pairs-bootstrap + Rao-Wu rescaled (bootstrap only)
+   * - ``SyntheticControl``
+     - --
+     - --
+     - --
+     - --
    * - ``TROP``
      - pweight only
      - Via bootstrap
@@ -887,6 +893,11 @@ estimation. The depth of support varies by estimator:
      - Full (Binder TSL)
      - --
      - --
+   * - ``SpilloverDiD``
+     - pweight only
+     - Full (Binder TSL + Conley)
+     - --
+     - --
    * - ``BaconDecomposition``
      - Diagnostic
      - Diagnostic
diff --git a/docs/practitioner_decision_tree.rst b/docs/practitioner_decision_tree.rst
index 1dd6e5b5..8aeed5cc 100644
--- a/docs/practitioner_decision_tree.rst
+++ b/docs/practitioner_decision_tree.rst
@@ -463,9 +463,10 @@ At a Glance
 What About the Other Estimators?
 --------------------------------
 
-diff-diff has 17 estimators covering advanced scenarios: Sun-Abraham for
+diff-diff has 20 estimators covering advanced scenarios: Sun-Abraham for
 interaction-weighted estimation, Imputation DiD and Two-Stage DiD for alternative
-staggered approaches, Stacked DiD, Efficient DiD, Triple Difference, TROP, and more.
+staggered approaches, Local Projections DiD, Stacked DiD, Efficient DiD,
+Triple Difference, TROP, and more.
 The six scenarios above cover the most common business use cases.
 
 For the full academic decision tree with all estimators, see :doc:`choosing_estimator`.
diff --git a/paper.bib b/paper.bib
index 6aa53f0d..3c7405c1 100644
--- a/paper.bib
+++ b/paper.bib
@@ -249,3 +249,32 @@ @misc{deChaisemartin2026
   primaryclass  = {econ.EM},
   doi       = {10.48550/arXiv.2405.04465}
 }
+
+@article{Dube2025,
+  author    = {Dube, Arindrajit and Girardi, Daniele and Jord{\`a}, {\`O}scar and Taylor, Alan M.},
+  title     = {A Local Projections Approach to Difference-in-Differences},
+  journal   = {Journal of Applied Econometrics},
+  volume    = {40},
+  number    = {5},
+  pages     = {741--758},
+  year      = {2025},
+  doi       = {10.1002/jae.70000}
+}
+
+@article{Binder1983,
+  author    = {Binder, David A.},
+  title     = {On the Variances of Asymptotically Normal Estimators from Complex Surveys},
+  journal   = {International Statistical Review},
+  volume    = {51},
+  number    = {3},
+  pages     = {279--292},
+  year      = {1983},
+  doi       = {10.2307/1402588}
+}
+
+@misc{pyfixest,
+  author    = {{The PyFixest Authors}},
+  title     = {pyfixest: Fast High-Dimensional Fixed Effect Estimation in Python},
+  year      = {2025},
+  url       = {https://github.com/py-econometrics/pyfixest}
+}
diff --git a/paper.md b/paper.md
index a20d75de..2b204824 100644
--- a/paper.md
+++ b/paper.md
@@ -21,7 +21,7 @@ bibliography: paper.bib
 # Summary
 
 `diff-diff` is a Python library for Difference-in-Differences (DiD) causal inference
-analysis. It provides 19 estimators covering the full modern DiD toolkit - from classic
+analysis. It provides 20 estimators covering the full modern DiD toolkit - from classic
 two-group/two-period designs through heterogeneity-robust staggered adoption methods,
 synthetic control hybrids, and sensitivity analysis - under a consistent scikit-learn-style
 API. Most estimators accept an optional `SurveyDesign` object for design-based variance
@@ -41,15 +41,13 @@ modern methods - including Callaway and Sant'Anna [-@Callaway2021], Sun and Abra
 [-@Sun2021], Borusyak, Jaravel, and Spiess [-@Borusyak2024], and others - are now standard
 practice in applied work.
 
-The R ecosystem provides mature implementations across several packages: `did`
-[@Callaway2021], `fixest` [@Berge2018], `synthdid` [@Arkhangelsky2021], and `HonestDiD`
-[@Rambachan2023]. Stata offers `csdid` and `didregress`. Python, however, lacks a unified
-DiD library. Practitioners working in Python-based data science workflows - increasingly
-common in industry settings for marketing measurement, product experimentation, and policy
-evaluation - must either context-switch to R, reimplement methods from scratch, or rely on
-partial implementations scattered across unrelated packages.
+These methods are well served in R and Stata, but Python lacks a unified DiD library.
+Practitioners working in Python-based data science workflows - increasingly common in
+industry settings for marketing measurement, product experimentation, and policy
+evaluation - must either context-switch to another language, reimplement methods from
+scratch, or rely on partial implementations scattered across unrelated packages.
 
-`diff-diff` fills this gap by providing a single-import library that covers 19 estimators
+`diff-diff` fills this gap by providing a single-import library that covers 20 estimators
 with a consistent API, survey-weighted inference, and numerical validation against R. It
 is also the companion software for the design-based variance framework of @Gerber2026,
 which establishes design-consistent standard errors for modern DiD estimators under
@@ -57,43 +55,55 @@ complex survey designs. It targets both applied researchers who need rigorous ec
 methods and data science practitioners who need accessible causal inference tools
 integrated into Python workflows.
 
+# State of the Field
+
+The R ecosystem provides mature implementations across several packages: `did`
+[@Callaway2021], `fixest` [@Berge2018], `synthdid` [@Arkhangelsky2021], and `HonestDiD`
+[@Rambachan2023]; Stata offers `csdid` and `didregress`. Python coverage is partial and
+fragmented. `pyfixest` [@pyfixest] brings `fixest`-style high-dimensional fixed-effects
+regression to Python, including Sun-Abraham, two-stage, and local-projections estimators,
+but is organized around its regression engine rather than the wider DiD taxonomy;
+`differences` implements Callaway-Sant'Anna group-time estimation; `CausalPy` offers
+Bayesian analysis of quasi-experiments, including synthetic control, without
+staggered-adoption support. General-purpose causal inference toolkits such as `DoWhy` and
+`EconML` target other identification strategies.
+
+`diff-diff` was built as a new library, rather than as contributions to these packages,
+because its central contribution is cross-cutting: one estimator contract, one shared
+inference core, and an influence-function architecture that composes design-based survey
+variance with every estimator in the taxonomy. To our knowledge, no existing DiD software
+in any language provides design-based variance estimation for complex survey data, and no
+Python package covers the modern estimator taxonomy end-to-end; `diff-diff` provides
+both, validated against the R reference implementations where they exist.
+
 # Key Features
 
-**Breadth of methods.** `diff-diff` implements 19 estimators organized across the modern
+**Breadth of methods.** `diff-diff` implements 20 estimators organized across the modern
 DiD taxonomy. Classic designs include two-group/two-period DiD, two-way fixed effects, and
 event-study estimation with period-specific effects. Heterogeneity-robust staggered-adoption
 estimators include Callaway-Sant'Anna [@Callaway2021], Sun-Abraham [@Sun2021], imputation
-[@Borusyak2024], two-stage [@Gardner2022], stacked [@Wing2024], and efficient [@Chen2025]
-approaches, together with reversible-treatment DiD for non-absorbing interventions
-[@deChaisemartin2020] and a ring-indicator estimator for spatial spillovers [@Butts2021].
-Synthetic-control hybrids include synthetic DiD [@Arkhangelsky2021] and the classic
-synthetic control method [@Abadie2010]. Extended designs include triple-difference and
-staggered triple-difference estimators [@OrtizVillavicencio2025], continuous-treatment DiD
-with dose-response curves [@Callaway2024], heterogeneous-adoption designs where no unit
-remains untreated [@deChaisemartin2026], nonlinear ETWFE [@Wooldridge2025; @Wooldridge2023],
-and triply robust panel estimation [@Athey2025]. Separate diagnostic and sensitivity tools -
-outside the 19 estimators - include Goodman-Bacon decomposition [@GoodmanBacon2021], Honest
-DiD sensitivity analysis [@Rambachan2023], placebo tests, and pre-trends power analysis
-[@Roth2022]. All estimators share a consistent `fit()` interface with
-`get_params()`/`set_params()` for configuration, R-style formula support, and rich results
-objects with `summary()` output. An optional Rust backend via PyO3 accelerates
-compute-intensive operations.
+[@Borusyak2024], two-stage [@Gardner2022], stacked [@Wing2024], efficient [@Chen2025], and
+local-projections [@Dube2025] approaches, together with reversible-treatment DiD for
+non-absorbing interventions [@deChaisemartin2020] and a ring-indicator estimator for
+spatial spillovers [@Butts2021]. Synthetic-control hybrids include synthetic DiD
+[@Arkhangelsky2021] and the classic synthetic control method [@Abadie2010]. Extended
+designs include triple-difference and staggered triple-difference estimators
+[@OrtizVillavicencio2025], continuous-treatment DiD with dose-response curves
+[@Callaway2024], heterogeneous-adoption designs where no unit remains untreated
+[@deChaisemartin2026], nonlinear ETWFE [@Wooldridge2025; @Wooldridge2023], and triply
+robust panel estimation [@Athey2025]. Separate diagnostic and sensitivity tools - outside
+the 20 estimators - include Goodman-Bacon decomposition [@GoodmanBacon2021], Honest DiD
+sensitivity analysis [@Rambachan2023], placebo tests, and pre-trends power analysis
+[@Roth2022].
 
 **Survey-weighted inference.** A `SurveyDesign` class supports stratification, primary
 sampling units, finite population corrections, and probability weights. Variance estimation
 includes Taylor series linearization, five replicate weight methods (BRR, Fay's BRR, JK1,
-JKn, SDR), and survey-aware bootstrap. Survey variance is validated against R's `survey`
-package [@Lumley2004] on three real complex-survey datasets - NHANES, RECS 2020, and the
-California API school dataset - to a tight tolerance (test gaps < 1e-8, typically below
-1e-10). The design-based variance result - that the influence functions of modern DiD
-estimators satisfy Binder's (1983) smoothness conditions, so stratified-cluster
-linearization yields design-consistent standard errors - is derived in @Gerber2026. No
-other DiD package in any language provides integrated survey support.
-
-**Validation against R.** Point estimates match the R `did`, `synthdid`, and `fixest`
-packages to machine precision (differences < 1e-10). Standard errors match exactly for
-core estimators including Callaway-Sant'Anna and basic DiD. Validation includes the
-canonical MPDTA minimum-wage dataset from Callaway and Sant'Anna [-@Callaway2021].
+JKn, SDR), and survey-aware bootstrap. The design-based variance result - that the
+influence functions of modern DiD estimators satisfy the smoothness conditions of
+@Binder1983, so stratified-cluster linearization yields design-consistent standard
+errors - is derived in @Gerber2026. No other DiD package in any language provides
+integrated survey support.
 
 **Practitioner tooling.** Beyond estimation, `diff-diff` includes a practitioner decision
 tree for estimator selection, an 8-step diagnostic workflow based on Baker et al.
@@ -101,16 +111,62 @@ tree for estimator selection, an 8-step diagnostic workflow based on Baker et al
 aggregation utilities for converting individual-level survey responses into
 geographic-period panels suitable for DiD analysis.
 
+# Software Design
+
+Every estimator implements a common contract: a scikit-learn-style `fit()` with
+`get_params()`/`set_params()` for configuration, R-style formula support, and rich results
+dataclasses with `summary()`, `to_dict()`, and `to_dataframe()`. Numerical work is
+deliberately centralized: all estimators solve their least-squares problems and their
+robust, cluster-robust, and survey variances through a single shared linear-algebra core,
+so numerical hardening - rank-deficiency guards, degrees-of-freedom corrections,
+small-cluster behavior - lands in one place and propagates to every estimator. Inference
+fields (standard error, t-statistic, p-value, confidence interval) are always computed
+together and become NaN together when inference is not identified, rather than silently
+reporting partial results.
+
+Two design choices carry the survey capability and the deployment story. First, estimators
+compute influence functions for their target parameters, so design-based variance - Taylor
+series linearization over strata and clusters, replicate weights, survey-aware
+bootstrap - composes uniformly with estimators as different as Callaway-Sant'Anna and
+synthetic DiD instead of requiring per-estimator derivations. Second, the runtime
+dependency footprint is minimal by policy - numpy, pandas, and scipy only - keeping the
+library easy to install in restricted industry environments; high-dimensional fixed
+effects are handled by within-transformation rather than by delegating to a heavier
+econometrics stack. An optional Rust backend (via PyO3) accelerates compute-intensive
+kernels such as synthetic-control weight solving and fixed-effects absorption; the Python
+implementation remains canonical, equivalence between backends is enforced by the test
+suite, and the library falls back to pure Python automatically when the extension is
+unavailable.
+
+# Research Impact Statement
+
+`diff-diff` is the companion software of the design-based variance preprint [@Gerber2026]:
+the framework derived there is implemented here, and the preprint's numerical results are
+produced with the library. Correctness evidence ships with the repository as reproducible
+material. Golden-file benchmarks pin point estimates against R's `did`, `synthdid`, and
+`fixest` to machine precision (differences < 1e-10), including the canonical MPDTA
+minimum-wage application of Callaway and Sant'Anna [-@Callaway2021], with standard errors
+matching exactly for core estimators such as Callaway-Sant'Anna and basic DiD. Survey
+variance is validated against R's `survey` package [@Lumley2004] on three real
+complex-survey datasets - NHANES, RECS 2020, and the California API school data - with
+test gaps below 1e-8 and typically below 1e-10. The library is distributed on PyPI with
+tagged releases, has six months of continuous public development history (3,000+
+commits), and is exercised by a CI test suite of more than 7,600 tests; 26 tutorial
+notebooks and full API documentation are published on Read the Docs, and machine-readable
+guides bundled in the wheel (`llms.txt`) make the library directly usable by AI-assisted
+analysis workflows.
+
 # AI Usage Disclosure
 
 Generative AI tools were used in developing this software and manuscript. Anthropic's
-Claude models (the Opus and Sonnet families, via the Claude Code CLI) assisted with code
-generation and refactoring, test scaffolding, documentation, and drafting and editing of
-this manuscript. The author reviewed, modified, and validated all AI-generated code and
-text and made all primary architectural and methodological decisions. Numerical results
-were independently verified against established R reference packages (`did`, `synthdid`,
-`fixest`, `survey`) for every estimator with an R equivalent, and against the author's
-reference derivations or simulation otherwise. The author takes full responsibility for the
-accuracy and integrity of the software and this paper.
+Claude models (the Opus, Sonnet, and Fable model families, via the Claude Code CLI)
+assisted with code generation and refactoring, test scaffolding, documentation, and
+drafting and editing of this manuscript. The author reviewed, modified, and validated all
+AI-generated code and text and made all primary architectural and methodological
+decisions. Numerical results were independently verified against established R reference
+packages (`did`, `synthdid`, `fixest`, `survey`) for every estimator with an R
+equivalent, and against the author's reference derivations or simulation otherwise. The
+author takes full responsibility for the accuracy and integrity of the software and this
+paper.
 
 # References

From e60e813f96b727d5c34f3e8fb18654e81e00a9f0 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Fri, 3 Jul 2026 08:00:16 -0400
Subject: [PATCH 2/6] docs(paper): brace-protect proper nouns in paper.bib
 titles

California's (Abadie2010) and pyfixest/Python (pyfixest entry) were being
lowercased by CSL sentence-casing in the compiled PDF.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 paper.bib | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/paper.bib b/paper.bib
index 3c7405c1..0e8f3498 100644
--- a/paper.bib
+++ b/paper.bib
@@ -220,7 +220,7 @@ @misc{Gerber2026
 
 @article{Abadie2010,
   author    = {Abadie, Alberto and Diamond, Alexis and Hainmueller, Jens},
-  title     = {Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program},
+  title     = {Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of {California's} Tobacco Control Program},
   journal   = {Journal of the American Statistical Association},
   volume    = {105},
   number    = {490},
@@ -274,7 +274,7 @@ @article{Binder1983
 
 @misc{pyfixest,
   author    = {{The PyFixest Authors}},
-  title     = {pyfixest: Fast High-Dimensional Fixed Effect Estimation in Python},
+  title     = {{pyfixest}: Fast High-Dimensional Fixed Effect Estimation in {Python}},
   year      = {2025},
   url       = {https://github.com/py-econometrics/pyfixest}
 }

From c793a563ca3f78c17727d653dcea75aca8ac10ab Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Fri, 3 Jul 2026 08:21:05 -0400
Subject: [PATCH 3/6] docs: address CI review P2s - precise Software Design
 claims, REGISTRY replicate matrix 13/20

- paper.md Software Design: formula support scoped to the classic
  regression estimators; least-squares solves stay centralized, while
  analytical and resampling-based variance paths are now distinguished
  (synthetic DiD placebo/jackknife noted); joint-inference NaN contract
  stated as the invariant all estimators share
- REGISTRY.md replicate-weight support matrix: 12 of 15 -> 13 of 20; adds
  ChaisemartinDHaultfoeuille to Supported (closed-form cell-collapse
  replicate ATT, replicate + n_bootstrap > 0 rejected); Rejected list now
  enumerates WooldridgeDiD, LPDiD, SpilloverDiD, HeterogeneousAdoptionDiD
  (TSL-only, NotImplementedError at fit) and SyntheticControl (rejects
  survey_design entirely), keeping BaconDecomposition diagnostic-only
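
For reviewers, a minimal sketch of the joint-inference NaN contract
named above. `InferenceFields` and `joint_inference` are hypothetical
names chosen for illustration, not the library's internal helpers, and
the normal reference distribution is an assumption made for brevity.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceFields:  # hypothetical container, not library API
    se: float
    t_stat: float
    p_value: float
    ci: tuple


def joint_inference(
    estimate: float, se: float, z_crit: float = 1.959964
) -> InferenceFields:
    """Compute all inference fields together, or return them all as NaN."""
    nan = float("nan")
    # Treat inference as not identified when the SE is non-finite or <= 0.
    if not (math.isfinite(estimate) and math.isfinite(se) and se > 0):
        return InferenceFields(nan, nan, nan, (nan, nan))
    t_stat = estimate / se
    # Two-sided p-value under a standard normal reference (assumption).
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))
    ci = (estimate - z_crit * se, estimate + z_crit * se)
    return InferenceFields(se, t_stat, p_value, ci)
```

The point of the sketch is that no caller can observe a finite p-value
next to a NaN standard error: the fields are either all populated or
all NaN.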

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 docs/methodology/REGISTRY.md | 12 +++++++++---
 paper.md                     | 21 ++++++++++++---------
 2 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index 33ab7fd3..77a39a12 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -4576,7 +4576,7 @@ variance from the distribution of replicate estimates.
   design structure is fixed and dropped replicates contribute zero to the
   sum without changing the scale. Survey df uses `n_valid - 1` for
   t-based inference.
-- **Note:** Replicate-weight support matrix (12 of 15 public estimators):
+- **Note:** Replicate-weight support matrix (13 of 20 public estimators):
   - **Supported**: CallawaySantAnna (reg/ipw/dr with or without covariates,
     no bootstrap; IF-based replicate variance is covariate-agnostic),
     ContinuousDiD (no bootstrap), EfficientDiD (no bootstrap),
@@ -4587,9 +4587,15 @@ variance from the distribution of replicate estimates.
     TwoWayFixedEffects (estimator-level refit with within-transformation),
     SunAbraham (estimator-level refit, replaces `vcov_cohort`),
     StackedDiD (estimator-level refit with Q-weight composition),
-    ImputationDiD (two-stage refit), TwoStageDiD (two-stage refit)
+    ImputationDiD (two-stage refit), TwoStageDiD (two-stage refit),
+    ChaisemartinDHaultfoeuille (closed-form cell-collapse replicate ATT,
+    multi-horizon and placebo paths; replicate + `n_bootstrap > 0` rejected
+    — see the ChaisemartinDHaultfoeuille Notes for the allocator contract)
   - **Rejected with NotImplementedError**: SyntheticDiD, TROP
-    (bootstrap-based variance), BaconDecomposition (diagnostic only)
+    (bootstrap-based variance), WooldridgeDiD, LPDiD, SpilloverDiD,
+    HeterogeneousAdoptionDiD (TSL-only survey paths; replicate designs
+    rejected at `fit()`), SyntheticControl (rejects `survey_design`
+    entirely), BaconDecomposition (diagnostic only)
   - Estimators with replicate support reject replicate + bootstrap
     (replicate weights provide analytical variance)
 - **Note:** When invalid replicates are dropped in `compute_replicate_vcov`
diff --git a/paper.md b/paper.md
index 2b204824..dc7691ba 100644
--- a/paper.md
+++ b/paper.md
@@ -114,15 +114,18 @@ geographic-period panels suitable for DiD analysis.
 # Software Design
 
 Every estimator implements a common contract: a scikit-learn-style `fit()` with
-`get_params()`/`set_params()` for configuration, R-style formula support, and rich results
-dataclasses with `summary()`, `to_dict()`, and `to_dataframe()`. Numerical work is
-deliberately centralized: all estimators solve their least-squares problems and their
-robust, cluster-robust, and survey variances through a single shared linear-algebra core,
-so numerical hardening - rank-deficiency guards, degrees-of-freedom corrections,
-small-cluster behavior - lands in one place and propagates to every estimator. Inference
-fields (standard error, t-statistic, p-value, confidence interval) are always computed
-together and become NaN together when inference is not identified, rather than silently
-reporting partial results.
+`get_params()`/`set_params()` for configuration and rich results dataclasses with
+`summary()`, `to_dict()`, and `to_dataframe()`; the classic regression estimators
+additionally accept R-style formulas. Numerical work is deliberately centralized:
+estimators solve their least-squares problems through a single shared linear-algebra
+core, and analytical robust, cluster-robust, and survey variances route through one
+shared sandwich-estimator path, so numerical hardening - rank-deficiency guards,
+degrees-of-freedom corrections, small-cluster behavior - lands in one place. Estimators
+whose inference is inherently resampling-based - synthetic DiD's placebo and jackknife
+variance, for example - use method-specific variance paths. All estimators share one
+joint-inference contract: inference fields (standard error, t-statistic, p-value,
+confidence interval) are always computed together and become NaN together when inference
+is not identified, rather than silently reporting partial results.
 
 Two design choices carry the survey capability and the deployment story. First, estimators
 compute influence functions for their target parameters, so design-based variance - Taylor

From 78b00ca202e6da165d7ed7cd85cbf4e5f3ea70b2 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Fri, 3 Jul 2026 08:25:18 -0400
Subject: [PATCH 4/6] docs: drop draft-pdf CI workflow; scope
 survey-composition claims (review round 2)

- Remove .github/workflows/draft-pdf.yml: a durable CI job for the paper
  is overkill (user call). It served its purpose as a one-time compile
  check - the PDF built successfully and was visually verified; JOSS's
  editorialbot compiles the paper on demand during review.
- paper.md: survey variance no longer claimed to compose with "every
  estimator" / "uniformly" - now states that per-estimator support is
  documented in a compatibility matrix and that unsupported
  combinations fail closed
- llms.txt / llms-full.txt / choosing_estimator.rst: "All estimators" ->
  "Most estimators" with SyntheticControl called out and a pointer to the
  Survey Design Support matrix (aligns with existing README wording)
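
A sketch of what "failing closed" means here, under stated assumptions:
the lookup sets and `check_survey_support` are hypothetical and only
illustrate the behavior; the estimator names follow the REGISTRY.md
replicate-weight note, and the real checks live inside each
estimator's `fit()`.

```python
# Hypothetical lookup tables; names follow the REGISTRY.md note.
REJECTS_ANY_SURVEY_DESIGN = {"SyntheticControl"}
REJECTS_REPLICATE_WEIGHTS = {
    "SyntheticDiD", "TROP", "WooldridgeDiD", "LPDiD", "SpilloverDiD",
    "HeterogeneousAdoptionDiD", "BaconDecomposition",
}


def check_survey_support(
    estimator: str, has_design: bool, uses_replicates: bool
) -> None:
    """Raise instead of silently ignoring an unsupported survey option."""
    if not has_design:
        return  # nothing survey-related was requested
    if estimator in REJECTS_ANY_SURVEY_DESIGN:
        raise NotImplementedError(
            f"{estimator} does not support survey_design"
        )
    if uses_replicates and estimator in REJECTS_REPLICATE_WEIGHTS:
        raise NotImplementedError(
            f"{estimator} does not support replicate-weight designs"
        )
```

Raising at the entry point, rather than dropping the design and
returning model-based standard errors, is what keeps an unsupported
combination from producing plausible-looking but wrong inference.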

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 .github/workflows/draft-pdf.yml | 36 ---------------------------------
 diff_diff/guides/llms-full.txt  |  2 +-
 diff_diff/guides/llms.txt       |  2 +-
 docs/choosing_estimator.rst     |  6 +++---
 paper.md                        |  9 ++++++---
 5 files changed, 11 insertions(+), 44 deletions(-)
 delete mode 100644 .github/workflows/draft-pdf.yml

diff --git a/.github/workflows/draft-pdf.yml b/.github/workflows/draft-pdf.yml
deleted file mode 100644
index 9911561b..00000000
--- a/.github/workflows/draft-pdf.yml
+++ /dev/null
@@ -1,36 +0,0 @@
-name: Draft PDF
-
-on:
-  push:
-    branches: [main]
-    paths:
-      - paper.md
-      - paper.bib
-      - .github/workflows/draft-pdf.yml
-  pull_request:
-    paths:
-      - paper.md
-      - paper.bib
-      - .github/workflows/draft-pdf.yml
-  workflow_dispatch:
-
-permissions:
-  contents: read
-
-jobs:
-  paper:
-    runs-on: ubuntu-latest
-    name: Compile JOSS paper draft
-    steps:
-      - name: Checkout
-        uses: actions/checkout@9c091bb21b7c1c1d1991bb908d89e4e9dddfe3e0 # v7
-      - name: Build draft PDF
-        uses: openjournals/openjournals-draft-action@85a18372e48f551d8af9ddb7a747de685fbbb01c # v1.0
-        with:
-          journal: joss
-          paper-path: paper.md
-      - name: Upload paper artifact
-        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7
-        with:
-          name: joss-paper
-          path: paper.pdf
diff --git a/diff_diff/guides/llms-full.txt b/diff_diff/guides/llms-full.txt
index 58d9f6a9..281c2ef3 100644
--- a/diff_diff/guides/llms-full.txt
+++ b/diff_diff/guides/llms-full.txt
@@ -2075,7 +2075,7 @@ clear_cache()
 
 ## Survey Support
 
-All estimators accept an optional `survey_design` parameter in `fit()`. Pass a `SurveyDesign` object to get design-based variance estimation.
+Most estimators accept an optional `survey_design` parameter in `fit()` (`SyntheticControl` rejects it as not yet supported); depth of support varies by estimator - see the compatibility matrix in `docs/choosing_estimator.rst` (Survey Design Support). Pass a `SurveyDesign` object to get design-based variance estimation.
 
 ```python
 from diff_diff import SurveyDesign, CallawaySantAnna
diff --git a/diff_diff/guides/llms.txt b/diff_diff/guides/llms.txt
index d584eeba..5268ce18 100644
--- a/diff_diff/guides/llms.txt
+++ b/diff_diff/guides/llms.txt
@@ -104,7 +104,7 @@ Full practitioner guide: call `diff_diff.get_llm_guide("practitioner")`
 
 ## Survey Support
 
-All estimators accept an optional `survey_design` parameter. Pass a `SurveyDesign` object to get design-based variance estimation:
+Most estimators accept an optional `survey_design` parameter (`SyntheticControl` does not yet support it); coverage and weight types vary by estimator - see the [Survey Design Support matrix](https://diff-diff.readthedocs.io/en/stable/choosing_estimator.html#survey-design-support). Pass a `SurveyDesign` object to get design-based variance estimation:
 
 - **Design elements**: strata, PSU, FPC, weight types (pweight/fweight/aweight), lonely PSU handling, nest
 - **Variance methods**: Taylor Series Linearization (TSL), replicate weights (BRR/Fay/JK1/JKn/SDR), survey-aware bootstrap
diff --git a/docs/choosing_estimator.rst b/docs/choosing_estimator.rst
index 2f7bdeef..93fa402d 100644
--- a/docs/choosing_estimator.rst
+++ b/docs/choosing_estimator.rst
@@ -771,10 +771,10 @@ If you're unsure which estimator to use:
 Survey Design Support
 ---------------------
 
-All estimators accept an optional ``survey_design`` parameter in ``fit()``
-(``SyntheticControl`` does not yet support it and raises ``NotImplementedError``).
+Most estimators support an optional ``survey_design`` parameter in ``fit()``
+(``SyntheticControl`` accepts the parameter but raises ``NotImplementedError``).
 Pass a :class:`~diff_diff.SurveyDesign` object to get design-based variance
-estimation. The depth of support varies by estimator:
+estimation. The depth of support varies by estimator and variance method:
 
 .. note::
 
diff --git a/paper.md b/paper.md
index dc7691ba..ca02347b 100644
--- a/paper.md
+++ b/paper.md
@@ -71,7 +71,8 @@ staggered-adoption support. General-purpose causal inference toolkits such as `D
 `diff-diff` was built as a new library, rather than as contributions to these packages,
 because its central contribution is cross-cutting: one estimator contract, one shared
 inference core, and an influence-function architecture that composes design-based survey
-variance with every estimator in the taxonomy. To our knowledge, no existing DiD software
+variance across the estimator taxonomy, with per-estimator support documented in a
+compatibility matrix and unsupported combinations failing closed. To our knowledge, no existing DiD software
 in any language provides design-based variance estimation for complex survey data, and no
 Python package covers the modern estimator taxonomy end-to-end; `diff-diff` provides
 both, validated against the R reference implementations where they exist.
@@ -130,8 +131,10 @@ is not identified, rather than silently reporting partial results.
 Two design choices carry the survey capability and the deployment story. First, estimators
 compute influence functions for their target parameters, so design-based variance - Taylor
 series linearization over strata and clusters, replicate weights, survey-aware
-bootstrap - composes uniformly with estimators as different as Callaway-Sant'Anna and
-synthetic DiD instead of requiring per-estimator derivations. Second, the runtime
+bootstrap - composes through one shared mechanism with estimators as different as
+Callaway-Sant'Anna and synthetic DiD; supported design-estimator combinations are
+documented in a per-estimator matrix, and unsupported ones are rejected explicitly rather
+than silently approximated. Second, the runtime
 dependency footprint is minimal by policy - numpy, pandas, and scipy only - keeping the
 library easy to install in restricted industry environments; high-dimensional fixed
 effects are handled by within-transformation rather than by delegating to a heavier

From c366f10ffc5d51a440684f744dc622ee631ed709 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Fri, 3 Jul 2026 08:34:34 -0400
Subject: [PATCH 5/6] docs: SDID survey matrix accuracy + split shared-core vs
 resampling survey claim (review round 3)

- choosing_estimator.rst: SyntheticDiD Strata/PSU/FPC cell "Via bootstrap"
  -> "Full (method-specific)". Full-design placebo (stratified
  permutation + weighted FW; FPC is a no-op) and jackknife (PSU-level LOO
  with stratum aggregation, Rust & Rao 1996) shipped alongside the Rao-Wu
  bootstrap, so the "placebo/jackknife remain pweight-only ... tracked in
  TODO.md" note was stale (no TODO row exists; the work landed). The
  "Via bootstrap" legend entry now describes only TROP, and a new
  "Full (method-specific)" legend entry points at REGISTRY's survey
  support matrix.
- paper.md Software Design: split the survey-variance claim in two.
  IF/regression estimators route through the shared survey-variance core
  (TSL + replicate weights), while resampling estimators (SyntheticDiD,
  TROP) use documented method-specific bootstrap/placebo/jackknife paths.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 docs/choosing_estimator.rst | 28 +++++++++++++++-------------
 paper.md                    | 14 ++++++++------
 2 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/docs/choosing_estimator.rst b/docs/choosing_estimator.rst
index 93fa402d..b9abcd04 100644
--- a/docs/choosing_estimator.rst
+++ b/docs/choosing_estimator.rst
@@ -870,7 +870,7 @@ estimation. The depth of support varies by estimator and variance method:
      - Multiplier at PSU
    * - ``SyntheticDiD``
      - pweight only
-     - Via bootstrap
+     - Full (method-specific)
      - --
      - Hybrid pairs-bootstrap + Rao-Wu rescaled (bootstrap only)
    * - ``SyntheticControl``
@@ -908,24 +908,26 @@ estimation. The depth of support varies by estimator and variance method:
 
 - **Full**: All weight types (pweight/fweight/aweight) + strata/PSU/FPC + Taylor Series Linearization variance
 - **Full (pweight only)**: Full TSL with strata/PSU/FPC, but only ``pweight`` accepted (``fweight``/``aweight`` rejected because composition changes weight semantics)
-- **Via bootstrap**: Strata/PSU/FPC supported only with bootstrap variance. ``TROP`` uses bootstrap by default. ``SyntheticDiD`` supports strata/PSU/FPC on ``variance_method='bootstrap'`` via a hybrid pairs-bootstrap + Rao-Wu rescaling composition (see the ``Note (survey + bootstrap composition)`` in REGISTRY.md §SyntheticDiD); ``placebo`` and ``jackknife`` remain pweight-only.
+- **Via bootstrap**: Strata/PSU/FPC supported only with bootstrap variance (``TROP``, which uses bootstrap by default)
+- **Full (method-specific)**: ``SyntheticDiD`` supports strata/PSU/FPC on all three variance methods via method-specific survey paths — see the note below and the ``Note (survey support matrix)`` in REGISTRY.md §SyntheticDiD
 - **pweight only** (Weights column): Only ``pweight`` accepted; ``fweight``/``aweight`` raise an error
 - **Diagnostic**: Weighted descriptive statistics only (no inference)
 - **--**: Not supported
 
 .. note::
 
-   ``SyntheticDiD`` supports survey designs on ``variance_method='bootstrap'``
-   — both pweight-only and full strata/PSU/FPC — via a hybrid pairs-bootstrap
-   composed with per-draw Rao-Wu rescaled weights fed into a weighted
-   Frank-Wolfe re-estimation of ω and λ. See the
-   ``Note (survey + bootstrap composition)`` in REGISTRY.md §SyntheticDiD
-   for the objective form and argmin-set caveat.
-
-   ``variance_method='placebo'`` and ``variance_method='jackknife'`` remain
-   pweight-only — composing placebo permutations / leave-one-out with
-   Rao-Wu rescaling under the weighted objective is a separate derivation
-   (tracked in ``TODO.md``).
+   ``SyntheticDiD`` supports survey designs — both pweight-only and full
+   strata/PSU/FPC — on all three variance methods, each via a
+   method-specific path: ``bootstrap`` composes a hybrid pairs-bootstrap
+   with per-draw Rao-Wu rescaled weights fed into a weighted Frank-Wolfe
+   re-estimation of ω and λ; ``placebo`` switches to stratified
+   permutation (pseudo-treated draws within strata containing treated
+   units) with weighted-FW re-estimation, and FPC is a documented no-op
+   for the permutation test; ``jackknife`` switches to PSU-level
+   leave-one-out with stratum aggregation (Rust & Rao 1996).
+   Replicate-weight designs are rejected. See the
+   ``Note (survey support matrix)`` and the per-method composition notes
+   in REGISTRY.md §SyntheticDiD.
 
 For the full walkthrough with code examples, see the
 `survey tutorial <https://github.com/igerber/diff-diff/blob/main/docs/tutorials/16_survey_did.ipynb>`_.
diff --git a/paper.md b/paper.md
index ca02347b..95ac021a 100644
--- a/paper.md
+++ b/paper.md
@@ -128,12 +128,14 @@ joint-inference contract: inference fields (standard error, t-statistic, p-value
 confidence interval) are always computed together and become NaN together when inference
 is not identified, rather than silently reporting partial results.
 
-Two design choices carry the survey capability and the deployment story. First, estimators
-compute influence functions for their target parameters, so design-based variance - Taylor
-series linearization over strata and clusters, replicate weights, survey-aware
-bootstrap - composes through one shared mechanism with estimators as different as
-Callaway-Sant'Anna and synthetic DiD; supported design-estimator combinations are
-documented in a per-estimator matrix, and unsupported ones are rejected explicitly rather
+Two design choices carry the survey capability and the deployment story. First, the
+regression- and influence-function-based estimators compute influence functions for their
+target parameters, so design-based variance - Taylor series linearization over strata and
+clusters, and replicate weights - routes through one shared survey-variance core rather
+than requiring per-estimator derivations; resampling-based estimators such as synthetic
+DiD and TROP compose survey designs through documented method-specific bootstrap,
+placebo, and jackknife paths. Supported design-estimator combinations are listed in a
+per-estimator compatibility matrix, and unsupported ones are rejected explicitly rather
 than silently approximated. Second, the runtime
 dependency footprint is minimal by policy - numpy, pandas, and scipy only - keeping the
 library easy to install in restricted industry environments; high-dimensional fixed

From 8331fd5c66342aee1cf9f8e28bf4ab10504f568d Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Fri, 3 Jul 2026 08:44:54 -0400
Subject: [PATCH 6/6] docs(registry): Bacon out of the replicate-rejected list
 - diagnostic-only, outside the 20 count (review round 4 P3)

Replicate-weight support now reads 13 supported + 7 rejected = 20
estimators. BaconDecomposition gets its own diagnostic-only line so the
matrix cannot be read as 13-of-21.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 docs/methodology/REGISTRY.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index 77a39a12..07ad7e07 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -4595,7 +4595,9 @@ variance from the distribution of replicate estimates.
     (bootstrap-based variance), WooldridgeDiD, LPDiD, SpilloverDiD,
     HeterogeneousAdoptionDiD (TSL-only survey paths; replicate designs
     rejected at `fit()`), SyntheticControl (rejects `survey_design`
-    entirely), BaconDecomposition (diagnostic only)
+    entirely)
+  - **BaconDecomposition** is diagnostic-only — outside the 20-estimator
+    count — and likewise rejects replicate designs
   - Estimators with replicate support reject replicate + bootstrap
     (replicate weights provide analytical variance)
 - **Note:** When invalid replicates are dropped in `compute_replicate_vcov`