docs(paper): finalize JOSS submission - 20 estimators, required sections, draft-pdf CI by igerber · Pull Request #604 · igerber/diff-diff

igerber · 2026-07-03T11:57:57Z

Summary

paper.md: estimator count corrected 19 -> 20 (LPDiD merged in docs(methodology): add LP-DiD (Dube et al. 2025) paper review and registry entry #568-Add LPDiD complex-survey R-parity validation vs survey::svyglm (Phase D2) #591 after the paper refresh docs(paper): refresh JOSS paper for current library and submission requirements #564); LPDiD added to the staggered-adoption list with a new Dube2025 citation
paper.md: restructured to the current JOSS required-section list - adds State of the Field (R/Stata/Python landscape + build-vs-contribute justification), Software Design (shared inference core, influence-function survey architecture, minimal-dependency policy, optional Rust backend), and Research Impact Statement (companion preprint, golden-file R validation, NHANES/RECS/API survey validation, community-readiness signals). Now 1,269 words, within JOSS's 750-1750 target
paper.bib: adds Dube2025 (LP-DiD, J. Applied Econometrics 40(5), doi 10.1002/jae.70000), Binder1983 (was cited in-text with no bib entry), and pyfixest (software citation); all 27 entries cited, no orphans, pandoc --citeproc compiles clean
AI disclosure: model families updated to Opus, Sonnet, and Fable
llms.txt / llms-full.txt: stale "19 estimators" -> 20; replicate-weight support "13 of 16" -> "13 of 20"
practitioner_decision_tree.rst: stale "17 estimators" -> 20, adds Local Projections DiD to the list
choosing_estimator.rst survey matrix: dCDH Replicate Weights cell was stale ("--" despite shipped support; verified via tests/test_survey_dcdh_replicate_psu.py, 6 tests pass incl. JK1-converges-to-TSL); adds missing SpilloverDiD row (pweight, Binder TSL + Conley) and SyntheticControl row (not yet supported); intro notes the SyntheticControl NotImplementedError
.github/workflows/draft-pdf.yml: new SHA-pinned workflow compiling the JOSS paper via openjournals-draft-action on paper.md/paper.bib changes, uploading the PDF as an artifact (docker unavailable locally; this PR's run doubles as the compile check)

Methodology references (required if estimator / math changes)

Method name(s): N/A - no estimator/math changes (documentation, paper, and CI workflow only)
Paper / source link(s): Dube, Girardi, Jorda & Taylor (2025) J. Applied Econometrics 40(5) 741-758 (citation added, matching docs/references.rst); Binder (1983) Int. Statistical Review 51(3) 279-292
Any intentional deviations from the source (and why): None

Validation

Tests added/updated: No test changes. Verified locally: tests/test_guides.py (47 passed), tests/test_openai_review.py -m '' (250 passed, workflow contract/SHA-pin checks), tests/test_survey_dcdh_replicate_psu.py subset (6 passed, confirms the matrix cell fix)
Backtest / simulation / notebook evidence (if applicable): pandoc --citeproc smoke test on paper.md/paper.bib passes; citation cross-check script confirms 27/27 keys resolve with no orphans

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

…ons, draft-pdf CI - paper.md: update estimator count 19 -> 20 (LPDiD merged after #564); add LPDiD to the staggered-adoption method list with Dube2025 citation - paper.md: restructure to current JOSS required sections - add State of the Field (Python/R/Stata landscape with build-vs-contribute justification), Software Design (shared inference core, influence-function survey architecture, minimal-dependency policy, Rust backend), and Research Impact Statement (companion preprint, golden-file R validation, real-data survey validation, community-readiness signals); 1,269 words, within the 750-1750 target - paper.bib: add Dube2025 (LP-DiD, J. Applied Econometrics), Binder1983 (previously cited in-text without a bib entry), pyfixest (software citation for State of the Field) - paper.md: AI disclosure updated to name the Opus, Sonnet, and Fable model families - llms.txt / llms-full.txt: stale counts 19 -> 20 and "13 of 16" -> "13 of 20" replicate-weight support - practitioner_decision_tree.rst: stale "17 estimators" -> 20, mention Local Projections DiD - choosing_estimator.rst survey matrix: dCDH replicate weights "--" -> "Full (analytical)" (support landed with test_survey_dcdh_replicate_psu coverage); add missing SpilloverDiD and SyntheticControl rows; note the SyntheticControl NotImplementedError in the intro - .github/workflows/draft-pdf.yml: SHA-pinned openjournals draft action compiles the paper on paper.md/paper.bib changes and uploads the PDF artifact Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

California's (Abadie2010) and pyfixest/Python (pyfixest entry) were being lowercased by CSL sentence-casing in the compiled PDF. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions · 2026-07-03T12:04:44Z

🔁 AI review rerun (requested by @igerber)

Head SHA: e60e813f96b727d5c34f3e8fb18654e81e00a9f0

Overall Assessment

✅ Looks good. No unmitigated P0/P1 findings. The PR is documentation/paper/CI only; I found two P2 documentation-methodology accuracy issues worth fixing, but no blocker for approval.

Executive Summary

No estimator, weighting, variance, or default-behavior code changes.
The dCDH replicate-weight table update appears consistent with the implementation.
paper.bib citation coverage checks out: 27/27 bibliography keys are cited.
The Methodology Registry is now stale relative to the new “13 of 20” replicate-weight claim.
The new JOSS Software Design text overstates how uniformly formula support and variance computation are shared across all estimators.
I could not run the test suite or pandoc smoke test because pytest and pandoc are unavailable in this environment.

Methodology

Finding 1

Severity: P2
Impact: paper.md:L116-L131 says every estimator has R-style formula support and that all robust/cluster/survey variances route through one shared linear-algebra core. That overstates the actual contract: formula parsing is exposed on the base DiD family (diff_diff/estimators.py:L241-L326), while estimators like SyntheticControl use column-name-only APIs (diff_diff/synthetic_control.py:L274-L328). SyntheticDiD explicitly uses bootstrap/jackknife/placebo variance rather than the analytical sandwich core (diff_diff/synthetic_did.py:L193-L214, diff_diff/synthetic_did.py:L1148-L1191).
Concrete fix: Rephrase the paper to distinguish “regression-style estimators use the shared linear algebra/vcov core” from “IF/GMM/bootstrap estimators use method-specific variance paths documented in REGISTRY, with shared safe inference where applicable.”

Finding 2

Severity: P2
Impact: The changed guide now says replicate-weight variance covers “13 of 20 estimators” (diff_diff/guides/llms-full.txt:L2121-L2124), but the Methodology Registry still says “12 of 15 public estimators” and omits the newer support/unsupported set (docs/methodology/REGISTRY.md:L4579-L4592). Since the registry is the review source of truth, this creates conflicting methodology documentation.
Concrete fix: Update the registry matrix to “13 of 20,” add dCDH to supported replicate-weight estimators, and list the unsupported current estimators explicitly: SyntheticDiD, TROP, SyntheticControl, WooldridgeDiD, LPDiD, SpilloverDiD, and HeterogeneousAdoptionDiD as appropriate, with Bacon kept diagnostic-only.

Code Quality

No findings. No Python implementation code changed.

Performance

No findings. The new workflow only compiles the paper on relevant path changes.

Maintainability

No findings. The new workflow is small, path-scoped, and SHA-pinned (.github/workflows/draft-pdf.yml:L3-L18, .github/workflows/draft-pdf.yml:L25-L36).

Tech Debt

No blocking findings. Existing deferred survey limitations for LPDiD, SpilloverDiD, HAD, and SyntheticControl are already tracked in TODO.md and/or REGISTRY; the only new concern is the stale registry count noted above.

Security

No findings. The workflow uses pull_request, not pull_request_target, grants only contents: read, and uploads only paper.pdf as an artifact (.github/workflows/draft-pdf.yml:L10-L18, .github/workflows/draft-pdf.yml:L32-L36).

Documentation/Tests

Finding

Severity: P2
Impact: Documentation is internally inconsistent because the JOSS architecture prose and registry support matrix lag the current estimator/survey surface. This is not a statistical-output bug, but it weakens the methodology-source contract reviewers are instructed to use.
Concrete fix: Apply the two documentation fixes above, then rerun the paper compile and guide tests in CI. Locally, I verified bibliography key coverage by regex, but could not run pytest or pandoc because neither command is installed.

…Y replicate matrix 13/20 - paper.md Software Design: formula support scoped to the classic regression estimators; least-squares solves centralized but analytical vs resampling-based variance paths distinguished (synthetic DiD placebo/ jackknife noted); joint-inference NaN contract stated as the invariant all estimators share - REGISTRY.md replicate-weight support matrix: 12 of 15 -> 13 of 20; adds ChaisemartinDHaultfoeuille to Supported (closed-form cell-collapse replicate ATT, replicate + n_bootstrap > 0 rejected); Rejected list now enumerates WooldridgeDiD, LPDiD, SpilloverDiD, HeterogeneousAdoptionDiD (TSL-only, NotImplementedError at fit) and SyntheticControl (rejects survey_design entirely), keeping BaconDecomposition diagnostic-only Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions · 2026-07-03T12:23:23Z

🔁 AI review rerun (requested by @igerber)

Head SHA: c793a563ca3f78c17727d653dcea75aca8ac10ab

Overall Assessment

✅ Looks good. No unmitigated P0/P1 findings. This PR is documentation/paper/CI only; the previous registry-count issue is resolved, and the remaining issues are P2 documentation-scope accuracy problems.

Executive Summary

No estimator, weighting, variance, SE, or default-behavior code changed.
REGISTRY.md now documents the 13-of-20 replicate-weight support matrix and dCDH support.
The new PDF workflow is path-scoped, SHA-pinned, and uses read-only contents permission.
Residual P2: paper.md still overstates how uniformly survey variance composes across every estimator.
Residual P2: survey docs/guides still use “All estimators” wording despite documented unsupported/rejected paths.
I could not run pytest or pandoc; both commands are unavailable in this environment.

Methodology

Finding 1

Severity: P2
Impact: paper.md:L71-L77 and paper.md:L130-L134 still say design-based survey variance composes with “every estimator in the taxonomy” / “uniformly” through influence functions. That is broader than the registry and code: replicate support is documented as 13 of 20, with SyntheticDiD/TROP/Wooldridge/LPDiD/Spillover/HAD/SyntheticControl rejected or scoped differently in docs/methodology/REGISTRY.md:L4579-L4598; SyntheticDiD explicitly rejects replicate designs in diff_diff/synthetic_did.py:L356-L362; SyntheticControl rejects survey_design entirely in diff_diff/synthetic_control.py:L327-L337.
Concrete fix: Rephrase to “survey variance composes with supported IF/GMM/regression estimators and variance modes listed in the registry; unsupported surfaces fail closed, and resampling estimators use method-specific survey paths.”

Finding 2

Severity: P2
Impact: docs/choosing_estimator.rst:L774-L777, diff_diff/guides/llms-full.txt:L2076-L2078, and diff_diff/guides/llms.txt:L105-L108 still say “All estimators” accept/pass survey_design for design-based variance, while the same PR documents unsupported paths in the registry and notes SyntheticControl raises. This is not a silent correctness bug because code fails closed, but it is user-facing methodology/support drift.
Concrete fix: Change to “Most estimators accept survey_design; support varies by estimator and variance method,” explicitly call out SyntheticControl and the matrix, then regenerate bundled guides.

Code Quality

No findings. No Python implementation code changed.

Performance

No findings. The workflow only runs on paper/workflow path changes.

Maintainability

No findings. The registry update makes the replicate-weight support matrix more explicit.

Tech Debt

No blocking findings. Unsupported survey/replicate paths for LPDiD, SpilloverDiD, HAD, SyntheticControl, and related estimators are tracked in TODO.md and/or REGISTRY.md.

Security

No findings. .github/workflows/draft-pdf.yml:L10-L18 uses pull_request with contents: read, and action references are SHA-pinned at .github/workflows/draft-pdf.yml:L25-L33.

Documentation/Tests

P2 documentation-scope findings are listed above. paper.bib’s new keys are cited from paper.md, but I could not run the paper compile or test suite because pandoc and pytest are not installed.

…view round 2) - Remove .github/workflows/draft-pdf.yml: a durable CI job for the paper is overkill (user call). It served its purpose as a one-time compile check - the PDF built successfully and was visually verified; JOSS's editorialbot compiles the paper on demand during review. - paper.md: survey variance no longer claimed to compose with "every estimator" / "uniformly" - now states per-estimator support documented in a compatibility matrix with unsupported combinations failing closed - llms.txt / llms-full.txt / choosing_estimator.rst: "All estimators" -> "Most estimators" with SyntheticControl called out and a pointer to the Survey Design Support matrix (aligns with existing README wording) Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions · 2026-07-03T12:27:40Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 78b00ca202e6da165d7ed7cd85cbf4e5f3ea70b2

Overall Assessment

✅ Looks good. No unmitigated P0/P1 findings. The remaining issues are P2 documentation-scope mismatches; no estimator code, variance code, defaults, or inference computation changed.

Executive Summary

No Python estimator/math/variance implementation changed in the provided diff.
The prior “All estimators accept survey_design” wording is mostly fixed.
The registry now documents 13-of-20 replicate-weight support and dCDH support.
Remaining P2: docs/choosing_estimator.rst now understates SyntheticDiD full-design survey support for placebo/jackknife.
Remaining P2: paper.md still blends IF-based survey variance with SyntheticDiD’s method-specific resampling paths.

Methodology

Finding 1

Severity: P2
Impact: docs/choosing_estimator.rst:L871-L875 and docs/choosing_estimator.rst:L911-L927 say SyntheticDiD supports strata/PSU/FPC “Via bootstrap” and that placebo/jackknife “remain pweight-only.” That contradicts the registry and implementation: docs/methodology/REGISTRY.md:L2032-L2049 documents full strata/PSU/FPC support for SyntheticDiD bootstrap, placebo, and jackknife; diff_diff/synthetic_did.py:L452-L459 and diff_diff/synthetic_did.py:L900-L915 route full-design placebo and jackknife to dedicated survey paths. This is not a silent correctness bug because the code supports these paths, but it is user-facing methodology/support drift.
Concrete fix: Change the SyntheticDiD Strata/PSU/FPC cell to something like “Full (method-specific)” and revise the legend/note to state: bootstrap uses weighted FW + Rao-Wu, placebo uses stratified permutation + weighted FW with FPC no-op, jackknife uses PSU-level LOO with stratum aggregation, and replicate weights remain unsupported.

Finding 2

Severity: P2
Impact: paper.md:L131-L136 says estimators compute influence functions so survey variance, including bootstrap, “composes through one shared mechanism” with estimators “as different as Callaway-Sant'Anna and synthetic DiD.” This still overstates the architecture for SyntheticDiD: the registry documents SyntheticDiD’s survey support as method-specific bootstrap/placebo/jackknife resampling, while replicate-weight IF variance is rejected; the paper itself separately says SyntheticDiD placebo/jackknife use method-specific variance paths at paper.md:L125-L126.
Concrete fix: Split the claim: “IF/GMM/regression estimators route through the shared survey-variance core; resampling estimators such as SyntheticDiD and TROP use documented method-specific survey bootstrap/resampling paths.”

Code Quality

No findings. No Python implementation code changed.

Performance

No findings. The provided diff changes documentation/paper/bibliography only.

Maintainability

No findings. The registry count update improves maintainability of the survey support matrix.

Tech Debt

No blocking findings. Unsupported replicate/survey combinations are documented in REGISTRY.md and/or tracked in TODO.md.

Security

No findings. I did not see secrets or credentials in the changed docs/bibliography content.

Documentation/Tests

Documentation findings are listed above. I did not run pytest or pandoc in this review; assessment is based on the provided diff plus registry/code cross-checks.

…urvey claim (review round 3) - choosing_estimator.rst: SyntheticDiD Strata/PSU/FPC cell "Via bootstrap" -> "Full (method-specific)" - full-design placebo (stratified permutation + weighted FW, FPC no-op) and jackknife (PSU-level LOO with stratum aggregation, Rust & Rao 1996) shipped alongside the Rao-Wu bootstrap; the "placebo/jackknife remain pweight-only ... tracked in TODO.md" note was stale (no TODO row exists - the work landed). Legend "Via bootstrap" now describes only TROP; new "Full (method-specific)" legend entry points at REGISTRY's survey support matrix. - paper.md Software Design: survey-variance claim split - IF/regression estimators route through the shared survey-variance core (TSL + replicate weights); resampling estimators (SyntheticDiD, TROP) use documented method-specific bootstrap/placebo/jackknife paths. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions · 2026-07-03T12:36:29Z

🔁 AI review rerun (requested by @igerber)

Head SHA: c366f10ffc5d51a440684f744dc622ee631ed709

Overall Assessment

✅ Looks good. No unmitigated P0/P1 findings.

Executive Summary

Prior SyntheticDiD survey-support findings are resolved: the matrix now documents full-design method-specific support for bootstrap, placebo, and jackknife, matching REGISTRY.md and synthetic_did.py.
The paper’s prior shared-inference overclaim is addressed by separating IF/regression estimators from resampling-based estimators.
dCDH replicate-weight support is now documented as supported, matching the registry/code contract.
No estimator implementation, math, variance, SE, assumptions, or defaults changed in this diff.
One P3 documentation wording nit remains around estimator/support counts and BaconDecomposition.

Methodology

No findings. The changed SyntheticDiD documentation in docs/choosing_estimator.rst:L871-L930 matches the registry’s survey support matrix in docs/methodology/REGISTRY.md:L2032-L2048 and the implementation/docstring contract in diff_diff/synthetic_did.py.

Code Quality

No findings. No Python implementation code changed.

Performance

No findings. The diff is documentation/bibliography only.

Maintainability

No findings.

Tech Debt

No blocking findings. Deferred unsupported survey/replicate combinations remain documented in the registry.

Security

No findings. I did not see secrets or credential material in the changed files.

Documentation/Tests

Finding 1

Severity: P3
Impact: docs/methodology/REGISTRY.md:L4579-L4598 says “13 of 20 public estimators” but the rejected list also includes BaconDecomposition, while paper.md:L95-L97 classifies Goodman-Bacon as diagnostic tooling outside the 20 estimators. The support claim is effectively correct if Bacon is excluded, but the wording can read as 13 supported out of 21 public classes.
Concrete fix: Reword to “13 of 20 estimators; BaconDecomposition is diagnostic-only and rejects replicate designs” or move Bacon to a separate diagnostic note.

I did not run pytest or pandoc; this review is based on the supplied diff plus local registry/source cross-checks.

…-only, outside the 20 count (review round 4 P3) 13 supported + 7 rejected = 20 estimators; BaconDecomposition gets its own diagnostic-only line so the matrix cannot be read as 13-of-21. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

github-actions · 2026-07-03T12:47:02Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 8331fd5c66342aee1cf9f8e28bf4ab10504f568d

Overall Assessment

✅ Looks good. No unmitigated P0/P1 findings.

Executive Summary

No estimator, math, weighting, variance/SE, identification, or default-behavior code changed in this diff.
The prior 13 of 20 / BaconDecomposition wording issue is resolved: REGISTRY.md now separates Bacon as diagnostic-only.
The updated survey support matrix matches the checked source contracts for SyntheticControl, SyntheticDiD, dCDH, LPDiD, and SpilloverDiD.
The new LPDiD, Binder, pyfixest, estimator-count, and survey-support documentation updates are internally consistent with the registry/source material reviewed.
I did not run pytest, pandoc, or notebook builds; this review is based on diff inspection plus local registry/source cross-checks.

Methodology

No findings.

Affected methodology documentation only. The changed rows in docs/choosing_estimator.rst:L774-L930 match the registry and source contracts:

SyntheticControl rejects survey_design explicitly at diff_diff/synthetic_control.py:L327-L338.
SyntheticDiD rejects replicate-weight designs and documents method-specific full-design survey paths at diff_diff/synthetic_did.py:L298-L362 and diff_diff/synthetic_did.py:L430-L459.
dCDH replicate-weight analytical support and replicate+bootstrap rejection are documented and gated at diff_diff/chaisemartin_dhaultfoeuille.py:L1205-L1219 and diff_diff/chaisemartin_dhaultfoeuille.py:L2804-L2824.
SpilloverDiD pweight-only survey support and replicate rejection match diff_diff/spillover.py:L2382-L2407.

Code Quality

No findings. No Python implementation code changed.

Performance

No findings. The diff is documentation/bibliography only.

Maintainability

No findings. The prior registry count ambiguity is improved at docs/methodology/REGISTRY.md:L4579-L4600.

Tech Debt

No findings. Deferred unsupported survey/replicate combinations remain documented in TODO.md and/or registry notes.

Security

No findings. I did not see secrets or credential material in the changed files.

Documentation/Tests

No blocking findings. Documentation updates are consistent with the reviewed source/registry material. I did not independently run the claimed test or pandoc validation.

igerber and others added 2 commits July 3, 2026 07:57

docs(paper): brace-protect proper nouns in paper.bib titles

e60e813

California's (Abadie2010) and pyfixest/Python (pyfixest entry) were being lowercased by CSL sentence-casing in the compiled PDF. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

igerber added the ready-for-ci Triggers CI test workflows label Jul 3, 2026

igerber merged commit 1971529 into main Jul 3, 2026
33 of 34 checks passed

igerber deleted the joss-paper-finalize branch July 3, 2026 14:29

igerber mentioned this pull request Jul 3, 2026

Bump version to 3.6.2 #609

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs(paper): finalize JOSS submission - 20 estimators, required sections, draft-pdf CI#604

docs(paper): finalize JOSS submission - 20 estimators, required sections, draft-pdf CI#604
igerber merged 6 commits into
mainfrom
joss-paper-finalize

igerber commented Jul 3, 2026

Uh oh!

github-actions Bot commented Jul 3, 2026

Uh oh!

github-actions Bot commented Jul 3, 2026

Uh oh!

github-actions Bot commented Jul 3, 2026

Uh oh!

github-actions Bot commented Jul 3, 2026

Uh oh!

github-actions Bot commented Jul 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented Jul 3, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions Bot commented Jul 3, 2026

Overall Assessment

Executive Summary

Methodology

Finding 1

Finding 2

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Finding

Uh oh!

github-actions Bot commented Jul 3, 2026

Overall Assessment

Executive Summary

Methodology

Finding 1

Finding 2

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Uh oh!

github-actions Bot commented Jul 3, 2026

Overall Assessment

Executive Summary

Methodology

Finding 1

Finding 2

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Uh oh!

github-actions Bot commented Jul 3, 2026

Overall Assessment

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Finding 1

Uh oh!

github-actions Bot commented Jul 3, 2026

Overall Assessment

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant