Add LPDiD complex-survey-design support (Phase D1) by igerber · Pull Request #590 · igerber/diff-diff

igerber · 2026-06-30T13:08:37Z

Summary

Adds complex-survey-design support to LPDiD (Phase D, final phase of the LP-DiD salvage initiative). A survey_design= argument on LPDiD.fit() (a SurveyDesign with probability weights + optional strata/PSU/FPC), matching the library-wide fit()-time convention (CallawaySantAnna / SpilloverDiD / ...).
On the variance-weighted default path, each horizon's long-difference regression is fit by WLS on the survey weights; the SE is the stratified-PSU Taylor-linearization (Binder 1983 TSL) sandwich meat = Σ_h (1−f_h)·(n_h/(n_h−1))·Σ_j (S_hj−S̄_h)(S_hj−S̄_h)' with df = n_PSU − n_strata, reusing diff_diff/survey.py (compute_survey_vcov / _compute_stratified_psu_meat). No survey.py change.
The design is re-resolved on each realized (post-clean-control) sample so weights/strata/PSU align with the regression rows; with no explicit PSU the unit (LP-DiD's default cluster) is injected as the PSU. Fails closed to NaN on under-identified samples.
Rejects survey_design with reweight=True (the equally-weighted / regression-adjustment influence-function path), replicate-weight designs, and non-pweight (fweight/aweight) types — each a deferred follow-up.
LPDiDResults gains survey_metadata / n_strata / n_psu, a "survey_tsl" vcov_type, and a Survey Design block in summary(). The non-survey path is byte-for-byte unchanged (gated on survey_design is None); all prior LPDiD tests pass identically.

Methodology references (required if estimator / math changes)

Method name(s): LPDiD (Dube, Girardi, Jordà & Taylor 2025) + complex-survey Taylor-linearization variance (Binder 1983; Lumley 2004)
Paper / source link(s): see docs/methodology/REGISTRY.md §LPDiD (Note Add Synthetic Difference-in-Differences (SDID) estimator #8, new) and §Survey Data Support
Any intentional deviations from the source (and why): Survey support is scoped to the variance-weighted default path (the only path with a runnable survey::svyglm reference); the reweighted/RA influence-function variance, replicate weights, and non-pweight types are rejected and documented as deferred follow-ups (REGISTRY Note Add Synthetic Difference-in-Differences (SDID) estimator #8 + TODO). The paper specifies no SE formula; the survey SE is an implementation choice validated against survey::svyglm, documented under Deviations.

Validation

Tests added/updated: tests/test_lpdid.py — new TestLPDiDSurvey (15 pure-Python invariant tests: reduction/unit-clustering, FPC-shrinks-SE, stratification, lonely-PSU, NaN-consistency, weighting-moves-point, metadata + fit-idempotence, and rejection paths). All prior LPDiD tests unchanged (non-survey path bit-identical). Green on both pure-Python and Rust backends.
Backtest / simulation / notebook evidence (if applicable): the compute_survey_vcov TSL sandwich was validated to match survey::svyglm on the stacked long-difference design (point/SE/df) across full strata+PSU+FPC, no-FPC, and weights-only-with-injection variants. The committed numeric svyglm golden + parity test is the D2 follow-up.

Security / privacy

Confirm no secrets/PII in this PR: Yes

🤖 Generated with Claude Code

Adds a `survey_design=` argument to `LPDiD.fit()` (a `SurveyDesign` with probability weights + optional strata/PSU/FPC), matching the library-wide fit()-time convention. On the variance-weighted default path each horizon's long-difference regression is fit by WLS on the survey weights, and the SE is the stratified-PSU Taylor-linearization (Binder TSL) sandwich with `df = n_PSU - n_strata`, reusing `diff_diff/survey.py` (`compute_survey_vcov`). The design is re-resolved on each realized (post-clean-control) sample so weights/strata/PSU align with the regression rows; with no explicit PSU the unit is injected as the PSU. Fails closed to NaN on under-identified samples. Rejects `survey_design` with `reweight=True` (the equally-weighted / regression-adjustment IF path), replicate-weight designs, and non-pweight types (deferred follow-ups). `LPDiDResults` gains `survey_metadata` / `n_strata` / `n_psu`, a `"survey_tsl"` vcov_type, and a Survey Design block in `summary()`. The non-survey path is byte-for-byte unchanged. Validated against `survey::svyglm` on the stacked long difference (numeric golden parity is the D2 follow-up); 15 new pure-Python invariant tests (reduction/unit-clustering, FPC-shrinks-SE, stratification, lonely-PSU, NaN-consistency, weighting-moves-point, metadata, rejection paths). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-30T13:12:15Z

Overall Assessment
✅ Looks good. No unmitigated P0/P1 methodology or inference correctness issues found.

Executive Summary

Affected method: LPDiD, specifically the new fit(survey_design=...) variance-weighted survey path and LPDiDResults reporting.
The survey methodology scope is documented in docs/methodology/REGISTRY.md:L1873-L1892; rejected reweight, replicate-weight, and non-pweight paths are also tracked in TODO.md:L121.
The new survey inference path uses safe_inference() consistently; I did not find new inline inference or partial NaN-guard anti-patterns.
One P2 metadata issue: event-only survey fits can report the wrong headline G when an explicit survey PSU differs from the LPDiD cluster.
I could not run tests locally because this environment lacks pytest and numpy.

Methodology

Severity: P3 informational
Impact: The D1 survey path is a documented library extension: WLS point estimates with Binder TSL survey SEs on the variance-weighted path, with PR-D2 R parity tracked separately. This is mitigated by REGISTRY.md and TODO.md.
Concrete fix: No D1 blocker. Complete the tracked PR-D2 survey::svyglm golden parity work in TODO.md:L73.

Code Quality

Severity: P2
Finding: For only_event=True survey fits, headline_n_clusters falls back to panel["_cluster"].nunique() even when survey_design.psu is the effective variance cluster. The summary then labels G under cluster_name=survey_cluster_name, so an explicit PSU design with n_psu != n_units can display the unit count as PSU count. See diff_diff/lpdid.py:L1501-L1524 and diff_diff/lpdid_results.py:L227-L234.
Impact: Metadata/reporting can be misleading for event-only survey results. Per-row event-study n_clusters and statistical inference are still computed on the realized survey design, so this is not a silent wrong SE/p-value issue.
Concrete fix: When survey_design is not None and pooled is None, set headline_n_clusters from the effective survey design count, or from an appropriate event-study row’s n_clusters. Add a test with only_event=True, SurveyDesign(psu="psu"), and n_psu != n_units.

Performance

Severity: None
Impact: No performance regression requiring action found in the changed paths.
Concrete fix: None.

Maintainability

Severity: P3
Impact: The survey path is threaded locally through many helper calls, but the pattern is consistent and covered by tests. The event-only G fallback above is the only maintainability concern.
Concrete fix: Cover the survey only_event metadata branch with a focused regression test.

Tech Debt

Severity: P3 informational
Impact: R parity for LPDiD survey SEs is intentionally deferred and tracked in TODO.md:L73; survey scope gaps are tracked in TODO.md:L121.
Concrete fix: No D1 approval blocker.

Security

Severity: None
Impact: No secrets, PII, or security-sensitive changes found.
Concrete fix: None.

Documentation/Tests

Severity: P3
Impact: Docs and registry were updated for the new survey scope. Test coverage is strong for reduction, FPC, stratification, NaN consistency, metadata, idempotence, and rejection paths, but it misses the event-only explicit-PSU metadata case.
Concrete fix: Add the only_event=True explicit-PSU metadata test described above. Local tests were not run because required dependencies are missing in this environment.

…ey fits CI-codex P2: under a survey design the effective variance cluster is the PSU (cluster_name reports the PSU column), but for only_event=True fits (pooled is None) headline_n_clusters fell back to the panel unit count -- so an explicit PSU design with n_psu != n_units could display the unit count mislabeled as G=<psu>. Per-row event-study n_clusters and inference were already computed on the realized survey design, so this was a metadata/labeling issue only, not a wrong SE/p-value. Fix: when a survey design is active, seed headline_n_clusters from the panel-level effective PSU count (the pooled-post override still prefers the realized survey-sample count when available). Regression test added (only_event=True, explicit PSU, n_psu != n_units). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-30T13:19:11Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 89738b53c226149e696c441bc5dc797c2d063286

Overall Assessment

⚠️ Needs changes. One newly identified P1 remains in the new LPDiD survey variance path.

Executive Summary

Affected method: LPDiD.fit(survey_design=...), specifically the variance-weighted WLS + Binder TSL path.
The prior event-only explicit-PSU metadata issue is addressed via headline_n_clusters = survey_n_psu and a focused regression test at tests/test_lpdid.py:L1867-L1881.
The documented D1 scope and deferrals are properly recorded in docs/methodology/REGISTRY.md:L1873-L1892 and TODO.md:L73, TODO.md:L121.
P1 [Newly identified]: rank-deficient nuisance columns on the survey path can make otherwise identified treatment SE/inference collapse to NaN because the Binder sandwich is computed on the unreduced full design.
I could not run tests locally: pytest is unavailable, and a direct Python repro attempt also failed because numpy is not installed.

Methodology

Severity: P1 [Newly identified]
Impact: The survey path does not preserve the library’s rank-deficient nuisance-column contract. solve_ols documents that dropped collinear coefficients are set to NaN while residuals are computed from the identified reduced design (diff_diff/linalg.py:L798-L810, diff_diff/linalg.py:L1229-L1288, diff_diff/linalg.py:L1022-L1037). But _estimate_survey_sample() discards the returned residuals, recomputes response - design @ coef, and passes the unreduced design into compute_survey_vcov() (diff_diff/lpdid.py:L939-L955). If a redundant direct-inclusion covariate, absorbed dummy, or lag is dropped while treatment_entry remains identified, the full-design sandwich uses NaN coefficients and/or a singular X'WX bread (diff_diff/survey.py:L2085-L2097), yielding NaN SE/t/p/CI for an otherwise identified survey WLS effect. This is not documented as a D1 limitation.
Concrete fix: In _estimate_survey_sample(), keep the residuals returned by solve_ols; derive kept = np.isfinite(coef); if treatment index 1 is dropped, return NaN inference; otherwise compute compute_survey_vcov(design[:, kept], residuals, resolved) and map the reduced treatment index back to the original coefficient. Keep rank_deficient_action="error" raising from solve_ols.
Severity: P3 informational
Impact: The survey design extension itself is documented as a library extension: WLS point estimates plus Binder TSL SEs, scoped to reweight=False, with replicate weights and non-pweights deferred (docs/methodology/REGISTRY.md:L1873-L1892, TODO.md:L121). These are not defects under the review rules.
Concrete fix: None for D1; complete the tracked PR-D2 survey::svyglm parity work in TODO.md:L73.

Code Quality

Severity: P2
Impact: fit() calls _survey_columns(survey_design) before _resolve_survey_for_fit() performs the SurveyDesign type check (diff_diff/lpdid.py:L95-L105, diff_diff/lpdid.py:L1355-L1362). Passing a non-SurveyDesign object can raise an incidental AttributeError instead of the intended TypeError.
Concrete fix: Move the _resolve_survey_for_fit() type validation before _survey_columns(), or make _survey_columns() type-check and raise the same public error.

Performance

Severity: None
Impact: No performance regression requiring action found in the changed paths.
Concrete fix: None.

Maintainability

Severity: P2
Impact: The new survey branch partially duplicates rank-deficient residual/bread handling already solved in solve_ols, which caused the P1 above.
Concrete fix: Centralize the kept-column mapping in _estimate_survey_sample() or add a small helper for “OLS coefficients + residuals + kept columns” so survey sandwiches follow the same contract.

Tech Debt

Severity: P3 informational
Impact: R parity and unsupported survey combinations are tracked in TODO.md:L73 and TODO.md:L121, so they are not approval blockers.
Concrete fix: No D1 action required beyond the P1 fix.

Security

Severity: None
Impact: No secrets, PII, or security-sensitive changes found.
Concrete fix: None.

Documentation/Tests

Severity: P1
Impact: Existing survey covariate coverage only checks a non-redundant direct-inclusion covariate (tests/test_lpdid.py:L1819-L1830), so the rank-deficient survey WLS/TSL case above is untested.
Concrete fix: Add a survey regression test with covariates=["x", "xdup", "xc"], rank_deficient_action="silent", and survey_design=_full_design(), asserting the treatment SE/inference remain finite when the non-redundant reference fit is finite. Also assert rank_deficient_action="error" raises.

Path To Approval

Fix _estimate_survey_sample() to compute the survey vcov on the kept-column design and returned residuals from solve_ols.
Add the redundant-covariate survey regression test described above.
Optionally move the survey_design type validation before _survey_columns() to clean up the public error surface.

…icient contract) CI-codex P1: `_estimate_survey_sample` computed the Binder TSL sandwich on the UNREDUCED design and recomputed `response - design @ coef`. When the rank handler drops a redundant direct-inclusion covariate / absorbed dummy / lag (setting that coef to NaN while `treatment_entry` stays identified), the NaN coef propagated through the residuals and the full-design X'WX bread singularized, collapsing an otherwise-identified treatment SE/t/p/CI to NaN -- violating the library's rank-deficient contract that the non-survey solve_ols path honors. Fix: keep solve_ols's returned residuals (original-scale, computed on the identified reduced design) and build `compute_survey_vcov` on `design[:, kept]` where `kept = isfinite(coef)`, mapping treatment back to its kept-column index. If treatment itself is dropped, the effect is NaN and the SE stays NaN; `rank_deficient_action="error"` still raises from solve_ols. Regression test (duplicate + constant covariate, `silent`) asserts the treatment SE stays finite and equals the non-redundant reference fit, and that `error` raises. CI-codex P2: type-check `survey_design` before `_survey_columns` accesses its attributes, so a non-SurveyDesign argument raises the intended TypeError rather than an incidental AttributeError (test added). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-30T13:29:20Z

🔁 AI review rerun (requested by @igerber)

Head SHA: e95d7afa79430dde692ef0fadedc4793207c5545

Overall Assessment

✅ Looks good. No unmitigated P0/P1 findings found in the re-review.

Executive Summary

Affected method: LPDiD.fit(survey_design=...), specifically the variance-weighted WLS + Binder TSL survey path.
The prior P1 rank-deficient nuisance-column issue is addressed: the survey sandwich now uses solve_ols() residuals and the kept-column design.
The prior non-SurveyDesign public error issue is also addressed with an upfront type check.
Survey scope limitations are documented in REGISTRY.md and TODO.md, so they are informational under the review rules.
I could not run tests locally: pytest is not installed, and importing the package fails because numpy is unavailable.

Methodology

Severity: P3 informational
Impact: The LPDiD survey extension is documented as a library extension: WLS point estimates with stratified-PSU Binder TSL SEs, scoped to reweight=False; reweight=True, replicate weights, and non-pweights are rejected and tracked. This matches docs/methodology/REGISTRY.md:L1873-L1892 and TODO.md:L73, TODO.md:L121.
Concrete fix: None required for D1.
Severity: None
Impact: The previous rank-deficiency methodology issue is resolved. _estimate_survey_sample() now computes coefficients/residuals via solve_ols(), derives kept = np.isfinite(coef), and calls compute_survey_vcov(design[:, kept], residuals, resolved), preserving the documented solve_ols() rank-deficient contract. See diff_diff/lpdid.py:L939-L977; regression coverage is in tests/test_lpdid.py:L1831-L1857.
Concrete fix: None.

Code Quality

Severity: None
Impact: The previous incidental AttributeError surface is fixed: fit() now type-checks survey_design before _survey_columns() accesses its attributes. See diff_diff/lpdid.py:L1371-L1376 and tests/test_lpdid.py:L1859-L1863.
Concrete fix: None.

Performance

Severity: None
Impact: No performance regression requiring action found in the changed paths. Per-sample survey re-resolution is intentional and documented.
Concrete fix: None.

Maintainability

Severity: None
Impact: The new survey branch now follows the existing rank-deficient OLS contract instead of duplicating full-design residual/bread handling.
Concrete fix: None.

Tech Debt

Severity: P3 informational
Impact: D1 defers numeric survey::svyglm parity and unsupported LPDiD survey combinations, but both are tracked in TODO.md:L73 and TODO.md:L121.
Concrete fix: None for this PR.

Security

Severity: None
Impact: No secrets, PII, or security-sensitive changes found.
Concrete fix: None.

Documentation/Tests

Severity: P3 informational
Impact: Documentation and tests cover the new survey behavior, including FPC, stratification, weighting, metadata, rejection paths, NaN-consistency, idempotence, and the prior rank-deficient survey regression. See tests/test_lpdid.py:L1768-L1980. I could not execute them in this sandbox because pytest and numpy are unavailable.
Concrete fix: None required in the PR; run the focused LPDiD survey test class in CI/local env.

igerber added the ready-for-ci Triggers CI test workflows label Jun 30, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add LPDiD complex-survey-design support (Phase D1)#590

Add LPDiD complex-survey-design support (Phase D1)#590
igerber wants to merge 3 commits into
mainfrom
feature/lpdid-survey

igerber commented Jun 30, 2026

Uh oh!

github-actions Bot commented Jun 30, 2026

Uh oh!

github-actions Bot commented Jun 30, 2026

Uh oh!

github-actions Bot commented Jun 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented Jun 30, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions Bot commented Jun 30, 2026

Uh oh!

github-actions Bot commented Jun 30, 2026

Uh oh!

github-actions Bot commented Jun 30, 2026

Overall Assessment

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant