Skip to content

Add LPDiD complex-survey-design support (Phase D1)#590

Open
igerber wants to merge 3 commits into
mainfrom
feature/lpdid-survey
Open

Add LPDiD complex-survey-design support (Phase D1)#590
igerber wants to merge 3 commits into
mainfrom
feature/lpdid-survey

Conversation

@igerber

@igerber igerber commented Jun 30, 2026

Copy link
Copy Markdown
Owner

Summary

  • Adds complex-survey-design support to LPDiD (Phase D, final phase of the LP-DiD salvage initiative). A survey_design= argument on LPDiD.fit() (a SurveyDesign with probability weights + optional strata/PSU/FPC), matching the library-wide fit()-time convention (CallawaySantAnna / SpilloverDiD / ...).
  • On the variance-weighted default path, each horizon's long-difference regression is fit by WLS on the survey weights; the SE is the stratified-PSU Taylor-linearization (Binder 1983 TSL) sandwich meat = Σ_h (1−f_h)·(n_h/(n_h−1))·Σ_j (S_hj−S̄_h)(S_hj−S̄_h)' with df = n_PSU − n_strata, reusing diff_diff/survey.py (compute_survey_vcov / _compute_stratified_psu_meat). No survey.py change.
  • The design is re-resolved on each realized (post-clean-control) sample so weights/strata/PSU align with the regression rows; with no explicit PSU the unit (LP-DiD's default cluster) is injected as the PSU. Fails closed to NaN on under-identified samples.
  • Rejects survey_design with reweight=True (the equally-weighted / regression-adjustment influence-function path), replicate-weight designs, and non-pweight (fweight/aweight) types — each a deferred follow-up.
  • LPDiDResults gains survey_metadata / n_strata / n_psu, a "survey_tsl" vcov_type, and a Survey Design block in summary(). The non-survey path is byte-for-byte unchanged (gated on survey_design is None); all prior LPDiD tests pass identically.

Methodology references (required if estimator / math changes)

  • Method name(s): LPDiD (Dube, Girardi, Jordà & Taylor 2025) + complex-survey Taylor-linearization variance (Binder 1983; Lumley 2004)
  • Paper / source link(s): see docs/methodology/REGISTRY.md §LPDiD (Note Add Synthetic Difference-in-Differences (SDID) estimator #8, new) and §Survey Data Support
  • Any intentional deviations from the source (and why): Survey support is scoped to the variance-weighted default path (the only path with a runnable survey::svyglm reference); the reweighted/RA influence-function variance, replicate weights, and non-pweight types are rejected and documented as deferred follow-ups (REGISTRY Note Add Synthetic Difference-in-Differences (SDID) estimator #8 + TODO). The paper specifies no SE formula; the survey SE is an implementation choice validated against survey::svyglm, documented under Deviations.

Validation

  • Tests added/updated: tests/test_lpdid.py — new TestLPDiDSurvey (15 pure-Python invariant tests: reduction/unit-clustering, FPC-shrinks-SE, stratification, lonely-PSU, NaN-consistency, weighting-moves-point, metadata + fit-idempotence, and rejection paths). All prior LPDiD tests unchanged (non-survey path bit-identical). Green on both pure-Python and Rust backends.
  • Backtest / simulation / notebook evidence (if applicable): the compute_survey_vcov TSL sandwich was validated to match survey::svyglm on the stacked long-difference design (point/SE/df) across full strata+PSU+FPC, no-FPC, and weights-only-with-injection variants. The committed numeric svyglm golden + parity test is the D2 follow-up.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

🤖 Generated with Claude Code

Adds a `survey_design=` argument to `LPDiD.fit()` (a `SurveyDesign` with
probability weights + optional strata/PSU/FPC), matching the library-wide
fit()-time convention. On the variance-weighted default path each horizon's
long-difference regression is fit by WLS on the survey weights, and the SE is
the stratified-PSU Taylor-linearization (Binder TSL) sandwich with
`df = n_PSU - n_strata`, reusing `diff_diff/survey.py` (`compute_survey_vcov`).

The design is re-resolved on each realized (post-clean-control) sample so
weights/strata/PSU align with the regression rows; with no explicit PSU the
unit is injected as the PSU. Fails closed to NaN on under-identified samples.
Rejects `survey_design` with `reweight=True` (the equally-weighted /
regression-adjustment IF path), replicate-weight designs, and non-pweight
types (deferred follow-ups). `LPDiDResults` gains `survey_metadata` /
`n_strata` / `n_psu`, a `"survey_tsl"` vcov_type, and a Survey Design block in
`summary()`. The non-survey path is byte-for-byte unchanged.

Validated against `survey::svyglm` on the stacked long difference (numeric
golden parity is the D2 follow-up); 15 new pure-Python invariant tests
(reduction/unit-clustering, FPC-shrinks-SE, stratification, lonely-PSU,
NaN-consistency, weighting-moves-point, metadata, rejection paths).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

Overall Assessment
✅ Looks good. No unmitigated P0/P1 methodology or inference correctness issues found.

Executive Summary

  • Affected method: LPDiD, specifically the new fit(survey_design=...) variance-weighted survey path and LPDiDResults reporting.
  • The survey methodology scope is documented in docs/methodology/REGISTRY.md:L1873-L1892; rejected reweight, replicate-weight, and non-pweight paths are also tracked in TODO.md:L121.
  • The new survey inference path uses safe_inference() consistently; I did not find new inline inference or partial NaN-guard anti-patterns.
  • One P2 metadata issue: event-only survey fits can report the wrong headline G when an explicit survey PSU differs from the LPDiD cluster.
  • I could not run tests locally because this environment lacks pytest and numpy.

Methodology

  • Severity: P3 informational
  • Impact: The D1 survey path is a documented library extension: WLS point estimates with Binder TSL survey SEs on the variance-weighted path, with PR-D2 R parity tracked separately. This is mitigated by REGISTRY.md and TODO.md.
  • Concrete fix: No D1 blocker. Complete the tracked PR-D2 survey::svyglm golden parity work in TODO.md:L73.

Code Quality

  • Severity: P2
  • Finding: For only_event=True survey fits, headline_n_clusters falls back to panel["_cluster"].nunique() even when survey_design.psu is the effective variance cluster. The summary then labels G under cluster_name=survey_cluster_name, so an explicit PSU design with n_psu != n_units can display the unit count as PSU count. See diff_diff/lpdid.py:L1501-L1524 and diff_diff/lpdid_results.py:L227-L234.
  • Impact: Metadata/reporting can be misleading for event-only survey results. Per-row event-study n_clusters and statistical inference are still computed on the realized survey design, so this is not a silent wrong SE/p-value issue.
  • Concrete fix: When survey_design is not None and pooled is None, set headline_n_clusters from the effective survey design count, or from an appropriate event-study row’s n_clusters. Add a test with only_event=True, SurveyDesign(psu="psu"), and n_psu != n_units.

Performance

  • Severity: None
  • Impact: No performance regression requiring action found in the changed paths.
  • Concrete fix: None.

Maintainability

  • Severity: P3
  • Impact: The survey path is threaded locally through many helper calls, but the pattern is consistent and covered by tests. The event-only G fallback above is the only maintainability concern.
  • Concrete fix: Cover the survey only_event metadata branch with a focused regression test.

Tech Debt

  • Severity: P3 informational
  • Impact: R parity for LPDiD survey SEs is intentionally deferred and tracked in TODO.md:L73; survey scope gaps are tracked in TODO.md:L121.
  • Concrete fix: No D1 approval blocker.

Security

  • Severity: None
  • Impact: No secrets, PII, or security-sensitive changes found.
  • Concrete fix: None.

Documentation/Tests

  • Severity: P3
  • Impact: Docs and registry were updated for the new survey scope. Test coverage is strong for reduction, FPC, stratification, NaN consistency, metadata, idempotence, and rejection paths, but it misses the event-only explicit-PSU metadata case.
  • Concrete fix: Add the only_event=True explicit-PSU metadata test described above. Local tests were not run because required dependencies are missing in this environment.

…ey fits

CI-codex P2: under a survey design the effective variance cluster is the PSU
(cluster_name reports the PSU column), but for only_event=True fits (pooled is
None) headline_n_clusters fell back to the panel unit count -- so an explicit
PSU design with n_psu != n_units could display the unit count mislabeled as
G=<psu>. Per-row event-study n_clusters and inference were already computed on
the realized survey design, so this was a metadata/labeling issue only, not a
wrong SE/p-value. Fix: when a survey design is active, seed headline_n_clusters
from the panel-level effective PSU count (the pooled-post override still prefers
the realized survey-sample count when available). Regression test added
(only_event=True, explicit PSU, n_psu != n_units).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 89738b53c226149e696c441bc5dc797c2d063286


Overall Assessment

⚠️ Needs changes. One newly identified P1 remains in the new LPDiD survey variance path.

Executive Summary

  • Affected method: LPDiD.fit(survey_design=...), specifically the variance-weighted WLS + Binder TSL path.
  • The prior event-only explicit-PSU metadata issue is addressed via headline_n_clusters = survey_n_psu and a focused regression test at tests/test_lpdid.py:L1867-L1881.
  • The documented D1 scope and deferrals are properly recorded in docs/methodology/REGISTRY.md:L1873-L1892 and TODO.md:L73, TODO.md:L121.
  • P1 [Newly identified]: rank-deficient nuisance columns on the survey path can make otherwise identified treatment SE/inference collapse to NaN because the Binder sandwich is computed on the unreduced full design.
  • I could not run tests locally: pytest is unavailable, and a direct Python repro attempt also failed because numpy is not installed.

Methodology

  • Severity: P1 [Newly identified]

  • Impact: The survey path does not preserve the library’s rank-deficient nuisance-column contract. solve_ols documents that dropped collinear coefficients are set to NaN while residuals are computed from the identified reduced design (diff_diff/linalg.py:L798-L810, diff_diff/linalg.py:L1229-L1288, diff_diff/linalg.py:L1022-L1037). But _estimate_survey_sample() discards the returned residuals, recomputes response - design @ coef, and passes the unreduced design into compute_survey_vcov() (diff_diff/lpdid.py:L939-L955). If a redundant direct-inclusion covariate, absorbed dummy, or lag is dropped while treatment_entry remains identified, the full-design sandwich uses NaN coefficients and/or a singular X'WX bread (diff_diff/survey.py:L2085-L2097), yielding NaN SE/t/p/CI for an otherwise identified survey WLS effect. This is not documented as a D1 limitation.

  • Concrete fix: In _estimate_survey_sample(), keep the residuals returned by solve_ols; derive kept = np.isfinite(coef); if treatment index 1 is dropped, return NaN inference; otherwise compute compute_survey_vcov(design[:, kept], residuals, resolved) and map the reduced treatment index back to the original coefficient. Keep rank_deficient_action="error" raising from solve_ols.

  • Severity: P3 informational

  • Impact: The survey design extension itself is documented as a library extension: WLS point estimates plus Binder TSL SEs, scoped to reweight=False, with replicate weights and non-pweights deferred (docs/methodology/REGISTRY.md:L1873-L1892, TODO.md:L121). These are not defects under the review rules.

  • Concrete fix: None for D1; complete the tracked PR-D2 survey::svyglm parity work in TODO.md:L73.

Code Quality

  • Severity: P2
  • Impact: fit() calls _survey_columns(survey_design) before _resolve_survey_for_fit() performs the SurveyDesign type check (diff_diff/lpdid.py:L95-L105, diff_diff/lpdid.py:L1355-L1362). Passing a non-SurveyDesign object can raise an incidental AttributeError instead of the intended TypeError.
  • Concrete fix: Move the _resolve_survey_for_fit() type validation before _survey_columns(), or make _survey_columns() type-check and raise the same public error.

Performance

  • Severity: None
  • Impact: No performance regression requiring action found in the changed paths.
  • Concrete fix: None.

Maintainability

  • Severity: P2
  • Impact: The new survey branch partially duplicates rank-deficient residual/bread handling already solved in solve_ols, which caused the P1 above.
  • Concrete fix: Centralize the kept-column mapping in _estimate_survey_sample() or add a small helper for “OLS coefficients + residuals + kept columns” so survey sandwiches follow the same contract.

Tech Debt

  • Severity: P3 informational
  • Impact: R parity and unsupported survey combinations are tracked in TODO.md:L73 and TODO.md:L121, so they are not approval blockers.
  • Concrete fix: No D1 action required beyond the P1 fix.

Security

  • Severity: None
  • Impact: No secrets, PII, or security-sensitive changes found.
  • Concrete fix: None.

Documentation/Tests

  • Severity: P1
  • Impact: Existing survey covariate coverage only checks a non-redundant direct-inclusion covariate (tests/test_lpdid.py:L1819-L1830), so the rank-deficient survey WLS/TSL case above is untested.
  • Concrete fix: Add a survey regression test with covariates=["x", "xdup", "xc"], rank_deficient_action="silent", and survey_design=_full_design(), asserting the treatment SE/inference remain finite when the non-redundant reference fit is finite. Also assert rank_deficient_action="error" raises.

Path To Approval

  1. Fix _estimate_survey_sample() to compute the survey vcov on the kept-column design and returned residuals from solve_ols.
  2. Add the redundant-covariate survey regression test described above.
  3. Optionally move the survey_design type validation before _survey_columns() to clean up the public error surface.

…icient contract)

CI-codex P1: `_estimate_survey_sample` computed the Binder TSL sandwich on the
UNREDUCED design and recomputed `response - design @ coef`. When the rank
handler drops a redundant direct-inclusion covariate / absorbed dummy / lag
(setting that coef to NaN while `treatment_entry` stays identified), the NaN
coef propagated through the residuals and the full-design X'WX bread
singularized, collapsing an otherwise-identified treatment SE/t/p/CI to NaN --
violating the library's rank-deficient contract that the non-survey solve_ols
path honors.

Fix: keep solve_ols's returned residuals (original-scale, computed on the
identified reduced design) and build `compute_survey_vcov` on `design[:, kept]`
where `kept = isfinite(coef)`, mapping treatment back to its kept-column index.
If treatment itself is dropped, the effect is NaN and the SE stays NaN;
`rank_deficient_action="error"` still raises from solve_ols. Regression test
(duplicate + constant covariate, `silent`) asserts the treatment SE stays
finite and equals the non-redundant reference fit, and that `error` raises.

CI-codex P2: type-check `survey_design` before `_survey_columns` accesses its
attributes, so a non-SurveyDesign argument raises the intended TypeError rather
than an incidental AttributeError (test added).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: e95d7afa79430dde692ef0fadedc4793207c5545


Overall Assessment

✅ Looks good. No unmitigated P0/P1 findings found in the re-review.

Executive Summary

  • Affected method: LPDiD.fit(survey_design=...), specifically the variance-weighted WLS + Binder TSL survey path.
  • The prior P1 rank-deficient nuisance-column issue is addressed: the survey sandwich now uses solve_ols() residuals and the kept-column design.
  • The prior non-SurveyDesign public error issue is also addressed with an upfront type check.
  • Survey scope limitations are documented in REGISTRY.md and TODO.md, so they are informational under the review rules.
  • I could not run tests locally: pytest is not installed, and importing the package fails because numpy is unavailable.

Methodology

  • Severity: P3 informational
    Impact: The LPDiD survey extension is documented as a library extension: WLS point estimates with stratified-PSU Binder TSL SEs, scoped to reweight=False; reweight=True, replicate weights, and non-pweights are rejected and tracked. This matches docs/methodology/REGISTRY.md:L1873-L1892 and TODO.md:L73, TODO.md:L121.
    Concrete fix: None required for D1.

  • Severity: None
    Impact: The previous rank-deficiency methodology issue is resolved. _estimate_survey_sample() now computes coefficients/residuals via solve_ols(), derives kept = np.isfinite(coef), and calls compute_survey_vcov(design[:, kept], residuals, resolved), preserving the documented solve_ols() rank-deficient contract. See diff_diff/lpdid.py:L939-L977; regression coverage is in tests/test_lpdid.py:L1831-L1857.
    Concrete fix: None.

Code Quality

  • Severity: None
    Impact: The previous incidental AttributeError surface is fixed: fit() now type-checks survey_design before _survey_columns() accesses its attributes. See diff_diff/lpdid.py:L1371-L1376 and tests/test_lpdid.py:L1859-L1863.
    Concrete fix: None.

Performance

  • Severity: None
    Impact: No performance regression requiring action found in the changed paths. Per-sample survey re-resolution is intentional and documented.
    Concrete fix: None.

Maintainability

  • Severity: None
    Impact: The new survey branch now follows the existing rank-deficient OLS contract instead of duplicating full-design residual/bread handling.
    Concrete fix: None.

Tech Debt

  • Severity: P3 informational
    Impact: D1 defers numeric survey::svyglm parity and unsupported LPDiD survey combinations, but both are tracked in TODO.md:L73 and TODO.md:L121.
    Concrete fix: None for this PR.

Security

  • Severity: None
    Impact: No secrets, PII, or security-sensitive changes found.
    Concrete fix: None.

Documentation/Tests

  • Severity: P3 informational
    Impact: Documentation and tests cover the new survey behavior, including FPC, stratification, weighting, metadata, rejection paths, NaN-consistency, idempotence, and the prior rank-deficient survey regression. See tests/test_lpdid.py:L1768-L1980. I could not execute them in this sandbox because pytest and numpy are unavailable.
    Concrete fix: None required in the PR; run the focused LPDiD survey test class in CI/local env.

@igerber igerber added the ready-for-ci Triggers CI test workflows label Jun 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant