docs(survey): waive zero-weight-PSU SE-invariance item; lock Lumley full-design convention by igerber · Pull Request #589 · igerber/diff-diff

igerber · 2026-06-30T11:54:43Z

Summary

Re-examined the Actionable Backlog row proposing that the survey TSL finite-sample correction "count only positive-weight PSUs" so the SE is invariant to zero-weight (subpopulation / padded) rows. The premise conflicts with the library's documented, R-validated convention, so the item is waived with no estimator behavior change.
_compute_stratified_psu_meat's per-stratum correction (1 - f_h)*n_PSU_h/(n_PSU_h - 1) and PSU-mean centering intentionally keep genuine-subpopulation zero-weight PSUs — the full-design domain estimator of Lumley (2004 §3.4) / R survey::svyrecvar(subset()). The ATT is exactly invariant; the survey SE is deliberately not invariant to genuine-subpopulation zeroing (it should differ from a naive physical subset — the whole point of subpopulation()). R produces the matching SE (only df differs).
The only invariance-violating shape — appending synthetic new all-zero PSUs — arises in no estimator path (domain padding goes through prep.py's zero-padded full-design cell variance, which retains the real PSU layout; zero-weight rows reusing an existing PSU label are already bit-invariant). Forcing positive-weight-only counting would break the documented Lumley/R parity.
TODO.md: moved the row from Actionable Backlog → "Won't-fix / waived (decisions on the record)" with the Lumley/R justification.
docs/methodology/REGISTRY.md: added a Note in § "Subpopulation Analysis" making explicit that the TSL meat finite-sample correction counts zero-weight PSUs by design.
tests/test_survey.py: added TestZeroWeightPsuConventionWaiver to lock the decision (inert existing-PSU padding is bit-invariant; subpopulation zeroing keeps the full PSU structure so its SE differs from a naive subset). A future positive-weight-only change would collapse the two and trip the test.

Methodology references

Method name(s): Survey Taylor-linearization (TSL) variance; subpopulation / domain estimation.
Paper / source link(s): Lumley (2004) §3.4 "Analysis of complex survey samples"; R survey::svyrecvar(subset()). Existing documentation: REGISTRY § "Subpopulation Analysis".
Any intentional deviations from the source (and why): None new. This documents an existing intentional convention (full-design domain estimation, matching R) and declines a change that would have introduced a divergence. No estimator behavior changes.

Validation

Tests added/updated: tests/test_survey.py::TestZeroWeightPsuConventionWaiver (2 tests). Targeted survey zero-weight / subpopulation / invariance subset: 20 passed.
Backtest / simulation / notebook evidence (if applicable): N/A.

Security / privacy

Confirm no secrets/PII in this PR: Yes

…ull-design convention Re-examined the TODO row proposing the survey TSL finite-sample correction count only positive-weight PSUs so the SE is invariant to zero-weight (subpopulation / padded) rows. Investigation shows the premise conflicts with the library's documented, R-validated convention: - `_compute_stratified_psu_meat`'s per-stratum correction (1 - f_h)*n_PSU_h/(n_PSU_h - 1) and PSU-mean centering intentionally keep genuine-subpopulation zero-weight PSUs. This is the full-design domain estimator of Lumley (2004 Section 3.4) / R survey::svyrecvar(subset()), already documented in REGISTRY section "Subpopulation Analysis". - The ATT is exactly invariant; the survey SE is deliberately NOT invariant to genuine-subpopulation zeroing (it should differ from a naive physical subset -- that is the whole point of subpopulation()). R produces the matching SE (only df differs). - Zero-weight rows that reuse an existing PSU label are already bit-invariant. The only invariance-violating shape -- appending synthetic new all-zero PSUs -- arises in no estimator path (domain padding goes through prep.py's zero-padded full-design cell variance, which retains the real PSU layout). Forcing the meat to positive-weight-only counting would break the documented Lumley/R parity, so the item is waived (no estimator behavior change): - TODO.md: move the row from Actionable Backlog to "Won't-fix / waived (decisions on the record)" with the Lumley/R justification. - REGISTRY.md: add a Note in section "Subpopulation Analysis" making explicit that the TSL meat finite-sample correction counts zero-weight PSUs by design. - tests/test_survey.py: add TestZeroWeightPsuConventionWaiver regression-lock (inert existing-PSU padding is bit-invariant; subpopulation zeroing keeps the full PSU structure so its SE differs from a naive subset). A future positive-weight-only change would collapse the two and trip the test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-30T11:57:35Z

Overall Assessment

✅ Looks good. No unmitigated P0/P1 findings.

Executive Summary

Affected method: Survey Taylor-linearization variance for subpopulation/domain estimation.
No estimator behavior changed; this PR documents and tests the existing full-design convention.
The new REGISTRY note matches _compute_stratified_psu_meat, which groups all PSU score totals and uses full PSU counts in centering and finite-sample correction.
External source check is consistent: Lumley (2004) covers complex survey analysis with linearisation, R subset.survey.design keeps original cluster/stratum design information for subpopulations, and svyrecvar computes variance from estimating functions under multistage sampling. (jstatsoft.org)
git diff --check passed. I could not run the targeted tests because this environment lacks pytest and numpy.

Methodology

Finding: P3 informational, no action required
Impact: The survey SE non-invariance to genuine subpopulation zeroing is now explicitly documented as the intended Lumley/R full-design convention, not a defect. The registry note at docs/methodology/REGISTRY.md:L4539-L4557 matches the implementation path at diff_diff/survey.py:L1447-L1487: zero-score PSUs remain in psu_scores_h, contribute to the stratum mean, and increment n_psu_h.
Concrete fix: None required.

Code Quality

No findings.

Performance

No findings. This PR does not alter runtime code.

Maintainability

No findings. Moving the former backlog item to the waived-decision table in TODO.md:L147-L155 is appropriate and reduces future ambiguity.

Tech Debt

Finding: P3 informational, no action required
Impact: The former zero-weight-PSU invariance item is now tracked under “Won’t-fix / waived,” which satisfies the project’s deferred-work acceptance rule for non-defect methodology decisions.
Concrete fix: None required.

Security

No findings. The changed files are docs/tests only; I saw no secrets or unsafe execution paths.

Documentation/Tests

Finding: P3 informational
Impact: The new tests cover the important user-facing behavior, but tests/test_survey.py:L2247-L2251 only asserts that zeroing a full PSU differs from physically dropping it. That would catch a full positive-weight-only implementation, but it may not catch every partial edit that changes only the finite-sample correction denominator while still centering over the zero PSU.
Concrete fix: Optional: add a direct _compute_stratified_psu_meat unit test with crafted PSU scores, including one all-zero PSU, and assert the exact full-design meat formula using n_PSU_h including the zero-score PSU.

…t test Addresses the CI review's actionable P3 (Documentation/Tests): the SE-level test only asserts that subpopulation zeroing differs from a physical subset, which catches a full positive-weight-only rewrite but could miss a partial edit (e.g. changing only the finite-sample denominator while still centering over the zero PSU). Adds a direct unit test on _compute_stratified_psu_meat with crafted PSU scores including one all-zero-score PSU (a fully zeroed subpopulation PSU), asserting the exact full-design meat formula (n_PSU_h including the zero PSU) and that it is NOT the positive-weight-only meat. Any change to the centering OR the denominator now trips the lock. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-30T12:31:01Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 444c8333dd0073ceffe2e32652454f0c89c16299

Overall Assessment

✅ Looks good. No unmitigated P0/P1 findings.

Executive Summary

Affected method: Survey Taylor-linearization variance for subpopulation/domain estimation.
This PR changes docs/TODO/tests only; no estimator runtime behavior changes.
The new REGISTRY note matches the current full-design PSU meat implementation at diff_diff/survey.py:L1447-L1487.
Source check supports the convention: R subset.survey.design keeps original cluster/stratum information, and R svyCprod/onestage adds zero PSU rows for correct subpopulation variance. (rdrr.io)
Prior optional test coverage was improved with a direct _compute_stratified_psu_meat regression test.
I could not run tests here because pytest and numpy are not installed.

Methodology

Finding: P3 informational, no action required
Impact: The documented survey SE non-invariance to genuine subpopulation zeroing is consistent with the Methodology Registry note at docs/methodology/REGISTRY.md:L4539-L4557, the waived TODO entry at TODO.md:L147-L155, and the local implementation’s PSU-score grouping, stratum centering, and n_PSU_h/(n_PSU_h - 1) correction at diff_diff/survey.py:L1447-L1487. The R survey docs also support preserving original design information and zero-padding missing domain PSUs for subpopulation variance. (rdrr.io)
Concrete fix: None required.

Code Quality

No findings.

Performance

No findings. The PR does not change runtime code.

Maintainability

No findings. Moving the former backlog item into the waived-decision table makes the convention easier to preserve.

Tech Debt

No findings. The former action item is now explicitly tracked as waived in TODO.md:L147-L155.

Security

No findings. Changed files are docs/tests only; no secrets or unsafe execution paths observed.

Documentation/Tests

Finding: P3 informational
Impact: The new direct meat test at tests/test_survey.py:L2258-L2311 improves coverage, but its crafted “zeroed PSU” uses zero score rows while ResolvedSurveyDesign.weights remains all ones at tests/test_survey.py:L2280-L2288. That means it would not catch a future denominator-only edit that counts positive-weight PSUs using resolved.weights while still centering over full PSU scores. The end-to-end test still catches the main full positive-weight-only regression.
Concrete fix: Optional: set the two rows for PSU 2 to zero weights in the direct meat fixture, e.g. weights=np.array([1, 1, 0, 0, 1, 1], dtype=float), so the unit test models a true fully zero-weight subpopulation PSU.

Addresses the re-review's actionable P3: the direct meat test represented its all-zero PSU via zero score rows only, with weights left all ones. A future denominator-only edit that reads resolved.weights to drop positive-weight PSUs would not have been caught. Set PSU 2's weights to 0 so the fixture models a true fully zero-weight subpopulation PSU. The current meat ignores weights (it operates on scores), so the expected value is unchanged; the change only hardens the regression lock. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-30T12:34:26Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 5590e6c5b1bf5da391ae1099ad3d10d2c6cddd27

Overall Assessment

✅ Looks good. No unmitigated P0/P1 findings.

Executive Summary

Affected method: Survey Taylor-linearization (TSL) variance for subpopulation/domain estimation.
This PR changes docs/TODO/tests only; no estimator runtime behavior or defaults changed.
The new REGISTRY note matches _compute_stratified_psu_meat, which groups all PSU labels and uses n_PSU_h/(n_PSU_h - 1) plus PSU-mean centering without filtering positive-weight PSUs: diff_diff/survey.py:L1447-L1487.
The documented convention is consistent with existing SurveyDesign.subpopulation() behavior, which zeros weights while retaining strata/PSU structure: diff_diff/survey.py:L424-L557.
Prior P3 test concern is addressed: the direct meat fixture now sets the zero-score PSU’s weights to zero at tests/test_survey.py:L2280-L2287.
I could not run tests locally because pytest and numpy are not installed.

Methodology

Finding: P3 informational, no action required
Impact: The PR documents an existing full-design domain-estimation convention rather than changing methodology. The new note in docs/methodology/REGISTRY.md:L4539-L4557 is consistent with the implementation at diff_diff/survey.py:L1447-L1487 and with weighted score construction at diff_diff/survey.py:L2088-L2097. The moved TODO item is correctly waived under TODO.md:L147-L155.
Concrete fix: None.

Code Quality

No findings.

Performance

No findings. Runtime code is unchanged.

Maintainability

No findings. Moving the former backlog item to the waived-decision table improves traceability of this convention.

Tech Debt

No findings. The waived decision is explicitly tracked in TODO.md:L147-L155.

Security

No findings. Changed files are docs/tests only; no secrets or unsafe execution paths observed.

Documentation/Tests

No findings. The new regression tests cover inert same-PSU zero-weight padding, genuine subpopulation zeroing versus naive row dropping, and the direct meat helper’s full-design PSU counting contract at tests/test_survey.py:L2197-L2316.

Verification note: attempted pytest -q -p no:cacheprovider tests/test_survey.py::TestZeroWeightPsuConventionWaiver, but pytest was unavailable. Direct Python execution was also blocked by missing numpy.

igerber added the ready-for-ci Triggers CI test workflows label Jun 30, 2026

igerber merged commit 6b052e6 into main Jun 30, 2026
31 of 32 checks passed

igerber deleted the docs/survey-se-zero-weight-waiver branch June 30, 2026 16:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs(survey): waive zero-weight-PSU SE-invariance item; lock Lumley full-design convention#589

docs(survey): waive zero-weight-PSU SE-invariance item; lock Lumley full-design convention#589
igerber merged 3 commits into
mainfrom
docs/survey-se-zero-weight-waiver

igerber commented Jun 30, 2026

Uh oh!

github-actions Bot commented Jun 30, 2026

Uh oh!

github-actions Bot commented Jun 30, 2026

Uh oh!

github-actions Bot commented Jun 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented Jun 30, 2026

Summary

Methodology references

Validation

Security / privacy

Uh oh!

github-actions Bot commented Jun 30, 2026

Uh oh!

github-actions Bot commented Jun 30, 2026

Uh oh!

github-actions Bot commented Jun 30, 2026

Overall Assessment

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant