From 019e6dfdaf0f143506bb8b5a9f40c1cfb26d86a4 Mon Sep 17 00:00:00 2001 From: igerber Date: Tue, 30 Jun 2026 07:54:16 -0400 Subject: [PATCH 1/3] docs(survey): waive zero-weight-PSU SE-invariance item; lock Lumley full-design convention Re-examined the TODO row proposing the survey TSL finite-sample correction count only positive-weight PSUs so the SE is invariant to zero-weight (subpopulation / padded) rows. Investigation shows the premise conflicts with the library's documented, R-validated convention: - `_compute_stratified_psu_meat`'s per-stratum correction (1 - f_h)*n_PSU_h/(n_PSU_h - 1) and PSU-mean centering intentionally keep genuine-subpopulation zero-weight PSUs. This is the full-design domain estimator of Lumley (2004 Section 3.4) / R survey::svyrecvar(subset()), already documented in REGISTRY section "Subpopulation Analysis". - The ATT is exactly invariant; the survey SE is deliberately NOT invariant to genuine-subpopulation zeroing (it should differ from a naive physical subset -- that is the whole point of subpopulation()). R produces the matching SE (only df differs). - Zero-weight rows that reuse an existing PSU label are already bit-invariant. The only invariance-violating shape -- appending synthetic new all-zero PSUs -- arises in no estimator path (domain padding goes through prep.py's zero-padded full-design cell variance, which retains the real PSU layout). Forcing the meat to positive-weight-only counting would break the documented Lumley/R parity, so the item is waived (no estimator behavior change): - TODO.md: move the row from Actionable Backlog to "Won't-fix / waived (decisions on the record)" with the Lumley/R justification. - REGISTRY.md: add a Note in section "Subpopulation Analysis" making explicit that the TSL meat finite-sample correction counts zero-weight PSUs by design. - tests/test_survey.py: add TestZeroWeightPsuConventionWaiver regression-lock (inert existing-PSU padding is bit-invariant; subpopulation zeroing keeps the full PSU structure so its SE differs from a naive subset). A future positive-weight-only change would collapse the two and trip the test. Co-Authored-By: Claude Opus 4.8 (1M context) --- TODO.md | 2 +- docs/methodology/REGISTRY.md | 19 ++++++ tests/test_survey.py | 109 +++++++++++++++++++++++++++++++++++ 3 files changed, 129 insertions(+), 1 deletion(-) diff --git a/TODO.md b/TODO.md index 3569b833..475fe869 100644 --- a/TODO.md +++ b/TODO.md @@ -34,7 +34,6 @@ The `Origin` column (Actionable tables) and the `PR` column (Deferred tables) bo | `SyntheticControl` conformal (CWZ 2021) extensions: (a) one-sided / signed-`t` variants (§7); (b) covariates in the conformal proxy (`X_jt`, eqs 4/6 — current proxy is outcomes-only); (c) AR / innovation-permutation path (Lemmas 5-7) for time-series proxies. The joint test, pointwise CIs, and average-effect CI have landed. | `conformal.py`, `synthetic_control_results.py` | CWZ-2021 | Heavy | Low | | `ContinuousDiD` CGBS-2024 extensions (matches R `contdid` v0.1.0 deferral set): (a) `covariates=` kwarg; (b) discrete-treatment saturated regression (integer dose currently warned, not routed to per-level coefficients); (c) lowest-dose-as-control per Remark 3.1 when `P(D=0)=0`. REGISTRY `## ContinuousDiD` → Implementation Checklist marks these `[ ]`. | `continuous_did.py` | CGBS-2024 | Heavy | Low | | `EfficientDiD` survey-weighted Silverman bandwidth in conditional Omega* — `_silverman_bandwidth()` uses unweighted mean/std; survey-weighted statistics better reflect the population distribution (second-order refinement). | `efficient_did_covariates.py` | — | Quick | Low | -| Survey sandwich SE is not exactly invariant to zero-weight (subpopulation / padded) rows: `_compute_stratified_psu_meat`'s finite-sample correction counts zero-weight units as PSUs, so padding shifts the SE ~2e-4 relative. Point estimate is exactly invariant. Fix: count only positive-weight PSUs in the correction (cross-cutting across all survey-enabled estimators). | `survey.py` (`_compute_stratified_psu_meat`) | PR-B | Mid | Low | | `ImputationDiD` LOO conservative-variance refinement (BJS 2024 Supp. Appendix A.9) — a finite-sample improvement to the auxiliary-model residuals reducing overfit of `tau_tilde_g` to `epsilon`. Asymptotic Theorem-3 variance is implemented and matches R `didimputation` (which also omits LOO by default). | `imputation.py` | imputation-validation | Mid | Low | | `TwoWayFixedEffects(vcov_type in {hc2, hc2_bm})` with replicate-weight designs raises `NotImplementedError` (`twfe.py:~233`). The replicate path re-demeans per replicate, which doesn't compose with the full-dummy HC2/HC2-BM build — a correct impl needs per-replicate full-dummy refit. Workaround: `hc1` for replicate-weight CR1. | `twfe.py::fit` | follow-up | Heavy | Low | | TWFE's HC2/HC2-BM inline full-dummy build (`twfe.py:280-315`) duplicates the dummy-construction logic in `DifferenceInDifferences(fixed_effects=...)` (`estimators.py:478-486`). Extract a shared helper, or delegate TWFE's HC2/HC2-BM path to DiD's `fixed_effects=` branch (with TWFE-specific cluster-default threading), to reduce drift risk on FE naming / survey behavior / result-surface conventions. Substantive refactor — touches both estimators. | `twfe.py::fit`, `estimators.py::DifferenceInDifferences.fit` | follow-up | Heavy | Low | @@ -153,6 +152,7 @@ Doable in principle, but no current caller and/or explicitly out of paper scope. | **R-script-per-test consolidation has no CI impact.** No CI workflow installs R, so every R-parity test skips in CI behind a per-file availability gate — consolidating `Rscript` spawns yields zero CI speedup. `test_methodology_twfe.py` already session-caches its R fits. The only residual is a LOCAL-dev micro-opt for `test_methodology_continuous_did.py` / `test_methodology_callaway.py` (re-spawn `library(...)` per call). Low value; retained as a local-dev note. | `tests/test_methodology_continuous_did.py`, `tests/test_methodology_callaway.py` | #139 / 2026-06-07 | | **`HeterogeneousAdoptionDiD` mass-point IV bread is non-symmetric — `_rank_guarded_inv` inapplicable.** The structural rank-guard sweep (continuous_did / two_stage / spillover / conley) excluded `had.py`'s `ZtWX = Zd'WX` ([1, instrument]' × [1, endogenous]): it is a non-symmetric 2×2 Wald-IV bread (`V = ZtWX_inv @ Omega @ ZtWX_inv.T`), and `_rank_guarded_inv` assumes a **symmetric PSD** Gram (symmetric `D=diag(A)` equilibration + eigendecomposition), so applying it would be methodologically wrong. The existing fallback already returns NaN SE on a singular bread; an IV-appropriate near-singular guard would need a different mechanism. | `had.py` | structural-rank-guard / 2026-06-28 | | **`ImputationDiD` SE vcov is already rank-guarded upstream.** Excluded from the structural rank-guard sweep: the lead/effect vcov comes from `solve_ols(..., return_vcov=True, rank_deficient_action=...)` at the OLS fit (`imputation.py:~2316`), which already drops rank-deficient columns. The only raw inverse (`solve(V_gamma, gamma)`, `imputation.py:~2530`) is the pretrends **Wald F-test statistic** with a safe `NaN` fallback — a test statistic, not a sandwich bread — so there is no garbage-SE exposure. No structural rank-guard needed. | `imputation.py` | structural-rank-guard / 2026-06-28 | +| **Survey TSL SE intentionally counts genuine-subpopulation zero-weight PSUs (matches R, NOT a bug).** Re-examined the former "count only positive-weight PSUs in the correction" item (origin PR-B). `_compute_stratified_psu_meat`'s finite-sample correction `(1-f_h)·n_{PSU,h}/(n_{PSU,h}-1)` and PSU-mean centering keep zero-weight PSUs — this is the **full-design domain-estimation convention** (Lumley 2004 §3.4; R `survey::svyrecvar(subset())`), already documented in REGISTRY § "Subpopulation Analysis" (the survey-vcov path deliberately differs from the positive-weight invariance applied *outside* it). The ATT is exactly invariant; the SE is intentionally NOT invariant to genuine-subpopulation zeroing (it *should* differ from a naive physical subset — that is the whole point of `subpopulation()`). Repro (`scratchpad`): zeroing a full PSU vs physically dropping it differs ~5e-3 rel — the Lumley-correct gap, and R's `svyrecvar(subset())` produces the matching SE (only `df` differs; see the § "Subpopulation Analysis" Deviation note). The only truly invariance-violating shape — appending *synthetic new* all-zero PSUs — does not arise in any estimator path (real padding reuses existing PSU labels and is already bit-invariant, or is genuine domain estimation via `prep.py`'s zero-padded full-design cell variance). "Fixing" the meat to positive-weight-only would break the documented Lumley/R parity. Regression-locked by `tests/test_survey.py::TestZeroWeightPsuConventionWaiver`. | `survey.py` (`_compute_stratified_psu_meat`) | PR-B / 2026-06-30 | --- diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index aa0c93a5..34cb3134 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -4536,6 +4536,25 @@ Domain estimation preserving full design structure. paths use positive-weight count for df adjustments, ensuring zero-weight padding is inference-invariant outside the survey vcov path. DEFF effective-n also uses positive-weight count. +- **Note:** The TSL meat itself follows the same full-design convention: + `_compute_stratified_psu_meat`'s per-stratum finite-sample correction + `(1 - f_h)·n_{PSU,h}/(n_{PSU,h}-1)` and PSU-mean centering count + zero-weight PSUs (a genuine-subpopulation PSU with all members outside the + domain contributes a zero PSU-score `0` that is centered to `-\bar{z}_h` + and still increments `n_{PSU,h}`). This is the Lumley (2004 §3.4) / + R `survey::svyrecvar(subset())` domain estimator — so the survey SE is + deliberately **not** invariant to genuine-subpopulation zeroing: it + differs from the SE of a naive physical subset, which is the whole point + of preserving the design (`subpopulation()` is correct *because* it does + not equal the naive subset). Zero-weight rows that reuse an existing PSU + label are inert (their weighted score is `0`, so the PSU-score sum is + unchanged), so padding that preserves PSU membership is bit-invariant; + only adding *new* synthetic all-zero PSUs would shift the SE, and no + estimator path does so (domain padding goes through the zero-padded + full-design cell variance in `prep.py`, which retains the real PSU layout). + A former TODO proposed counting only positive-weight PSUs to force SE + invariance; that was waived (TODO § "Won't-fix / waived") because it would + break this documented R parity. - **Deviation from R:** `subpopulation()` preserves all strata in df computation even when a stratum has no positive-weight observations, while R's `subset()` drops empty strata from `survey::degf()`. For diff --git a/tests/test_survey.py b/tests/test_survey.py index 02b844d6..0a89caea 100644 --- a/tests/test_survey.py +++ b/tests/test_survey.py @@ -2146,6 +2146,115 @@ def test_zero_weights_accepted(self): assert coef is not None +class TestZeroWeightPsuConventionWaiver: + """Lock the waived decision on the survey TSL finite-sample correction. + + ``_compute_stratified_psu_meat`` intentionally keeps genuine-subpopulation + zero-weight PSUs in its per-stratum correction + ``(1 - f_h)*n_PSU_h/(n_PSU_h - 1)`` and PSU-mean centering — the full-design + domain estimator of Lumley (2004 §3.4) / R ``survey::svyrecvar(subset())``. + A former TODO proposed "count only positive-weight PSUs" to force the survey + SE to be invariant to zero-weight rows; that was **waived** (TODO + § "Won't-fix / waived"; REGISTRY § "Subpopulation Analysis") because it would + break the documented R parity. These tests trip if that change is ever made: + the proposed fix would collapse the subpopulation SE onto the naive-subset SE. + """ + + @staticmethod + def _panel(): + rng = np.random.default_rng(20260630) + rows = [] + uid = 0 + for h in range(3): # strata + for p in range(4): # PSUs per stratum (>=2 so a drop leaves the path live) + psu_id = f"s{h}_p{p}" + for _ in range(2): # units per PSU + treated = int(rng.random() < 0.5) + for t in (0, 1): # periods + y = 2.0 * treated * t + 0.5 * h + 0.3 * p + rng.normal() + rows.append( + dict( + unit=uid, + stratum=h, + psu=psu_id, + treated=treated, + post=t, + outcome=y, + w=rng.uniform(0.5, 2.0), + ) + ) + uid += 1 + return pd.DataFrame(rows) + + def _fit(self, df): + sd = SurveyDesign(weights="w", strata="stratum", psu="psu") + res = DifferenceInDifferences().fit( + df, outcome="outcome", treatment="treated", time="post", survey_design=sd + ) + return res.att, res.se + + def test_inert_padding_reusing_existing_psu_is_bit_invariant(self): + """Zero-weight rows under an EXISTING PSU label are inert: ATT and SE + bit-identical. (Their weighted score is 0, so the PSU-score sum and the + PSU count are both unchanged.)""" + base = self._panel() + att0, se0 = self._fit(base) + + pad = [] + for _ in range(8): + for t in (0, 1): + pad.append( + dict( + unit=8000, + stratum=0, + psu="s0_p1", # reuse an existing PSU + treated=0, + post=t, + outcome=99.0, # arbitrary; weight 0 makes it inert + w=0.0, + ) + ) + padded = pd.concat([base, pd.DataFrame(pad)], ignore_index=True) + att1, se1 = self._fit(padded) + + np.testing.assert_allclose(att1, att0, atol=1e-12) + np.testing.assert_allclose(se1, se0, atol=1e-12) + + def test_subpopulation_zeroing_keeps_full_psu_structure(self): + """Genuine subpopulation (zero a full PSU's weights) is NOT a naive + physical subset. The WLS fit is identical (zero-weight rows drop out of + X'WX), so the ATT is exactly invariant; but the TSL SE intentionally + differs from the dropped-rows SE because the zeroed PSU still counts in + the finite-sample correction (Lumley full-design convention). + + The proposed "count only positive-weight PSUs" change would make the + zeroed SE equal the dropped SE; this assertion guards against it. + """ + base = self._panel() + victim = "s0_p0" + + zeroed = base.copy() + zeroed.loc[zeroed["psu"] == victim, "w"] = 0.0 + att_zero, se_zero = self._fit(zeroed) + + dropped = base[base["psu"] != victim].copy() + att_drop, se_drop = self._fit(dropped) + + # ATT is exactly invariant: zero-weight rows contribute nothing to the + # weighted normal equations, so the point estimate matches the subset. + np.testing.assert_allclose(att_zero, att_drop, atol=1e-10) + + # SE is deliberately NOT invariant: the zeroed PSU stays in n_PSU_h and + # the centering (the Lumley domain gap, ~5e-3 rel here). A positive- + # weight-only correction would drive this to ~0. + rel_gap = abs(se_zero - se_drop) / se_drop + assert rel_gap > 1e-3, ( + f"survey SE became invariant to subpopulation zeroing " + f"(rel_gap={rel_gap:.2e}); the full-design Lumley convention " + f"(TODO § Won't-fix / waived) appears to have been reverted." + ) + + class TestRound8Fixes: """Tests for round-8 review fixes (PR #218).""" From 444c8333dd0073ceffe2e32652454f0c89c16299 Mon Sep 17 00:00:00 2001 From: igerber Date: Tue, 30 Jun 2026 08:28:02 -0400 Subject: [PATCH 2/3] test(survey): add direct _compute_stratified_psu_meat full-design unit test Addresses the CI review's actionable P3 (Documentation/Tests): the SE-level test only asserts that subpopulation zeroing differs from a physical subset, which catches a full positive-weight-only rewrite but could miss a partial edit (e.g. changing only the finite-sample denominator while still centering over the zero PSU). Adds a direct unit test on _compute_stratified_psu_meat with crafted PSU scores including one all-zero-score PSU (a fully zeroed subpopulation PSU), asserting the exact full-design meat formula (n_PSU_h including the zero PSU) and that it is NOT the positive-weight-only meat. Any change to the centering OR the denominator now trips the lock. Co-Authored-By: Claude Opus 4.8 (1M context) --- tests/test_survey.py | 56 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 56 insertions(+) diff --git a/tests/test_survey.py b/tests/test_survey.py index 0a89caea..2b98b3bf 100644 --- a/tests/test_survey.py +++ b/tests/test_survey.py @@ -15,6 +15,7 @@ from diff_diff.linalg import LinearRegression, compute_robust_vcov, solve_ols from diff_diff.survey import ( ResolvedSurveyDesign, + _compute_stratified_psu_meat, compute_survey_metadata, compute_survey_vcov, ) @@ -2254,6 +2255,61 @@ def test_subpopulation_zeroing_keeps_full_psu_structure(self): f"(TODO § Won't-fix / waived) appears to have been reverted." ) + def test_stratified_meat_counts_zero_score_psu_in_full_design(self): + """Direct unit test on ``_compute_stratified_psu_meat``: an all-zero-score + PSU (a fully zeroed subpopulation PSU) stays in ``n_PSU_h``, the stratum + PSU-mean, and the centering. Asserts the **exact** full-design meat and + that it is NOT the positive-weight-only meat (which would drop the zero + PSU). This guards against partial edits the SE-level test could miss — + e.g. changing only the finite-sample denominator while still centering + over the zero PSU. + """ + # Two strata; stratum 0 has an all-zero PSU (psu 2 = two zeroed rows). + scores = np.array( + [ + [1.5, -0.5], # stratum 0, psu 0 + [0.5, 2.0], # stratum 0, psu 1 + [0.0, 0.0], # stratum 0, psu 2 (all-zero subpop PSU), row 1 + [0.0, 0.0], # stratum 0, psu 2, row 2 + [-1.0, 0.7], # stratum 1, psu 3 + [0.3, -1.2], # stratum 1, psu 4 + ] + ) + strata = np.array([0, 0, 0, 0, 1, 1]) + psu = np.array([0, 1, 2, 2, 3, 4]) + resolved = ResolvedSurveyDesign( + weights=np.ones(6), + weight_type="pweight", + strata=strata, + psu=psu, + fpc=None, + n_strata=2, + n_psu=5, + lonely_psu="remove", + ) + + meat, variance_computed, _ = _compute_stratified_psu_meat(scores, resolved) + assert variance_computed + + def _stratum_meat(psu_scores): + # Full-design per-stratum meat with fpc=None: (n_h/(n_h-1)) * S'S. + n_h = psu_scores.shape[0] + centered = psu_scores - psu_scores.mean(axis=0, keepdims=True) + return (n_h / (n_h - 1)) * (centered.T @ centered) + + # Full-design reference INCLUDES the all-zero PSU in stratum 0 (n_h=3). + s0_full = np.array([[1.5, -0.5], [0.5, 2.0], [0.0, 0.0]]) + s1 = np.array([[-1.0, 0.7], [0.3, -1.2]]) + expected_full = _stratum_meat(s0_full) + _stratum_meat(s1) + np.testing.assert_allclose(meat, expected_full, atol=1e-12) + + # The positive-weight-only meat would DROP the zero PSU from stratum 0 + # (n_h=2, mean over the two nonzero PSUs) — exactly the change the waived + # TODO proposed. The full-design meat must NOT equal it. + s0_pos = np.array([[1.5, -0.5], [0.5, 2.0]]) + positive_only = _stratum_meat(s0_pos) + _stratum_meat(s1) + assert not np.allclose(meat, positive_only, atol=1e-8) + class TestRound8Fixes: """Tests for round-8 review fixes (PR #218).""" From 5590e6c5b1bf5da391ae1099ad3d10d2c6cddd27 Mon Sep 17 00:00:00 2001 From: igerber Date: Tue, 30 Jun 2026 08:32:30 -0400 Subject: [PATCH 3/3] test(survey): model a true zero-weight PSU in the direct meat fixture Addresses the re-review's actionable P3: the direct meat test represented its all-zero PSU via zero score rows only, with weights left all ones. A future denominator-only edit that reads resolved.weights to drop positive-weight PSUs would not have been caught. Set PSU 2's weights to 0 so the fixture models a true fully zero-weight subpopulation PSU. The current meat ignores weights (it operates on scores), so the expected value is unchanged; the change only hardens the regression lock. Co-Authored-By: Claude Opus 4.8 (1M context) --- tests/test_survey.py | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/tests/test_survey.py b/tests/test_survey.py index 2b98b3bf..7298686e 100644 --- a/tests/test_survey.py +++ b/tests/test_survey.py @@ -2277,8 +2277,13 @@ def test_stratified_meat_counts_zero_score_psu_in_full_design(self): ) strata = np.array([0, 0, 0, 0, 1, 1]) psu = np.array([0, 1, 2, 2, 3, 4]) + # PSU 2's rows carry zero weight as well as zero score, so the fixture + # models a *true* fully zero-weight subpopulation PSU. The current meat + # ignores `weights` (it operates on scores), so this does not change the + # expected value — but it hardens the lock against a future denominator- + # only edit that reads `resolved.weights` to drop positive-weight PSUs. resolved = ResolvedSurveyDesign( - weights=np.ones(6), + weights=np.array([1.0, 1.0, 0.0, 0.0, 1.0, 1.0]), weight_type="pweight", strata=strata, psu=psu,