Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion TODO.md
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,6 @@ The `Origin` column (Actionable tables) and the `PR` column (Deferred tables) bo
| `SyntheticControl` conformal (CWZ 2021) extensions: (a) one-sided / signed-`t` variants (§7); (b) covariates in the conformal proxy (`X_jt`, eqs 4/6 — current proxy is outcomes-only); (c) AR / innovation-permutation path (Lemmas 5-7) for time-series proxies. The joint test, pointwise CIs, and average-effect CI have landed. | `conformal.py`, `synthetic_control_results.py` | CWZ-2021 | Heavy | Low |
| `ContinuousDiD` CGBS-2024 extensions (matches R `contdid` v0.1.0 deferral set): (a) `covariates=` kwarg; (b) discrete-treatment saturated regression (integer dose currently warned, not routed to per-level coefficients); (c) lowest-dose-as-control per Remark 3.1 when `P(D=0)=0`. REGISTRY `## ContinuousDiD` → Implementation Checklist marks these `[ ]`. | `continuous_did.py` | CGBS-2024 | Heavy | Low |
| `EfficientDiD` survey-weighted Silverman bandwidth in conditional Omega* — `_silverman_bandwidth()` uses unweighted mean/std; survey-weighted statistics better reflect the population distribution (second-order refinement). | `efficient_did_covariates.py` | — | Quick | Low |
| Survey sandwich SE is not exactly invariant to zero-weight (subpopulation / padded) rows: `_compute_stratified_psu_meat`'s finite-sample correction counts zero-weight units as PSUs, so padding shifts the SE ~2e-4 relative. Point estimate is exactly invariant. Fix: count only positive-weight PSUs in the correction (cross-cutting across all survey-enabled estimators). | `survey.py` (`_compute_stratified_psu_meat`) | PR-B | Mid | Low |
| `ImputationDiD` LOO conservative-variance refinement (BJS 2024 Supp. Appendix A.9) — a finite-sample improvement to the auxiliary-model residuals reducing overfit of `tau_tilde_g` to `epsilon`. Asymptotic Theorem-3 variance is implemented and matches R `didimputation` (which also omits LOO by default). | `imputation.py` | imputation-validation | Mid | Low |
| `TwoWayFixedEffects(vcov_type in {hc2, hc2_bm})` with replicate-weight designs raises `NotImplementedError` (`twfe.py:~233`). The replicate path re-demeans per replicate, which doesn't compose with the full-dummy HC2/HC2-BM build — a correct impl needs per-replicate full-dummy refit. Workaround: `hc1` for replicate-weight CR1. | `twfe.py::fit` | follow-up | Heavy | Low |
| TWFE's HC2/HC2-BM inline full-dummy build (`twfe.py:280-315`) duplicates the dummy-construction logic in `DifferenceInDifferences(fixed_effects=...)` (`estimators.py:478-486`). Extract a shared helper, or delegate TWFE's HC2/HC2-BM path to DiD's `fixed_effects=` branch (with TWFE-specific cluster-default threading), to reduce drift risk on FE naming / survey behavior / result-surface conventions. Substantive refactor — touches both estimators. | `twfe.py::fit`, `estimators.py::DifferenceInDifferences.fit` | follow-up | Heavy | Low |
Expand Down Expand Up @@ -153,6 +152,7 @@ Doable in principle, but no current caller and/or explicitly out of paper scope.
| **R-script-per-test consolidation has no CI impact.** No CI workflow installs R, so every R-parity test skips in CI behind a per-file availability gate — consolidating `Rscript` spawns yields zero CI speedup. `test_methodology_twfe.py` already session-caches its R fits. The only residual is a LOCAL-dev micro-opt for `test_methodology_continuous_did.py` / `test_methodology_callaway.py` (re-spawn `library(...)` per call). Low value; retained as a local-dev note. | `tests/test_methodology_continuous_did.py`, `tests/test_methodology_callaway.py` | #139 / 2026-06-07 |
| **`HeterogeneousAdoptionDiD` mass-point IV bread is non-symmetric — `_rank_guarded_inv` inapplicable.** The structural rank-guard sweep (continuous_did / two_stage / spillover / conley) excluded `had.py`'s `ZtWX = Zd'WX` ([1, instrument]' × [1, endogenous]): it is a non-symmetric 2×2 Wald-IV bread (`V = ZtWX_inv @ Omega @ ZtWX_inv.T`), and `_rank_guarded_inv` assumes a **symmetric PSD** Gram (symmetric `D=diag(A)` equilibration + eigendecomposition), so applying it would be methodologically wrong. The existing fallback already returns NaN SE on a singular bread; an IV-appropriate near-singular guard would need a different mechanism. | `had.py` | structural-rank-guard / 2026-06-28 |
| **`ImputationDiD` SE vcov is already rank-guarded upstream.** Excluded from the structural rank-guard sweep: the lead/effect vcov comes from `solve_ols(..., return_vcov=True, rank_deficient_action=...)` at the OLS fit (`imputation.py:~2316`), which already drops rank-deficient columns. The only raw inverse (`solve(V_gamma, gamma)`, `imputation.py:~2530`) is the pretrends **Wald F-test statistic** with a safe `NaN` fallback — a test statistic, not a sandwich bread — so there is no garbage-SE exposure. No structural rank-guard needed. | `imputation.py` | structural-rank-guard / 2026-06-28 |
| **Survey TSL SE intentionally counts genuine-subpopulation zero-weight PSUs (matches R, NOT a bug).** Re-examined the former "count only positive-weight PSUs in the correction" item (origin PR-B). `_compute_stratified_psu_meat`'s finite-sample correction `(1-f_h)·n_{PSU,h}/(n_{PSU,h}-1)` and PSU-mean centering keep zero-weight PSUs — this is the **full-design domain-estimation convention** (Lumley 2004 §3.4; R `survey::svyrecvar(subset())`), already documented in REGISTRY § "Subpopulation Analysis" (the survey-vcov path deliberately differs from the positive-weight invariance applied *outside* it). The ATT is exactly invariant; the SE is intentionally NOT invariant to genuine-subpopulation zeroing (it *should* differ from a naive physical subset — that is the whole point of `subpopulation()`). Repro (`scratchpad`): zeroing a full PSU vs physically dropping it differs ~5e-3 rel — the Lumley-correct gap, and R's `svyrecvar(subset())` produces the matching SE (only `df` differs; see the § "Subpopulation Analysis" Deviation note). The only truly invariance-violating shape — appending *synthetic new* all-zero PSUs — does not arise in any estimator path (real padding reuses existing PSU labels and is already bit-invariant, or is genuine domain estimation via `prep.py`'s zero-padded full-design cell variance). "Fixing" the meat to positive-weight-only would break the documented Lumley/R parity. Regression-locked by `tests/test_survey.py::TestZeroWeightPsuConventionWaiver`. | `survey.py` (`_compute_stratified_psu_meat`) | PR-B / 2026-06-30 |

---

Expand Down
19 changes: 19 additions & 0 deletions docs/methodology/REGISTRY.md
Original file line number Diff line number Diff line change
Expand Up @@ -4536,6 +4536,25 @@ Domain estimation preserving full design structure.
paths use positive-weight count for df adjustments, ensuring zero-weight
padding is inference-invariant outside the survey vcov path. DEFF
effective-n also uses positive-weight count.
- **Note:** The TSL meat itself follows the same full-design convention:
`_compute_stratified_psu_meat`'s per-stratum finite-sample correction
`(1 - f_h)·n_{PSU,h}/(n_{PSU,h}-1)` and PSU-mean centering count
zero-weight PSUs (a genuine-subpopulation PSU with all members outside the
domain contributes a zero PSU-score `0` that is centered to `-\bar{z}_h`
and still increments `n_{PSU,h}`). This is the Lumley (2004 §3.4) /
R `survey::svyrecvar(subset())` domain estimator — so the survey SE is
deliberately **not** invariant to genuine-subpopulation zeroing: it
differs from the SE of a naive physical subset, which is the whole point
of preserving the design (`subpopulation()` is correct *because* it does
not equal the naive subset). Zero-weight rows that reuse an existing PSU
label are inert (their weighted score is `0`, so the PSU-score sum is
unchanged), so padding that preserves PSU membership is bit-invariant;
only adding *new* synthetic all-zero PSUs would shift the SE, and no
estimator path does so (domain padding goes through the zero-padded
full-design cell variance in `prep.py`, which retains the real PSU layout).
A former TODO proposed counting only positive-weight PSUs to force SE
invariance; that was waived (TODO § "Won't-fix / waived") because it would
break this documented R parity.
- **Deviation from R:** `subpopulation()` preserves all strata in df
computation even when a stratum has no positive-weight observations,
while R's `subset()` drops empty strata from `survey::degf()`. For
Expand Down
170 changes: 170 additions & 0 deletions tests/test_survey.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
from diff_diff.linalg import LinearRegression, compute_robust_vcov, solve_ols
from diff_diff.survey import (
ResolvedSurveyDesign,
_compute_stratified_psu_meat,
compute_survey_metadata,
compute_survey_vcov,
)
Expand Down Expand Up @@ -2146,6 +2147,175 @@ def test_zero_weights_accepted(self):
assert coef is not None


class TestZeroWeightPsuConventionWaiver:
"""Lock the waived decision on the survey TSL finite-sample correction.

``_compute_stratified_psu_meat`` intentionally keeps genuine-subpopulation
zero-weight PSUs in its per-stratum correction
``(1 - f_h)*n_PSU_h/(n_PSU_h - 1)`` and PSU-mean centering — the full-design
domain estimator of Lumley (2004 §3.4) / R ``survey::svyrecvar(subset())``.
A former TODO proposed "count only positive-weight PSUs" to force the survey
SE to be invariant to zero-weight rows; that was **waived** (TODO
§ "Won't-fix / waived"; REGISTRY § "Subpopulation Analysis") because it would
break the documented R parity. These tests trip if that change is ever made:
the proposed fix would collapse the subpopulation SE onto the naive-subset SE.
"""

@staticmethod
def _panel():
rng = np.random.default_rng(20260630)
rows = []
uid = 0
for h in range(3): # strata
for p in range(4): # PSUs per stratum (>=2 so a drop leaves the path live)
psu_id = f"s{h}_p{p}"
for _ in range(2): # units per PSU
treated = int(rng.random() < 0.5)
for t in (0, 1): # periods
y = 2.0 * treated * t + 0.5 * h + 0.3 * p + rng.normal()
rows.append(
dict(
unit=uid,
stratum=h,
psu=psu_id,
treated=treated,
post=t,
outcome=y,
w=rng.uniform(0.5, 2.0),
)
)
uid += 1
return pd.DataFrame(rows)

def _fit(self, df):
sd = SurveyDesign(weights="w", strata="stratum", psu="psu")
res = DifferenceInDifferences().fit(
df, outcome="outcome", treatment="treated", time="post", survey_design=sd
)
return res.att, res.se

def test_inert_padding_reusing_existing_psu_is_bit_invariant(self):
"""Zero-weight rows under an EXISTING PSU label are inert: ATT and SE
bit-identical. (Their weighted score is 0, so the PSU-score sum and the
PSU count are both unchanged.)"""
base = self._panel()
att0, se0 = self._fit(base)

pad = []
for _ in range(8):
for t in (0, 1):
pad.append(
dict(
unit=8000,
stratum=0,
psu="s0_p1", # reuse an existing PSU
treated=0,
post=t,
outcome=99.0, # arbitrary; weight 0 makes it inert
w=0.0,
)
)
padded = pd.concat([base, pd.DataFrame(pad)], ignore_index=True)
att1, se1 = self._fit(padded)

np.testing.assert_allclose(att1, att0, atol=1e-12)
np.testing.assert_allclose(se1, se0, atol=1e-12)

def test_subpopulation_zeroing_keeps_full_psu_structure(self):
"""Genuine subpopulation (zero a full PSU's weights) is NOT a naive
physical subset. The WLS fit is identical (zero-weight rows drop out of
X'WX), so the ATT is exactly invariant; but the TSL SE intentionally
differs from the dropped-rows SE because the zeroed PSU still counts in
the finite-sample correction (Lumley full-design convention).

The proposed "count only positive-weight PSUs" change would make the
zeroed SE equal the dropped SE; this assertion guards against it.
"""
base = self._panel()
victim = "s0_p0"

zeroed = base.copy()
zeroed.loc[zeroed["psu"] == victim, "w"] = 0.0
att_zero, se_zero = self._fit(zeroed)

dropped = base[base["psu"] != victim].copy()
att_drop, se_drop = self._fit(dropped)

# ATT is exactly invariant: zero-weight rows contribute nothing to the
# weighted normal equations, so the point estimate matches the subset.
np.testing.assert_allclose(att_zero, att_drop, atol=1e-10)

# SE is deliberately NOT invariant: the zeroed PSU stays in n_PSU_h and
# the centering (the Lumley domain gap, ~5e-3 rel here). A positive-
# weight-only correction would drive this to ~0.
rel_gap = abs(se_zero - se_drop) / se_drop
assert rel_gap > 1e-3, (
f"survey SE became invariant to subpopulation zeroing "
f"(rel_gap={rel_gap:.2e}); the full-design Lumley convention "
f"(TODO § Won't-fix / waived) appears to have been reverted."
)

def test_stratified_meat_counts_zero_score_psu_in_full_design(self):
"""Direct unit test on ``_compute_stratified_psu_meat``: an all-zero-score
PSU (a fully zeroed subpopulation PSU) stays in ``n_PSU_h``, the stratum
PSU-mean, and the centering. Asserts the **exact** full-design meat and
that it is NOT the positive-weight-only meat (which would drop the zero
PSU). This guards against partial edits the SE-level test could miss —
e.g. changing only the finite-sample denominator while still centering
over the zero PSU.
"""
# Two strata; stratum 0 has an all-zero PSU (psu 2 = two zeroed rows).
scores = np.array(
[
[1.5, -0.5], # stratum 0, psu 0
[0.5, 2.0], # stratum 0, psu 1
[0.0, 0.0], # stratum 0, psu 2 (all-zero subpop PSU), row 1
[0.0, 0.0], # stratum 0, psu 2, row 2
[-1.0, 0.7], # stratum 1, psu 3
[0.3, -1.2], # stratum 1, psu 4
]
)
strata = np.array([0, 0, 0, 0, 1, 1])
psu = np.array([0, 1, 2, 2, 3, 4])
# PSU 2's rows carry zero weight as well as zero score, so the fixture
# models a *true* fully zero-weight subpopulation PSU. The current meat
# ignores `weights` (it operates on scores), so this does not change the
# expected value — but it hardens the lock against a future denominator-
# only edit that reads `resolved.weights` to drop positive-weight PSUs.
resolved = ResolvedSurveyDesign(
weights=np.array([1.0, 1.0, 0.0, 0.0, 1.0, 1.0]),
weight_type="pweight",
strata=strata,
psu=psu,
fpc=None,
n_strata=2,
n_psu=5,
lonely_psu="remove",
)

meat, variance_computed, _ = _compute_stratified_psu_meat(scores, resolved)
assert variance_computed

def _stratum_meat(psu_scores):
# Full-design per-stratum meat with fpc=None: (n_h/(n_h-1)) * S'S.
n_h = psu_scores.shape[0]
centered = psu_scores - psu_scores.mean(axis=0, keepdims=True)
return (n_h / (n_h - 1)) * (centered.T @ centered)

# Full-design reference INCLUDES the all-zero PSU in stratum 0 (n_h=3).
s0_full = np.array([[1.5, -0.5], [0.5, 2.0], [0.0, 0.0]])
s1 = np.array([[-1.0, 0.7], [0.3, -1.2]])
expected_full = _stratum_meat(s0_full) + _stratum_meat(s1)
np.testing.assert_allclose(meat, expected_full, atol=1e-12)

# The positive-weight-only meat would DROP the zero PSU from stratum 0
# (n_h=2, mean over the two nonzero PSUs) — exactly the change the waived
# TODO proposed. The full-design meat must NOT equal it.
s0_pos = np.array([[1.5, -0.5], [0.5, 2.0]])
positive_only = _stratum_meat(s0_pos) + _stratum_meat(s1)
assert not np.allclose(meat, positive_only, atol=1e-8)


class TestRound8Fixes:
"""Tests for round-8 review fixes (PR #218)."""

Expand Down
Loading