Skip to content

fix(sector): guard zero-Fitch-words empty-vector UB (EW analogue of PR#264)#265

Merged
ms609 merged 1 commit into
cpp-searchfrom
claude/focused-neumann-ce84b4
Jul 4, 2026
Merged

fix(sector): guard zero-Fitch-words empty-vector UB (EW analogue of PR#264)#265
ms609 merged 1 commit into
cpp-searchfrom
claude/focused-neumann-ce84b4

Conversation

@ms609

@ms609 ms609 commented Jul 4, 2026

Copy link
Copy Markdown
Owner

Summary

PR #264 fixed wagner_tree/load_tip_states for the all-hierarchy HSJ/xform
case where a dataset collapses to 0 Fitch blocks (total_words == 0), leaving
tip_states/prelim/edge_set genuinely empty std::vectors. This PR fixes
the same undefined-behaviour class at 6 further sites, all reachable via plain
equal-weights data where every character is constant or an autapomorphy
(no HSJ/xform needed):

  • ts_sector.cpp: compute_from_above_for_sector (unconditional on every
    sector reduction — the most severe, reachable at default rasStarts=1,
    no non-default control needed), build_ras_sector (SearchControl(rasStarts>=2)),
    build_reduced_dataset_collapsed (sectorCollapseTarget>0)
  • ts_prune_reinsert.cpp: build_reduced_dataset, expand_and_reinsert
    (pruneReinsertCycles>0, auto-selected by the "large" preset at nTip>=120)
  • ts_temper.cpp: stochastic_tbr_phase (annealing, same "large"-preset reachability)
  • ts_tree.cpp: TreeState::save_node_state prealloc fast path (shared TBR/ratchet/temper primitive)
  • ts_wagner.cpp: wagner_goloboff_scores/wagner_entropy_scores (biased-Wagner tip scoring)

Each guarded with the same have_words ternary wagner_tree already uses
(never bare nullptr + 0 arithmetic, which UBSan's pointer-overflow check
would still flag).

Verification

  • Reachability of all 6 sites confirmed empirically: temporary REprintf
    probes on each total_words==0 branch fired under every new test before removal.
  • Confirmed the regression tests are not tautological: a negative control
    (temporarily reverting build_ras_sector's guard) reproduces the exact
    predicted abort (Assertion '__n < this->size()' failed) under a local
    -D_GLIBCXX_ASSERTIONS -O0 -g build, with a gdb backtrace confirming
    build_ras_sectorsearch_sectorxss_searchrun_single_replicate
    driven_searchts_driven_search — reachable from the public
    MaximizeParsimony() API. Restoring the guard reruns clean.
  • Full targeted suite (sector, wagner, hsj, xform, anneal, driven, drift,
    ratchet, tbr, Concordance, prune-reinsert) passes under the same hardened
    build with no behavioural regression.
  • Rebased onto current cpp-search tip (incl. #2228c6ae, #3d50dfd4); zero
    file overlap, clean rebase, rebuilt and retested post-rebase.

Test plan

  • Local -D_GLIBCXX_ASSERTIONS -O0 -g build — all touched test files green, no abort
  • Negative control confirms tests would catch a reverted guard
  • Full targeted local suite green post-rebase
  • GHA ASan — authoritative confirmation (triggers automatically on this PR)

Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com

…R#264)

Any dataset that collapses to 0 Fitch blocks (total_words == 0) -- not just
the all-hierarchy HSJ/xform case PR #264 fixed in wagner_tree/load_tip_states,
but also plain equal-weights data where every character is constant or an
autapomorphy -- leaves tip_states/prelim/edge_set genuinely empty
std::vectors. Code that unconditionally took the address of element 0 of
these empty vectors is undefined behaviour: it aborts under hardened
libstdc++ assertions/ASan but is silent (out-of-bounds address, never
dereferenced if the following block loop is also 0-trip) in a plain release
build.

Audited every zero-word-reachable site sharing this pattern and guarded each
with the same have_words ternary wagner_tree already uses (never bare
nullptr+0 arithmetic, which UBSan's pointer-overflow check would still flag):

- ts_sector.cpp: compute_from_above_for_sector (called unconditionally on
  every build_reduced_dataset(), i.e. every sector reduction including the
  default rasStarts=1 -- the most severe of these, reachable with no
  non-default control), build_ras_sector (the RAS-restart start-tree
  builder, reachable via SearchControl(rasStarts>=2)), and
  build_reduced_dataset_collapsed (reachable via sectorCollapseTarget>0).
- ts_prune_reinsert.cpp: build_reduced_dataset and expand_and_reinsert,
  reachable via pruneReinsertCycles>0 (auto-selected by the "large" preset
  at nTip>=120).
- ts_temper.cpp: stochastic_tbr_phase (annealing), same "large"-preset
  reachability, needed its own top-level total_words==0 early return
  matching nni_search/tbr_search/ratchet_search/drift_search's existing
  style.
- ts_tree.cpp: TreeState::save_node_state's prealloc fast path -- a shared
  primitive under TBR/ratchet/temper, guarded once at the source.
- ts_wagner.cpp: wagner_goloboff_scores/wagner_entropy_scores (biased-Wagner
  tip scoring), also reachable via the "large" preset.

Regression tests added to test-ts-sector.R (default control, rasStarts=3,
sectorCollapseTarget=6, pruneReinsertCycles=1, annealCycles=1, all through
MaximizeParsimony() on a 40-tip all-uninformative EW dataset) and
test-ts-wagner.R (ts_wagner_bias_bench directly). Reachability of all 6
sites was confirmed empirically (temporary REprintf probes on each
total_words==0 branch, fired under every new test, then removed).

Verified the tests are non-tautological, not just reachability-proxy: a
negative control reverting build_ras_sector's guard back to the unconditional
&rd.data.tip_states[...] reproduces the exact predicted abort
(Assertion '__n < this->size()' failed) under a local -D_GLIBCXX_ASSERTIONS
-O0 -g build, with a gdb backtrace confirming build_ras_sector (ts_sector.cpp)
-> search_sector -> xss_search -> run_single_replicate -> driven_search ->
ts_driven_search, i.e. reachable from the public MaximizeParsimony() API.
Restoring the guard reruns clean. Full targeted suite (sector, wagner, hsj,
xform, anneal, driven, drift, ratchet, tbr, Concordance) passes under the
same hardened build with no behavioural regression.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@ms609 ms609 force-pushed the claude/focused-neumann-ce84b4 branch from df442df to ed59a51 Compare July 4, 2026 11:13
@ms609 ms609 merged commit 521f125 into cpp-search Jul 4, 2026
3 of 5 checks passed
@ms609 ms609 deleted the claude/focused-neumann-ce84b4 branch July 4, 2026 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant