Complexity cap: bound pathological HermitCrab parses (follow-up to rustify) #448

Draft
johnml1135 wants to merge 6 commits into hc-rustify from complexity-cap

johnml1135 (Collaborator) commented Jul 2, 2026

Summary

Follow-up to #446 (rustify). PR #446 made the core HermitCrab engine much faster, but grammar-induced blowups remain: certain grammar constructs (unbounded/multiple-application rules with no overt exponent, unconstrained deletion, unconstrained compounding) still cause the analysis phase to generate candidates combinatorially, sometimes taking minutes to hours for a single word. This PR implements the three-layer mitigation designed in complexity-cap.md (Phases 0–3; Phase 4 is a FieldWorks-repo follow-up and is out of scope here):

  • Layer 1 — work budget (b3fd2b55): ParseContext is propagated through Word exactly like CurrentTrace. Morpher.MaxParseSteps/ParseTimeout ship enabled with generous defaults. Every rule Apply() site checks the budget; a breach is a soft stop (partial results + ParseDiagnostics, never an exception). RerunWithDiagnostics re-parses one word with per-rule counters to report the top offending rule(s).
  • Layer 2 — structural bounds (e68f0984): MaxRuleApplicationsPerWord (closes the "rule A → B → A → B" loophole that a per-rule cap alone can't catch), MaxAnalysisShapeGrowth (prunes analysis candidates whose hypothesized underlying form grows past the surface form), and a cascade depth cap on PermutationRuleCascade. All default off (no behavior change for existing consumers).
  • Layer 3 — static grammar lint (c8a39aeb): GrammarAnalyzer with 8 stable diagnostic codes (HC0001–HC0008: no-overt-exponent affix rules, unbounded multipleApplication, self-feeding epenthesis/deletion, unconstrained compounding, optional-iterative lexical patterns, cyclic feeding pairs). Wired into the hc CLI as hc lint and a new hc parse --diagnose flag. Documented in docs/hermitcrab-grammar-performance.md.
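For intuition about the cyclic-feeding check, here is a minimal Python sketch (the engine itself is C#) that finds a feeding cycle with a standard three-color depth-first search. It assumes a hypothetical `feeds` map from each rule name to the rules whose input its output can match. The PR does not say how GrammarAnalyzer derives feeding relations or whether it looks beyond two-rule pairs, and `find_feeding_cycle` is an illustrative name, not part of SIL.Machine.

```python
from typing import Dict, List, Optional, Set

# Hypothetical model: rule name -> names of rules it can feed.
# This is NOT GrammarAnalyzer's real representation.
FeedsGraph = Dict[str, Set[str]]


def find_feeding_cycle(feeds: FeedsGraph) -> Optional[List[str]]:
    """Return one feeding cycle (e.g. ['A', 'B', 'A']) or None if there is none.

    Three-color DFS: an edge back to a rule still on the current DFS path
    (GRAY) closes a cycle.
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {rule: WHITE for rule in feeds}
    path: List[str] = []

    def visit(rule: str) -> Optional[List[str]]:
        color[rule] = GRAY
        path.append(rule)
        for target in sorted(feeds.get(rule, ())):
            state = color.get(target, WHITE)
            if state == GRAY:
                # Back edge: the cycle is the path suffix starting at target.
                start = path.index(target)
                return path[start:] + [target]
            if state == WHITE:
                found = visit(target)
                if found:
                    return found
        path.pop()
        color[rule] = BLACK  # fully explored, no cycle through this rule
        return None

    for rule in sorted(feeds):
        if color[rule] == WHITE:
            found = visit(rule)
            if found:
                return found
    return None
```

A self-loop such as `{"A": {"A"}}` is also reported (as `["A", "A"]`); since self-feeding has its own check in this PR (HC0004), a real lint would probably filter those out of the cyclic-pair report.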

Plus two small follow-up commits: doc bookkeeping (13567446) and a top-5-words-by-step-count diagnostic in the calibration test harness (343515b1).

Calibration caveats (see complexity-cap.md §4.1 and §10 items 7–8)

  • Real-corpus calibration against indonesian-hc.xml/sena-hc.xml showed that legitimate per-word cost varies by ~1000x between grammars, which invalidated the original "large multiple of one grammar's ceiling" plan. The shipped defaults (DefaultMaxParseSteps = 50,000,000, DefaultParseTimeout = 30s) are instead set with headroom above the costliest legitimate word observed so far across both grammars.
  • Sena calibration is based on only a ~1% sample (72/7,121 words) — a full-corpus re-baseline is in progress (see below) to confirm no legitimate word exceeds the shipped step default.
  • DefaultParseTimeout = 30s will still truncate some legitimate Sena words (one observed at 105s). This is flagged as a genuine product tradeoff needing field input, not something resolved unilaterally in this PR — feedback welcome.
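The spread and headroom can be checked with quick arithmetic from figures reported elsewhere in this PR: Indonesian's worst word at 10,445 steps, Sena's worst word in the ~1% sample at 14.9M steps, and the later full-corpus scan's worst observed word at ≥39.9M steps, all compared against the shipped 50,000,000-step default:

$$
\frac{14{,}900{,}000}{10{,}445} \approx 1.4 \times 10^{3},
\qquad
\frac{50{,}000{,}000}{14{,}900{,}000} \approx 3.4,
\qquad
\frac{50{,}000{,}000}{39{,}900{,}000} \approx 1.25
$$

So the default leaves roughly 3.4x headroom over the sample's worst word, but at most about 1.25x over the worst word found so far in the broader scan.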

Test plan

  • Full HermitCrab test suite passes (82/82), both projects build with zero warnings, csharpier clean
  • Determinism verified (same grammar+word → identical step count, single- and multi-threaded)
  • Verified against real pathological patterns (no-overt-exponent affix rule, real deletion-rule grammar)
  • Full Sena corpus (7,121 words) re-baseline running in background to close the ~1% sample caveat above — will report results as a PR comment

johnml1135 and others added 5 commits July 2, 2026 14:16
Adds ParseContext, a per-ParseWord work budget (MaxParseSteps + ParseTimeout,
generous defaults shipped enabled) propagated through Word exactly like
CurrentTrace. Every analysis/synthesis leaf rule Apply() checks it and
returns Enumerable.Empty<Word>() on breach (soft-stop, never throws);
orchestration-level loops (AnalysisStratumRule, AnalysisLanguageRule,
Morpher.Synthesize/LexicalLookup) fast-unwind once exhausted.
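A minimal Python sketch of the soft-stop idea, assuming one budget "step" per rule application. The real engine's unit of work and its C# types are not reproduced here; `ParseContext`, `try_step`, and `apply_rule` below are illustrative names. The injectable `clock` makes the timeout testable. Note that the step check is deterministic while the wall-clock check depends on machine speed, which is one reason step counts, not timings, are what a determinism check can compare.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class ParseContext:
    """Hypothetical per-word work budget; names are illustrative only."""

    max_steps: int
    timeout_s: float
    clock: Callable[[], float] = time.monotonic
    steps_used: int = 0
    exhausted_reason: Optional[str] = None  # "steps", "timeout", or None
    _deadline: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self._deadline = self.clock() + self.timeout_s

    def try_step(self) -> bool:
        """Charge one step; return False (and stay exhausted) once over budget."""
        if self.exhausted_reason is not None:
            return False
        if self.steps_used >= self.max_steps:
            self.exhausted_reason = "steps"
            return False
        if self.clock() >= self._deadline:
            self.exhausted_reason = "timeout"
            return False
        self.steps_used += 1
        return True


def apply_rule(
    ctx: ParseContext, word: str, rule: Callable[[str], Iterable[str]]
) -> List[str]:
    """Run one rule, or return no candidates if the budget is spent (soft stop)."""
    if not ctx.try_step():
        return []  # never raises; callers keep whatever partial results they have
    return list(rule(word))
```

Exhaustion is sticky: once `exhausted_reason` is set, every later call returns an empty list, loosely analogous to the orchestration loops fast-unwinding after the budget runs out.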

ParseWord gains a ParseDiagnostics overload reporting whether the budget
was hit and why; RerunWithDiagnostics re-parses one word with per-rule
counters to report the top offending rule. Confirmed against a synthetic
"no overt exponent" pathological rule (HC0001-shaped: pure-copy Rhs with a
high MaxApplicationCount) that previously ran unbounded past the cascades'
own input==output loop guard.
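RerunWithDiagnostics' per-rule counters can be pictured as wrapping every rule so each application increments a named counter, then reading off the largest counts. The Python sketch below assumes rules are plain callables keyed by name; `instrument` and `top_offenders` are hypothetical helpers, not the PR's API.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

Rule = Callable[[str], Iterable[str]]


def instrument(rules: Dict[str, Rule], counts: Counter) -> Dict[str, Rule]:
    """Wrap each rule so every application increments counts[rule name]."""

    def wrap(name: str, rule: Rule) -> Rule:
        # A separate function per rule binds name/rule now, avoiding the
        # late-binding bug of defining the closure directly in the loop.
        def counted(word: str) -> Iterable[str]:
            counts[name] += 1
            return rule(word)

        return counted

    return {name: wrap(name, rule) for name, rule in rules.items()}


def top_offenders(counts: Counter, n: int = 3) -> List[Tuple[str, int]]:
    """The n most-applied rules, highest count first."""
    return counts.most_common(n)
```

Counting only during a targeted rerun keeps the normal parse path free of per-rule bookkeeping, which is presumably why the PR reruns a single word rather than instrumenting every parse.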

See complexity-cap.md for the full design (Layers 1-3).

Adds three additive, default-off caps that convert exponential blowups
into bounded ones instead of merely time-boxing them:

- Morpher.MaxRuleApplicationsPerWord: a running total-unapplications
  counter on Word (Word.TotalUnapplicationCount), checked alongside the
  existing per-rule MaxApplicationCount in the three affix/compounding
  analysis rules. Closes the "rule A -> B -> A -> B" loophole that a
  per-rule cap alone cannot catch.
- Morpher.MaxAnalysisShapeGrowth: prunes analysis candidates whose shape
  has grown past the surface form by more than N segments, checked at
  AnalysisStratumRule's output loop (the choke point - candidates pruned
  there never reach lexical lookup) and per-iteration inside
  AnalysisRewriteRule's Deletion/SelfOpaquing reapplication loops.
- PermutationRuleCascade.MaxDepth (SIL.Machine core, opt-in via a new
  property, -1/unlimited by default so existing consumers are
  unaffected): caps nested rule-reapplication depth, derived from
  MaxRuleApplicationsPerWord rather than a new knob, synced each Apply()
  call since the cap can be set via object-initializer syntax after the
  rule cascade is already compiled.
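The three Layer 2 caps reduce to simple predicates. This Python sketch assumes a hypothetical `Candidate` record and treats each character as one segment. It also guesses that the cascade depth cap simply equals MaxRuleApplicationsPerWord; the PR says only that it is "derived from" it. `None` stands for "off", matching the shipped defaults.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical analysis candidate: a hypothesized underlying shape."""

    shape: str
    total_unapplications: int = 0  # running total across *all* rules


def may_unapply(
    c: Candidate, per_rule_count: int, max_per_rule: int, max_total: Optional[int]
) -> bool:
    # A per-rule cap alone lets A -> B -> A -> B continue until each rule hits
    # its own limit; the running per-word total bounds the whole chain.
    if per_rule_count >= max_per_rule:
        return False
    return max_total is None or c.total_unapplications < max_total


def prune_by_growth(
    surface: str, candidates: List[Candidate], max_growth: Optional[int]
) -> List[Candidate]:
    # Drop candidates whose underlying shape is more than max_growth segments
    # longer than the surface form (None = cap off).
    if max_growth is None:
        return candidates
    return [c for c in candidates if len(c.shape) - len(surface) <= max_growth]


def effective_max_depth(max_rule_apps_per_word: Optional[int]) -> int:
    # Assumed derivation: -1 (unlimited) unless the per-word cap is set.
    return -1 if max_rule_apps_per_word is None else max_rule_apps_per_word
```

Pruning on shape growth before lexical lookup is what turns an unbounded deletion-reinsertion search into a bounded one: every surviving candidate is at most `max_growth` segments longer than the input.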

Verified against RewriteRuleTests.DeletionRules' real deletion-rule
grammar: capping MaxAnalysisShapeGrowth excludes the deep-reinsertion
analysis while the shallow ones survive as a strict subset of the
uncapped result.

…onesty pass

Adds GrammarAnalyzer, a static analyzer over a loaded Language that flags
always/almost-always-wrong rule shapes with stable diagnostic codes
(HC0001-HC0008: no-overt-exponent affix rules, unbounded
multipleApplication, self-feeding epenthesis/deletion rules,
unconstrained compounding, optional-iterative lexical patterns, cyclic
feeding pairs). Wired into the hc CLI as a new `hc lint` command, plus a
`hc parse --diagnose` flag that surfaces RerunWithDiagnostics' top
offending rules for a single word - the empirical companion to the
static lint. Both are documented in a new
docs/hermitcrab-grammar-performance.md guide organized by HC code.
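To make two of the lint's rule shapes concrete, here is a Python sketch over a deliberately simplified, hypothetical affix-rule model: `AffixRule`, `copy_part`, and `lint` are illustrative, not GrammarAnalyzer's API. Only HC0001's mapping (pure-copy Rhs, no overt exponent) is stated in this PR, so the unbounded-multipleApplication finding is left without a code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


def copy_part(i: int) -> str:
    """Hypothetical Rhs token meaning 'copy Lhs part i unchanged'."""
    return f"<copy {i}>"


@dataclass
class AffixRule:
    """Hypothetical, heavily simplified affix-rule model (not HermitCrab's)."""

    name: str
    lhs_parts: int
    rhs: List[str]  # copy_part(...) tokens and/or literal segment strings
    multiple_application: bool = False
    max_application_count: Optional[int] = None  # None = unbounded


def lint(rules: List[AffixRule]) -> List[Tuple[str, str]]:
    """Return (finding, rule name) pairs for two of the documented rule shapes."""
    findings: List[Tuple[str, str]] = []
    for rule in rules:
        identity = [copy_part(i) for i in range(rule.lhs_parts)]
        # HC0001 shape: Rhs copies every Lhs part once and adds nothing, so
        # the rule leaves no overt trace the analyzer could key on.
        if rule.rhs == identity:
            findings.append(("HC0001 no-overt-exponent", rule.name))
        # Unbounded multiple application (its HC code is not stated here).
        if rule.multiple_application and rule.max_application_count is None:
            findings.append(("unbounded-multiple-application", rule.name))
    return findings
```

Comparing against the exact identity Rhs (rather than "only copy tokens") keeps reduplication, which copies a part twice and so does have an overt exponent, from being flagged as HC0001.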

While shaping HC0004's self-feeding check, deduped the "does this rule's
output unify with its own required environment" logic shared between
AnalysisRewriteRule and GrammarAnalyzer into a single
IsUnifiableWithEnvironment extension, and found/fixed a real gap: the
lint only covered one of two engine paths that select self-opaquing
behavior, silently missing the epenthesis case (unconditionally
dangerous in Simultaneous mode). Also fixed a pre-existing HC0007
condition that required Optional *and* IsIterative on adjacent lexical
pattern nodes, when the design doc's own canonical example
(([Seg])([Seg])) is two plain-optional (non-iterative) groups - the
check now matches the documented intent.

Ran the real Phase 0 calibration corpus (indonesian/sena) against the
rustify engine and replaced the Phase 1 doc comment's fabricated
"~13,600 steps" figure with real numbers: Indonesian's worst word takes
10,445 steps (a flat ~10-rule combinatorial interaction, not one bad
rule); Sena's worst word in a ~1% corpus sample takes 14.9M steps/105s,
and a separate real word was previously being truncated by the old 10s
default timeout at 99,584 steps. Raised
DefaultMaxParseSteps to 50,000,000 and DefaultParseTimeout to 30s
accordingly, and documented in complexity-cap.md (with two new "still
open" items) that the Sena figures are a floor pending a full-corpus
re-baseline, and that the timeout is a genuine truncation/latency
tradeoff rather than a pure safety margin.

82/82 HermitCrab tests pass; both projects build clean; csharpier clean.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>

Bookkeeping only - the status header and phase table still said "Plan
(not started)" after Phases 0-3 were implemented and committed.

Small addition to the ad hoc Phase 0 calibration harness, left uncommitted
from the corpus investigation: keeps a running top-5 (by StepsUsed) instead
of only the single max, so a full-corpus re-baseline (see complexity-cap.md
Section 10 item 7) shows the shape of the tail, not just one data point.
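A running top-k needs only O(k) memory even over the full 7,121-word corpus. The Python sketch below uses a min-heap; the harness's actual implementation is not shown in this PR, and `top_k_by_steps` is a hypothetical name.

```python
import heapq
from typing import Iterable, List, Tuple


def top_k_by_steps(results: Iterable[Tuple[str, int]], k: int = 5) -> List[Tuple[int, str]]:
    """Keep a running top-k of (steps_used, word) pairs, largest first."""
    heap: List[Tuple[int, str]] = []  # min-heap: smallest kept entry at heap[0]
    for word, steps in results:
        if len(heap) < k:
            heapq.heappush(heap, (steps, word))
        elif steps > heap[0][0]:
            # Evict the current smallest of the kept k in one heap operation.
            heapq.heapreplace(heap, (steps, word))
    return sorted(heap, reverse=True)
```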

codecov-commenter commented Jul 2, 2026

Codecov Report

❌ Patch coverage is 79.82646% with 93 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.73%. Comparing base (ea72cd7) to head (c1d7db6).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...L.Machine.Morphology.HermitCrab/GrammarAnalyzer.cs | 87.20% | 29 Missing and 4 partials ⚠️ |
| .../SIL.Machine.Morphology.HermitCrab/ParseContext.cs | 74.35% | 7 Missing and 3 partials ⚠️ |
| ...chine.Morphology.HermitCrab/AnalysisStratumRule.cs | 60.00% | 2 Missing and 4 partials ⚠️ |
| src/SIL.Machine.Morphology.HermitCrab/Morpher.cs | 91.07% | 2 Missing and 3 partials ⚠️ |
| ...icalRules/AnalysisRealizationalAffixProcessRule.cs | 37.50% | 3 Missing and 2 partials ⚠️ |
| ...ermitCrab/PhonologicalRules/AnalysisRewriteRule.cs | 54.54% | 2 Missing and 3 partials ⚠️ |
| ...hine.Morphology.HermitCrab/HermitCrabExtensions.cs | 69.23% | 2 Missing and 2 partials ⚠️ |
| src/SIL.Machine/Rules/PermutationRuleCascade.cs | 40.00% | 0 Missing and 3 partials ⚠️ |
| ...Morphology.HermitCrab/AnalysisAffixTemplateRule.cs | 0.00% | 1 Missing and 1 partial ⚠️ |
| ...hine.Morphology.HermitCrab/AnalysisLanguageRule.cs | 0.00% | 1 Missing and 1 partial ⚠️ |

... and 10 more
Additional details and impacted files
@@              Coverage Diff               @@
##           hc-rustify     #448      +/-   ##
==============================================
+ Coverage       73.65%   73.73%   +0.07%     
==============================================
  Files             444      447       +3     
  Lines           37929    38372     +443     
  Branches         5253     5323      +70     
==============================================
+ Hits            27937    28293     +356     
- Misses           8856     8911      +55     
- Partials         1136     1168      +32     


johnml1135 marked this pull request as draft July 2, 2026 20:58

…mization findings

A separate investigation (sharded Release-mode full-corpus scan, see
docs/hermitcrab-parse-algorithm-analysis.md on the sibling parse-optimization
branch, not yet committed anywhere) got much further than this branch's own
single-threaded Debug-mode recalibration attempt, which was aborted after
~1 hour at 283/7,121 Sena words to avoid burning many more hours on
redundant/inferior data.

Updates items 7-8 with that scan's numbers (p90 ~2M steps, ~16% of words
>1M steps, worst observed >=39.9M steps, 30s ParseTimeout trips on dozens
of legitimate words) and adds item 9: cinacemerwa (37.5M steps, 0 valid
parses) crashed the NUnit test host outright, apparently from memory
pressure independent of the step/timeout budgets - the current Layer 1/2
budgets bound steps and wall-clock but not allocations.

johnml1135 (Collaborator, Author)

Sena full-corpus calibration update

My own attempt to re-run the full 7,121-word Sena corpus (single-threaded, Debug build, via the Sena_Baseline_NoWordExhaustsUnlimitedBudget/Sena_ShippedDefaults_NeverTrip tests) was too slow to be practical — some individual words alone took 50+ seconds, and it was aborted after ~1 hour at 283/7,121 words rather than run for what looked like 20+ hours.

A separate, much more efficient investigation (sharded 8-way, Release-mode instrumented harness — see docs/hermitcrab-parse-algorithm-analysis.md, currently on a sibling parse-optimization branch and not yet committed anywhere) got further and found:

  • p90 ≈ 2,000,000 steps; ~16% of Sena words exceed 1,000,000 steps
  • Worst observed so far ≥ 39,900,000 steps (kukucitirani) — under the shipped 50,000,000-step DefaultMaxParseSteps, but with less headroom than the earlier ~1% sample suggested, and not yet a confirmed corpus-wide max
  • DefaultParseTimeout = 30s trips on dozens of legitimate Sena words (100–250s single-threaded), not just the one word noted originally
  • New finding: cinacemerwa (37.5M steps, 0 valid parses) crashed the NUnit test host outright, apparently from memory pressure — independent of the step/timeout budgets, which bound steps and wall-clock but not allocations
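Summary figures like "p90" and "~16% of words exceed 1,000,000 steps" depend on the percentile definition used, which the investigation does not state. The Python sketch below uses the nearest-rank definition as one common choice; the function names are hypothetical and no corpus data is reproduced.

```python
import math
from typing import Sequence


def percentile_nearest_rank(values: Sequence[int], p: float) -> int:
    """Smallest value such that at least p% of the data is <= it."""
    if not values or not 0 < p <= 100:
        raise ValueError("need non-empty values and 0 < p <= 100")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def fraction_over(values: Sequence[int], threshold: int) -> float:
    """Share of values strictly above threshold (e.g. words over 1M steps)."""
    if not values:
        raise ValueError("need non-empty values")
    return sum(v > threshold for v in values) / len(values)
```

Interpolating definitions (such as NumPy's default) can give slightly different p90 values on the same data, so the method is worth recording alongside the number.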

Pushed as c1d7db64, updating complexity-cap.md §10 items 7–8 and adding item 9 (the OOM-crash finding). All still open — full corpus re-baseline and the timeout product decision remain unresolved, and it's now an open question whether budget #3 (allocation/candidate-count ceiling) is needed or whether the parse-optimization branch's in-progress algorithmic fixes (nogood cache / memoization, directly targeting the same redundant-search root cause) make it moot.
