Complexity cap: bound pathological HermitCrab parses (follow-up to rustify) by johnml1135 · Pull Request #448 · sillsdev/machine

johnml1135 · 2026-07-02T20:24:51Z

Summary

Follow-up to #446 (rustify). PR #446 made the core HermitCrab engine much faster, but grammar-induced blowups remain: certain grammar constructs (unbounded/multiple-application rules with no overt exponent, unconstrained deletion, unconstrained compounding) still cause the analysis phase to generate candidates combinatorially, sometimes taking minutes to hours for a single word. This PR implements the three-layer mitigation designed in complexity-cap.md (Phases 0–3; Phase 4 is FieldWorks-repo follow-up, out of scope here):

- Layer 1, work budget (b3fd2b55): `ParseContext` is propagated on `Word` exactly like `CurrentTrace`. `Morpher.MaxParseSteps`/`ParseTimeout` ship enabled with generous defaults. Every rule `Apply()` site checks the budget; a breach is a soft stop (partial results plus `ParseDiagnostics`, never an exception). `RerunWithDiagnostics` re-parses one word with per-rule counters to report the top offending rule(s).
- Layer 2, structural bounds (e68f0984): `MaxRuleApplicationsPerWord` (closes the "rule A → B → A → B" loophole that a per-rule cap alone can't catch), `MaxAnalysisShapeGrowth` (prunes analysis candidates whose hypothesized underlying form grows past the surface form), and a cascade depth cap on `PermutationRuleCascade`. All default off (no behavior change for existing consumers).
- Layer 3, static grammar lint (c8a39aeb): `GrammarAnalyzer` with 8 stable diagnostic codes (HC0001–HC0008: no-overt-exponent affix rules, unbounded multipleApplication, self-feeding epenthesis/deletion, unconstrained compounding, optional-iterative lexical patterns, cyclic feeding pairs). Wired into the hc CLI as `hc lint` and a new `hc parse --diagnose` flag. Documented in docs/hermitcrab-grammar-performance.md.
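To make the Layer 2 shape-growth bound concrete, here is a minimal sketch in Python (an illustrative model, not the C# in this PR). It assumes plain strings stand in for shapes and characters for segments; `prune_by_growth` and the sample forms are hypothetical names and data.

```python
from typing import List


def prune_by_growth(candidates: List[str], surface: str, max_growth: int) -> List[str]:
    """Keep only candidates at most `max_growth` segments longer than the surface form."""
    limit = len(surface) + max_growth
    return [c for c in candidates if len(c) <= limit]


# Made-up candidates: each "un-deletion" step hypothesizes two more segments.
surface = "kat"
uncapped = ["kat", "kata", "katata", "katatata"]
capped = prune_by_growth(uncapped, surface, max_growth=3)
print(capped)
```

Because the filter only removes candidates, the capped result is always a subset of the uncapped one, which matches the subset property the PR reports checking against the deletion-rule grammar.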

Plus two small follow-up commits: doc bookkeeping (13567446) and a top-5-words-by-step-count diagnostic in the calibration test harness (343515b1).

Calibration caveats (see `complexity-cap.md` §4.1 and §10 items 7–8)

- Real-corpus calibration against indonesian-hc.xml/sena-hc.xml showed legitimate per-word cost varies by ~1000x between grammars, which broke the original "large multiple of one grammar's ceiling" plan. Shipped defaults (DefaultMaxParseSteps = 50,000,000, DefaultParseTimeout = 30s) are instead set with headroom above the largest legitimate word observed so far across both grammars.
- Sena calibration is based on only a ~1% sample (72/7,121 words). A full-corpus re-baseline is in progress (see below) to confirm no legitimate word exceeds the shipped step default.
- DefaultParseTimeout = 30s will still truncate some legitimate Sena words (one observed at 105s). This is flagged as a genuine product tradeoff needing field input, not something resolved unilaterally in this PR. Feedback welcome.
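The headroom behind these caveats can be checked with a few lines of arithmetic. The sketch below is Python and uses only figures quoted in this PR (the 14.9M-step and 10,445-step worst cases come from the Layer 3 commit message); the constant names are illustrative, not identifiers from the codebase.

```python
# Figures quoted in this PR; nothing here is newly measured.
DEFAULT_MAX_PARSE_STEPS = 50_000_000
DEFAULT_PARSE_TIMEOUT_S = 30
SENA_WORST_SAMPLED_STEPS = 14_900_000
SENA_WORST_SAMPLED_SECONDS = 105
INDONESIAN_WORST_STEPS = 10_445
SENA_SAMPLED_WORDS, SENA_TOTAL_WORDS = 72, 7_121

# Share of the Sena corpus covered by the calibration sample.
sample_fraction = SENA_SAMPLED_WORDS / SENA_TOTAL_WORDS
# How far the step default sits above the worst sampled Sena word.
step_headroom = DEFAULT_MAX_PARSE_STEPS / SENA_WORST_SAMPLED_STEPS
# Spread between the two grammars' worst legitimate words.
cross_grammar_ratio = SENA_WORST_SAMPLED_STEPS / INDONESIAN_WORST_STEPS
# How far the slowest sampled word overshoots the timeout default.
timeout_overrun = SENA_WORST_SAMPLED_SECONDS / DEFAULT_PARSE_TIMEOUT_S

print(f"sample covers {sample_fraction:.1%} of the corpus")
print(f"step headroom: {step_headroom:.2f}x")
print(f"cross-grammar spread: {cross_grammar_ratio:.0f}x")
print(f"timeout overrun: {timeout_overrun:.1f}x")
```

The asymmetry is the point: the step default leaves a margin above the worst sampled word, while the timeout default is already exceeded by it.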

Test plan

- Full HermitCrab test suite passes (82/82), both projects build with zero warnings, csharpier clean
- Determinism verified (same grammar+word → identical step count, single- and multi-threaded)
- Verified against real pathological patterns (no-overt-exponent affix rule, real deletion-rule grammar)
- Full Sena corpus (7,121 words) re-baseline running in background to close the ~1% sample caveat above; results will be reported as a PR comment


Adds ParseContext, a per-ParseWord work budget (MaxParseSteps + ParseTimeout, generous defaults shipped on) propagated through Word exactly like CurrentTrace. Every analysis/synthesis leaf rule Apply() checks it and returns Enumerable.Empty<Word>() on breach (soft-stop, never throws); orchestration-level loops (AnalysisStratumRule, AnalysisLanguageRule, Morpher.Synthesize/LexicalLookup) fast-unwind once exhausted. ParseWord gains a ParseDiagnostics overload reporting whether the budget was hit and why; RerunWithDiagnostics re-parses one word with per-rule counters to report the top offending rule. Confirmed against a synthetic "no overt exponent" pathological rule (HC0001-shaped: pure-copy Rhs with a high MaxApplicationCount) that previously ran unbounded past the cascades' own input==output loop guard. See complexity-cap.md for the full design (Layers 1-3).
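The soft-stop behavior described above can be modeled in a few lines. This is a hedged Python sketch rather than the C# in this PR: `ParseBudget`, `unapply_copy_rule` and `analyze` are hypothetical stand-ins for `ParseContext`, a pure-copy "no overt exponent" rule, and the orchestration loop.

```python
import time
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class ParseBudget:
    """Hypothetical stand-in for ParseContext: a step budget plus a wall-clock budget."""
    max_steps: int
    timeout_s: float
    steps_used: int = 0
    exhausted_reason: Optional[str] = None
    _start: float = field(default_factory=time.monotonic)

    def try_step(self) -> bool:
        """Charge one step; return False (never raise) once the budget is spent."""
        if self.exhausted_reason is not None:
            return False
        if self.steps_used >= self.max_steps:
            self.exhausted_reason = "max_steps"
            return False
        if time.monotonic() - self._start > self.timeout_s:
            self.exhausted_reason = "timeout"
            return False
        self.steps_used += 1
        return True


def unapply_copy_rule(form: str, budget: ParseBudget) -> Iterator[str]:
    """Toy pure-copy rule: output equals input, so unapplying it never terminates on its own."""
    if not budget.try_step():
        return  # soft stop: yield nothing instead of throwing
    yield form


def analyze(surface: str, budget: ParseBudget) -> List[str]:
    """Collect candidates until the frontier empties or the budget is exhausted."""
    candidates = [surface]
    frontier = [surface]
    while frontier and budget.exhausted_reason is None:  # fast-unwind once exhausted
        form = frontier.pop()
        for out in unapply_copy_rule(form, budget):
            candidates.append(out)
            frontier.append(out)
    return candidates


budget = ParseBudget(max_steps=1000, timeout_s=30.0)
partial = analyze("x", budget)
print(len(partial), budget.steps_used, budget.exhausted_reason)
```

The caller gets whatever candidates were produced before the breach, plus a reason it can surface as a diagnostic, which is the shape of the `ParseDiagnostics` overload described in the commit message.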

Adds three additive, default-off caps that convert exponential blowups into bounded ones instead of merely time-boxing them:

- Morpher.MaxRuleApplicationsPerWord: a running total-unapplications counter on Word (Word.TotalUnapplicationCount), checked alongside the existing per-rule MaxApplicationCount in the three affix/compounding analysis rules. Closes the "rule A -> B -> A -> B" loophole that a per-rule cap alone cannot catch.
- Morpher.MaxAnalysisShapeGrowth: prunes analysis candidates whose shape has grown past the surface form by more than N segments, checked at AnalysisStratumRule's output loop (the choke point - candidates pruned there never reach lexical lookup) and per-iteration inside AnalysisRewriteRule's Deletion/SelfOpaquing reapplication loops.
- PermutationRuleCascade.MaxDepth (SIL.Machine core, opt-in via a new property, -1/unlimited by default so existing consumers are unaffected): caps nested rule-reapplication depth, derived from MaxRuleApplicationsPerWord rather than a new knob, synced each Apply() call since the cap can be set via object-initializer syntax after the rule cascade is already compiled.

Verified against RewriteRuleTests.DeletionRules' real deletion-rule grammar: capping MaxAnalysisShapeGrowth excludes the deep-reinsertion analysis while the shallow ones survive as a strict subset of the uncapped result.
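A small counting exercise shows why a running total matters even when every rule already has its own cap. The Python sketch below does not model how HermitCrab tracks applications; `count_derivations` is a hypothetical helper that simply enumerates interleavings of rule unapplications under a per-rule cap, with and without an additional total cap.

```python
from typing import Dict, List, Optional


def count_derivations(rules: List[str], per_rule_cap: int,
                      total_cap: Optional[int] = None) -> int:
    """Count distinct unapplication sequences (including the empty one)."""

    def walk(counts: Dict[str, int], total: int) -> int:
        n = 1  # the sequence built so far
        if total_cap is not None and total >= total_cap:
            return n  # total cap reached: no rule may be unapplied again
        for rule in rules:
            if counts[rule] < per_rule_cap:
                counts[rule] += 1
                n += walk(counts, total + 1)
                counts[rule] -= 1
        return n

    return walk({rule: 0 for rule in rules}, 0)


per_rule_only = count_derivations(["A", "B"], per_rule_cap=3)
with_total = count_derivations(["A", "B"], per_rule_cap=3, total_cap=2)
print(per_rule_only, with_total)
```

With only per-rule caps, the search space grows with every interleaving of the rules (A, B, A, B, ...); a total cap bounds the depth of every branch regardless of which rules alternate.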

…onesty pass

Adds GrammarAnalyzer, a static analyzer over a loaded Language that flags always/almost-always-wrong rule shapes with stable diagnostic codes (HC0001-HC0008: no-overt-exponent affix rules, unbounded multipleApplication, self-feeding epenthesis/deletion rules, unconstrained compounding, optional-iterative lexical patterns, cyclic feeding pairs). Wired into the hc CLI as a new `hc lint` command, plus a `hc parse --diagnose` flag that surfaces RerunWithDiagnostics' top offending rules for a single word - the empirical companion to the static lint. Both are documented in a new docs/hermitcrab-grammar-performance.md guide organized by HC code.

While shaping HC0004's self-feeding check, deduped the "does this rule's output unify with its own required environment" logic shared between AnalysisRewriteRule and GrammarAnalyzer into a single IsUnifiableWithEnvironment extension, and found/fixed a real gap: the lint only covered one of two engine paths that select self-opaquing behavior, silently missing the epenthesis case (unconditionally dangerous in Simultaneous mode). Also fixed a pre-existing HC0007 condition that required Optional *and* IsIterative on adjacent lexical pattern nodes, when the design doc's own canonical example (([Seg])([Seg])) is two plain-optional (non-iterative) groups - the check now matches the documented intent.

Ran the real Phase 0 calibration corpus (indonesian/sena) against the rustify engine and replaced the Phase 1 doc comment's fabricated "~13,600 steps" figure with real numbers: Indonesian's worst word takes 10,445 steps (flat ~10-rule combinatorial interaction, not one bad rule); Sena's worst sampled word takes 14.9M steps/105s from only a ~1% corpus sample, and a separate real word was previously being truncated by the old 10s default timeout at 99,584 steps. Raised DefaultMaxParseSteps to 50,000,000 and DefaultParseTimeout to 30s accordingly, and documented in complexity-cap.md (with two new "still open" items) that the Sena figures are a floor pending a full-corpus re-baseline, and that the timeout is a genuine truncation/latency tradeoff rather than a pure safety margin.

82/82 HermitCrab tests pass; both projects build clean; csharpier clean.

Co-Authored-By: Claude Sonnet 5 <noreply@anthropic.com>
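The HC0007 fix in this commit message is easiest to see side by side. The following Python sketch is an assumption-laden model, not GrammarAnalyzer itself: `PatternNode`, `lint_old` and `lint_new` are hypothetical, and the "new" condition is taken only from the description here (adjacent optional groups are flagged whether or not they are iterative).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PatternNode:
    """Hypothetical lexical-pattern node; only the two flags the check reads."""
    name: str
    optional: bool = False
    iterative: bool = False


def lint_old(nodes: List[PatternNode]) -> List[Tuple[str, str]]:
    """Old condition as described: both neighbours must be optional AND iterative."""
    return [("HC0007", f"{a.name}+{b.name}")
            for a, b in zip(nodes, nodes[1:])
            if a.optional and a.iterative and b.optional and b.iterative]


def lint_new(nodes: List[PatternNode]) -> List[Tuple[str, str]]:
    """Fixed condition as described: two adjacent optional groups are enough."""
    return [("HC0007", f"{a.name}+{b.name}")
            for a, b in zip(nodes, nodes[1:])
            if a.optional and b.optional]


# The design doc's canonical shape, ([Seg])([Seg]): two plain-optional groups.
canonical = [PatternNode("g1", optional=True), PatternNode("g2", optional=True)]
print(lint_old(canonical), lint_new(canonical))
```

Under these assumptions the old condition reports nothing for the canonical example, while the fixed one flags it once.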

Bookkeeping only - the status header and phase table still said "Plan (not started)" after Phases 0-3 were implemented and committed.

Small addition to the ad hoc Phase 0 calibration harness, left uncommitted from the corpus investigation: keeps a running top-5 (by StepsUsed) instead of only the single max, so a full-corpus re-baseline (see complexity-cap.md Section 10 item 7) shows the shape of the tail, not just one data point.
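Keeping a running top-5 instead of a single maximum is a standard bounded-heap pattern. A minimal Python sketch follows; `TopK` is a hypothetical helper, and the words and step counts fed to it are made-up placeholders rather than corpus data.

```python
import heapq
from typing import List, Tuple


class TopK:
    """Track the k largest (steps, word) pairs seen so far in O(log k) per word."""

    def __init__(self, k: int = 5) -> None:
        self.k = k
        self._heap: List[Tuple[int, str]] = []  # min-heap: smallest kept item on top

    def add(self, word: str, steps: int) -> None:
        item = (steps, word)
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, item)
        elif item > self._heap[0]:
            heapq.heapreplace(self._heap, item)  # evict the current smallest

    def ranked(self) -> List[Tuple[int, str]]:
        """Return the kept items, largest step count first."""
        return sorted(self._heap, reverse=True)


top = TopK(k=5)
for i, steps in enumerate([120, 9000, 15, 400, 77, 31000, 2, 650]):  # placeholder values
    top.add(f"w{i}", steps)
print(top.ranked())
```

Memory stays constant no matter how many words are scanned, so the same harness works for a 72-word sample and a full-corpus run.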

codecov-commenter · 2026-07-02T20:28:48Z

Codecov Report

❌ Patch coverage is 79.82646% with 93 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.73%. Comparing base (ea72cd7) to head (c1d7db6).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...L.Machine.Morphology.HermitCrab/GrammarAnalyzer.cs | 87.20% | 29 Missing and 4 partials ⚠️ |
| .../SIL.Machine.Morphology.HermitCrab/ParseContext.cs | 74.35% | 7 Missing and 3 partials ⚠️ |
| ...chine.Morphology.HermitCrab/AnalysisStratumRule.cs | 60.00% | 2 Missing and 4 partials ⚠️ |
| src/SIL.Machine.Morphology.HermitCrab/Morpher.cs | 91.07% | 2 Missing and 3 partials ⚠️ |
| ...icalRules/AnalysisRealizationalAffixProcessRule.cs | 37.50% | 3 Missing and 2 partials ⚠️ |
| ...ermitCrab/PhonologicalRules/AnalysisRewriteRule.cs | 54.54% | 2 Missing and 3 partials ⚠️ |
| ...hine.Morphology.HermitCrab/HermitCrabExtensions.cs | 69.23% | 2 Missing and 2 partials ⚠️ |
| src/SIL.Machine/Rules/PermutationRuleCascade.cs | 40.00% | 0 Missing and 3 partials ⚠️ |
| ...Morphology.HermitCrab/AnalysisAffixTemplateRule.cs | 0.00% | 1 Missing and 1 partial ⚠️ |
| ...hine.Morphology.HermitCrab/AnalysisLanguageRule.cs | 0.00% | 1 Missing and 1 partial ⚠️ |
| ... and 10 more | | |

Additional details and impacted files

@@              Coverage Diff               @@
##           hc-rustify     #448      +/-   ##
==============================================
+ Coverage       73.65%   73.73%   +0.07%     
==============================================
  Files             444      447       +3     
  Lines           37929    38372     +443     
  Branches         5253     5323      +70     
==============================================
+ Hits            27937    28293     +356     
- Misses           8856     8911      +55     
- Partials         1136     1168      +32


…mization findings A separate investigation (sharded Release-mode full-corpus scan, see docs/hermitcrab-parse-algorithm-analysis.md on the sibling parse-optimization branch, not yet committed anywhere) got much further than this branch's own single-threaded Debug-mode recalibration attempt, which was aborted after ~1 hour at 283/7,121 Sena words to avoid burning many more hours on redundant/inferior data. Updates items 7-8 with that scan's numbers (p90 ~2M steps, ~16% of words >1M steps, worst observed >=39.9M steps, 30s ParseTimeout trips on dozens of legitimate words) and adds item 9: cinacemerwa (37.5M steps, 0 valid parses) crashed the NUnit test host outright, apparently from memory pressure independent of the step/timeout budgets - the current Layer 1/2 budgets bound steps and wall-clock but not allocations.

johnml1135 · 2026-07-02T21:30:29Z

Sena full-corpus calibration update

My own attempt to re-run the full 7,121-word Sena corpus (single-threaded, Debug build, via the Sena_Baseline_NoWordExhaustsUnlimitedBudget/Sena_ShippedDefaults_NeverTrip tests) was too slow to be practical — some individual words alone took 50+ seconds, and it was aborted after ~1 hour at 283/7,121 words rather than run for what looked like 20+ hours.

A separate, much more efficient investigation (sharded 8-way, Release-mode instrumented harness — see docs/hermitcrab-parse-algorithm-analysis.md, currently on a sibling parse-optimization branch and not yet committed anywhere) got further and found:

- p90 ≈ 2,000,000 steps; ~16% of Sena words exceed 1,000,000 steps
- Worst observed so far ≥ 39,900,000 steps (kukucitirani): under the shipped 50,000,000-step DefaultMaxParseSteps, but with less headroom than the earlier ~1% sample suggested, and not yet a confirmed corpus-wide max
- DefaultParseTimeout = 30s trips on dozens of legitimate Sena words (100–250s single-threaded), not just the one word noted originally
- New finding: cinacemerwa (37.5M steps, 0 valid parses) crashed the NUnit test host outright, apparently from memory pressure. This is independent of the step/timeout budgets, which bound steps and wall-clock but not allocations

Pushed as c1d7db64, updating complexity-cap.md §10 items 7–8 and adding item 9 (the OOM-crash finding). All still open — full corpus re-baseline and the timeout product decision remain unresolved, and it's now an open question whether budget #3 (allocation/candidate-count ceiling) is needed or whether the parse-optimization branch's in-progress algorithmic fixes (nogood cache / memoization, directly targeting the same redundant-search root cause) make it moot.
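For readers unfamiliar with the term, a nogood cache records search states already proven to have no solution so they are never re-explored. The Python sketch below illustrates the general technique on a toy segmentation problem; it is not the parse-optimization branch's code, and `can_segment` and its lexicon are hypothetical.

```python
from typing import Set, Tuple


def can_segment(word: str, lexicon: Tuple[str, ...], use_nogoods: bool) -> Tuple[bool, int]:
    """Try to split `word` into lexicon pieces; return (found, number of search calls)."""
    calls = 0
    nogoods: Set[int] = set()  # start offsets already proven to have no analysis

    def search(start: int) -> bool:
        nonlocal calls
        calls += 1
        if start == len(word):
            return True
        if use_nogoods and start in nogoods:
            return False  # known dead end: skip the whole subtree
        for piece in lexicon:
            if word.startswith(piece, start) and search(start + len(piece)):
                return True
        if use_nogoods:
            nogoods.add(start)  # every continuation from here failed
        return False

    return search(0), calls


# A word with many overlapping partial analyses and no complete one.
word = "a" * 18 + "b"
lexicon = ("a", "aa", "aaa")
found_plain, calls_plain = can_segment(word, lexicon, use_nogoods=False)
found_cached, calls_cached = can_segment(word, lexicon, use_nogoods=True)
print(found_plain, calls_plain, found_cached, calls_cached)
```

Both runs reach the same answer, but the cached run expands each offset at most once, so its call count grows linearly with word length instead of exponentially. Note that this bounds redundant search, not memory, so whether it would also address the crash reported above is a separate question.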

johnml1135 and others added 5 commits July 2, 2026 14:16

complexity-cap.md: mark Phases 0-3 done, record commit hashes

1356744


johnml1135 marked this pull request as draft July 2, 2026 20:58

johnml1135 wants to merge 6 commits into hc-rustify from complexity-cap
