MFT/USN-journal indexing for local NTFS roots (+ search/index fixes & perf)#10

Merged
denfry merged 30 commits into master from feature/mft-usn-indexing
Jun 25, 2026

@denfry (Owner) commented Jun 22, 2026

Summary

Adds WizFile/Everything-style indexing for local NTFS drives by reading the MFT directly and tracking changes via the USN journal, plus a batch of search/index fixes and speedups. The fast path is additive: it activates only when the process is elevated and the root is a local fixed NTFS volume, and any failure falls back transparently to the existing crawler, so non-elevated, network, and non-NTFS roots behave exactly as before.
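
As a rough illustration of the activation rule and fallback described above, here is a minimal sketch. The names `IndexBackend`, `StrategySketch`, `Choose`, and `RunWithFallback` are hypothetical and are not taken from the PR's `IndexStrategySelector`; the inputs (elevation, drive type, file system name) are assumed to be gathered by the caller so the decision stays a pure function.

```csharp
using System;

// Hypothetical shapes for illustration only; the PR's real selector may differ.
public enum IndexBackend { Crawler, Mft }

public static class StrategySketch
{
    // Pure decision: MFT only when elevated AND local fixed drive AND NTFS.
    public static IndexBackend Choose(bool isElevated, bool isLocalFixedDrive, string fileSystem)
    {
        bool isNtfs = string.Equals(fileSystem, "NTFS", StringComparison.OrdinalIgnoreCase);
        return isElevated && isLocalFixedDrive && isNtfs
            ? IndexBackend.Mft
            : IndexBackend.Crawler;
    }

    // Runs the fast path when selected; any exception falls back to the crawler.
    public static T RunWithFallback<T>(IndexBackend backend, Func<T> mftScan, Func<T> crawl)
    {
        if (backend != IndexBackend.Mft) return crawl();
        try { return mftScan(); }
        catch (Exception) { return crawl(); }
    }
}
```

Keeping the decision separate from the Win32 probing is one way to make the selector unit-testable without Administrator rights.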

Delivered in three parts (see docs/superpowers/specs and docs/superpowers/plans):

Search/index fixes & perf

  • Content search now scans every indexed file that passes the size/date/type filters, instead of only the files whose names matched the query, so it actually finds text inside files; the scan is cancellable.
  • Search-box caret: placeholder aligned to the real text origin (border + padding).
  • Parallel crawl: work-queue directory enumeration using CrawlParallelism (hides SMB latency).
  • Dropped 4 unused SQLite indexes (search is in-memory) → ~1.5× faster bulk inserts; cheaper FileEntry construction (shared parent string) → ~5× faster; PRAGMA tuning; PLINQ filter above 20k entries.
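
The PLINQ item in the list above can be sketched roughly as follows. `FilterSketch` and `ParallelThreshold` are hypothetical names; only the 20k cutoff and the `AsParallel().AsOrdered()` combination come from the PR text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FilterSketch
{
    // Cutoff taken from the PR description ("above 20k entries").
    public const int ParallelThreshold = 20_000;

    public static List<T> Filter<T>(IReadOnlyList<T> entries, Func<T, bool> predicate)
    {
        // Small inputs: sequential LINQ avoids PLINQ's partitioning overhead.
        if (entries.Count <= ParallelThreshold)
            return entries.Where(predicate).ToList();

        // Large inputs: evaluate the predicate in parallel, preserving source order.
        return entries.AsParallel().AsOrdered().Where(predicate).ToList();
    }
}
```

`AsOrdered()` costs some throughput but keeps results in index order, which matters if the UI relies on a stable ordering.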

Phase 1 — MFT bulk scan

New src/NetSearch.Core/Native/ layer: pure, unit-tested parsers (DataRunParser, MftRecordParser, PathBuilder, MftEntryAssembler, NtfsVolumeData) + Win32 interop (NativeMethods, NtfsVolume, MftEnumerator) + IndexStrategySelector. Wired into IndexManager/MainViewModel with per-root backend choice and crawler fallback.
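
For readers unfamiliar with what a data-run parser does, here is a minimal sketch of decoding an NTFS run list. It is not the PR's `DataRunParser`; `DataRun` and `DataRunSketch` are hypothetical names. It assumes the usual encoding: a header byte whose low nibble is the size of the length field and whose high nibble is the size of the offset field, an unsigned little-endian cluster count, and a signed little-endian LCN offset relative to the previous run, with a zero header byte terminating the list.

```csharp
using System;
using System.Collections.Generic;

public readonly record struct DataRun(long StartLcn, long ClusterCount, bool IsSparse);

public static class DataRunSketch
{
    public static List<DataRun> Parse(ReadOnlySpan<byte> runList)
    {
        var runs = new List<DataRun>();
        long currentLcn = 0;
        int pos = 0;
        while (pos < runList.Length && runList[pos] != 0)
        {
            int lengthSize = runList[pos] & 0x0F; // low nibble
            int offsetSize = runList[pos] >> 4;   // high nibble
            pos++;
            if (lengthSize == 0 || lengthSize > 8 || offsetSize > 8 ||
                pos + lengthSize + offsetSize > runList.Length)
                throw new InvalidOperationException("Malformed data run list.");

            long length = ReadLittleEndian(runList.Slice(pos, lengthSize), signed: false);
            pos += lengthSize;

            if (offsetSize == 0)
            {
                // No offset field: a sparse run with no clusters on disk.
                runs.Add(new DataRun(0, length, true));
                continue;
            }

            // Offsets are deltas from the previous run's starting LCN.
            currentLcn += ReadLittleEndian(runList.Slice(pos, offsetSize), signed: true);
            pos += offsetSize;
            runs.Add(new DataRun(currentLcn, length, false));
        }
        return runs;
    }

    private static long ReadLittleEndian(ReadOnlySpan<byte> bytes, bool signed)
    {
        long value = 0;
        for (int i = 0; i < bytes.Length; i++)
            value |= (long)bytes[i] << (8 * i);
        // Sign-extend when the top bit of the last byte is set.
        if (signed && bytes.Length < 8 && (bytes[bytes.Length - 1] & 0x80) != 0)
            value |= -1L << (8 * bytes.Length);
        return value;
    }
}
```

Because the function takes only a byte span, it can be exercised against synthetic records without touching a real volume, in line with the testing approach the PR describes.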

Phase 2 — USN incremental

entries.frn column + per-root journal cursor, UsnRecordParser, UsnJournalData, UsnJournal interop, IndexManager.ApplyUsnDeltas (delete-then-reinsert by FRN), incremental-first refresh with full rescan on journal gap/rotation.
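
The delete-then-reinsert idea can be illustrated with an in-memory sketch. This is not the PR's `IndexManager.ApplyUsnDeltas`: `Change` and `DeltaSketch` are hypothetical, the index is modelled as a plain FRN-to-path dictionary rather than the SQLite store, and collapsing to the last record per FRN is an assumption made for the example.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical change record; the PR's UsnChange type is not shown in full.
public sealed record Change(ulong Frn, string Path, bool Deleted);

public static class DeltaSketch
{
    // index maps FRN -> current path for a single root.
    public static void Apply(Dictionary<ulong, string> index, IEnumerable<Change> changes)
    {
        // Assumption: the last journal record for an FRN reflects its final state.
        var latest = new Dictionary<ulong, Change>();
        foreach (var c in changes) latest[c.Frn] = c;

        // Step 1: drop every touched FRN (covers deletes and rename-away).
        foreach (var frn in latest.Keys) index.Remove(frn);

        // Step 2: reinsert the entries that still exist.
        foreach (var c in latest.Values.Where(c => !c.Deleted))
            index[c.Frn] = c.Path;
    }
}
```

Deleting first means a rename needs no special case: the old row disappears and the new path is inserted under the same FRN.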

Testing

  • dotnet build: 0 warnings, 0 errors (Core net9.0, App net9.0-windows).
  • dotnet test: 83/83 passing. All byte-level parsing is unit-tested against synthetic records.
  • Live volume reads require Administrator + a real disk, so the interop layer is verified manually per docs/superpowers/manual-checks/ (mft-bulk-scan.md, usn-incremental.md). The transparent fallback keeps the app correct even if the native path has a latent issue.

Notes / non-blocking follow-ups

  • 4Kn (4096-byte-sector) volumes: fixups currently assume 512-byte sectors; such volumes safely fall back to the crawler (no fast path, no breakage).
  • ReadMftExtents intentionally throws on a corrupt MFT record 0 (→ crawler fallback) rather than returning empty extents, which would otherwise look like a zero-entry success.

🤖 Generated with Claude Code

denfry and others added 28 commits June 22, 2026 08:24
Adds the approved design for an alternative indexing backend that reads the
NTFS MFT directly (full size/date parity) and tracks changes via the USN
journal, with auto-detect of elevation + NTFS and transparent fallback to the
existing parallel crawler.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…xing

Two phased, TDD, bite-sized plans:
- Phase 1 (mft-bulk-scan): pure parsers (data-run, FILE record, path build,
  entry assembler, strategy selector) + Win32 volume interop + crawler fallback.
- Phase 2 (usn-incremental): frn/journal schema, USN parser, journal interop,
  delta application, incremental-first refresh with rescan-on-gap.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Lets callers build an entry from already-known parts (name + parent dir),
avoiding per-file path parsing. FromFileSystem now delegates to it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Crawler: work-queue parallel directory enumeration (uses CrawlParallelism,
  default raised to 8) with serialized onBatch; cheaper entry construction via
  shared parent string + FromComponents.
- IndexStore: drop 4 secondary indexes never used by any query (search is
  in-memory) — ~1.5x faster bulk inserts; add temp_store/mmap/cache pragmas.
- SearchEngine: PLINQ filter (AsParallel.AsOrdered) above 20k entries.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Content search now scans the whole index filtered by size/date/type (not the
  name match), so it actually finds text inside files; made cancellable.
- Placeholder hint aligned to the real caret origin (border + padding).
- RefreshAsync builds the crawler with the configured parallelism.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Pre-flight review caught a prefix-match bug in the Phase 1 entry assembler
(C:\Me would capture C:\MeToo); use an exact-or-separator-boundary check and
add a sibling-prefix test case.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
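
The exact-or-separator-boundary check mentioned in the commit above might look roughly like this. `PathScopeSketch.IsUnderRoot` is a hypothetical helper, not the assembler's actual code; it assumes backslash-separated Windows paths and case-insensitive comparison.

```csharp
using System;

public static class PathScopeSketch
{
    // True when path equals root or lies beneath it; "C:\MeToo" is not under "C:\Me".
    public static bool IsUnderRoot(string path, string root)
    {
        string r = root.TrimEnd('\\');
        if (!path.StartsWith(r, StringComparison.OrdinalIgnoreCase)) return false;

        // Either an exact match, or the next character must be a separator.
        return path.Length == r.Length || path[r.Length] == '\\';
    }
}
```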
… tests

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…y parser

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The plan's example read volume geometry from the wrong offsets; align it with
the implemented NtfsVolumeData.Parse (BytesPerCluster@0x2C,
BytesPerFileRecordSegment@0x30, MftStartLcn@0x40).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
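
A minimal sketch of reading the three geometry fields at the offsets named in the commit above. `VolumeGeometry` and `VolumeDataSketch` are hypothetical names, not the PR's `NtfsVolumeData`; the buffer is assumed to be the raw little-endian output of the volume-data query.

```csharp
using System;
using System.Buffers.Binary;

public readonly record struct VolumeGeometry(
    uint BytesPerCluster, uint BytesPerFileRecordSegment, long MftStartLcn);

public static class VolumeDataSketch
{
    public static VolumeGeometry Parse(ReadOnlySpan<byte> buffer)
    {
        // MftStartLcn ends at 0x48, so anything shorter cannot be parsed.
        if (buffer.Length < 0x48)
            throw new ArgumentException("Buffer too short.", nameof(buffer));

        return new VolumeGeometry(
            BinaryPrimitives.ReadUInt32LittleEndian(buffer.Slice(0x2C, 4)), // BytesPerCluster
            BinaryPrimitives.ReadUInt32LittleEndian(buffer.Slice(0x30, 4)), // BytesPerFileRecordSegment
            BinaryPrimitives.ReadInt64LittleEndian(buffer.Slice(0x40, 8))); // MftStartLcn
    }
}
```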
…a tested parser

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The plan read the cursor from 0x08 (FirstUsn); NextUsn is at 0x10. Align with
the implemented pure UsnJournalData.Parse.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
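
To make the cursor fix above concrete, here is a sketch that reads the journal ID, `FirstUsn` (0x08), and `NextUsn` (0x10) from the query buffer and then decides whether a stored cursor is still usable. `JournalInfo`, `UsnJournalSketch`, and the `NeedsFullRescan` rule are assumptions for illustration, not the PR's `UsnJournalData` or its actual gap/rotation check.

```csharp
using System;
using System.Buffers.Binary;

public readonly record struct JournalInfo(ulong JournalId, long FirstUsn, long NextUsn);

public static class UsnJournalSketch
{
    public static JournalInfo Parse(ReadOnlySpan<byte> buffer)
    {
        if (buffer.Length < 0x18)
            throw new ArgumentException("Buffer too short.", nameof(buffer));

        return new JournalInfo(
            BinaryPrimitives.ReadUInt64LittleEndian(buffer.Slice(0x00, 8)), // UsnJournalID
            BinaryPrimitives.ReadInt64LittleEndian(buffer.Slice(0x08, 8)),  // FirstUsn
            BinaryPrimitives.ReadInt64LittleEndian(buffer.Slice(0x10, 8))); // NextUsn (the cursor)
    }

    // Assumed rule: rescan if the journal was recreated (ID changed) or the
    // stored cursor lies outside the range the journal still holds.
    public static bool NeedsFullRescan(JournalInfo live, ulong storedJournalId, long storedNextUsn) =>
        live.JournalId != storedJournalId ||
        storedNextUsn < live.FirstUsn ||
        storedNextUsn > live.NextUsn;
}
```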

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 84c5dc0c14

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +69 to +71
var frns = changes.Select(c => c.Frn).Distinct().ToList();
// Old rows for every touched FRN go first (handles delete + rename-away cleanly).
_store.RemoveByFrn(rootId, frns);

P1: Reconcile subtree paths for directory USN changes

When the USN batch contains a directory rename or move, Windows reports the changed directory FRN but not every descendant, while this code deletes/reinserts only the directly touched FRNs. Because child rows store the old ParentPath, all files under that directory remain indexed at their previous paths until each child happens to change; directory moves out of the indexed root similarly leave stale descendants. Directory FRNs need a subtree update/rescan rather than only RemoveByFrn on the touched set.
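
One possible shape for the subtree update, sketched under assumptions: each row stores a `ParentPath` string, and a directory rename is known as an old and a new full path. `SubtreeSketch.Rebase` is a hypothetical helper and is not part of the PR; a real fix would also have to handle moves out of the indexed root, or simply rescan the subtree.

```csharp
using System;

public static class SubtreeSketch
{
    // Rewrites a row's parent path if it is the renamed directory or lies beneath it.
    public static string Rebase(string parentPath, string oldDir, string newDir)
    {
        if (parentPath.Equals(oldDir, StringComparison.OrdinalIgnoreCase))
            return newDir;

        bool isDescendant =
            parentPath.Length > oldDir.Length &&
            parentPath.StartsWith(oldDir, StringComparison.OrdinalIgnoreCase) &&
            parentPath[oldDir.Length] == '\\'; // boundary check avoids sibling-prefix matches

        return isDescendant ? newDir + parentPath.Substring(oldDir.Length) : parentPath;
    }
}
```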

Comment on lines +31 to +32
if (!vol.DeviceControl(NativeMethods.FSCTL_READ_USN_JOURNAL, input, outBuf, out var returned))
    return (startUsn, Array.Empty<UsnChange>());

P2: Treat failed USN reads as requiring a rescan

If the stored cursor has fallen behind the journal or FSCTL_READ_USN_JOURNAL fails transiently, this returns an empty change list with the original startUsn; the caller then applies no deltas and stores that cursor as if the refresh succeeded. In that scenario any deletes or renames in the missed range remain stale indefinitely, so a failed read should be surfaced as a journal mismatch/full-rescan condition rather than "no changes".
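
One way to surface the failure rather than report "no changes" is an explicit status on the read result. The sketch below is hypothetical: `UsnReadStatus`, `UsnReadResult`, and `UsnReadSketch` do not appear in the PR, and the `tryRead` delegate merely stands in for the `FSCTL_READ_USN_JOURNAL` call.

```csharp
using System;
using System.Collections.Generic;

public enum UsnReadStatus { Ok, RescanRequired }

public sealed record UsnReadResult(UsnReadStatus Status, long NextUsn, IReadOnlyList<ulong> ChangedFrns);

public static class UsnReadSketch
{
    // tryRead is a stand-in for the device-control call; it reports success explicitly.
    public static UsnReadResult Read(
        long startUsn,
        Func<long, (bool Ok, long NextUsn, IReadOnlyList<ulong> Frns)> tryRead)
    {
        var (ok, nextUsn, frns) = tryRead(startUsn);

        // A failed read must not look like an empty, successful batch.
        if (!ok)
            return new UsnReadResult(UsnReadStatus.RescanRequired, startUsn, Array.Empty<ulong>());

        return new UsnReadResult(UsnReadStatus.Ok, nextUsn, frns);
    }
}
```

The caller would then persist the new cursor only on `Ok` and schedule a full scan on `RescanRequired`.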

Comment on lines +197 to +199
catch (Exception) // MFT path failed → transparent fallback to the crawler
{
    mgr.UpdateRoot(id, path, token, progress);

P2: Clear USN cursors after crawler fallback

When the MFT path throws after a previous successful MFT scan, this fallback rewrites rows through the crawler, whose entries have Frn == null, but the root's stored usn_journal_id/usn_next is left intact. The next elevated refresh will take the incremental branch and RemoveByFrn cannot delete or move those null-FRN rows, leaving stale or duplicate paths for later deletes/renames; the fallback should clear the USN state or force the next MFT pass to do a full scan.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@denfry denfry merged commit 436b771 into master Jun 25, 2026
1 check passed
@denfry denfry deleted the feature/mft-usn-indexing branch June 25, 2026 06:37