[experiment] On-device decompressed AssemblyStore cache (CoreCLR)#11967
Draft
simonrozsival wants to merge 4 commits into
Prototype exploring caching decompressed assemblies on-device so that subsequent launches skip zstd decompression and load the data via a file-backed mmap instead of dirty anonymous memory.

- assembly-store.cc: on a decompression cache miss, a single background thread atomically writes the decompressed bytes to <codeCacheDir>/decompressed-assembly-cache/<Assembly.dll> (temp -> fsync -> rename). On the next launch the file is mmap'd (MAP_PRIVATE, COW) and decompression is skipped. Caching is per-assembly: only assemblies actually touched are cached. Staleness is guarded by an 8-byte footer holding an xxhash of the compressed payload.
- Plumb codeCacheDir (Context.getCodeCacheDir()) through Java initInternal -> appDirs[3] -> AndroidSystem, so a stale cache is auto-wiped by Android on app/platform update.
- Runtime A/B toggle via `debug.net.asmcache` system property (and XA_DISABLE_ASSEMBLY_CACHE env var).

Experimental only: no MSBuild opt-in, no assembly-store version stamp, CoreCLR only.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
7cac3de to
9d831a5
Compare
The background writer thread read directly from the shared uncompressed_assemblies_data_buffer, but on a cache miss that same buffer is handed to the runtime once the decompress lock is released, and the runtime may write into the assembly image (the reason the cache-hit path maps the file MAP_PRIVATE / COW). Concurrent writes could persist a torn or post-mutation image; since the staleness footer only hashes the *compressed* payload, that corrupt image would then be reloaded from cache as if pristine on the next launch.

Take a private snapshot of the decompressed bytes in enqueue_write, while the caller still holds assembly_decompress_mutex and before the buffer is exposed to the runtime, so the writer only ever touches immutable memory it owns. On allocation failure we skip caching that assembly rather than aborting.

Trade-off: this adds one memcpy per newly-cached assembly on the first-launch (cache-miss) path and holds the queued snapshots (up to the touched working set) transiently until the writer drains them. Subsequent launches hit the mmap path and never enqueue.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
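The snapshot-at-enqueue idea can be sketched as follows. The class name `SnapshotQueue`, the `dequeue` helper, and the field names of `WriteRequest` are hypothetical; the sketch only illustrates copying the bytes before the shared buffer becomes visible to the runtime, and skipping the cache entry when the allocation fails.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <deque>
#include <memory>
#include <mutex>
#include <new>
#include <string>
#include <utility>

// Hypothetical request type: owns a private copy of the decompressed image.
struct WriteRequest {
    std::string                name;
    std::unique_ptr<uint8_t[]> image;
    size_t                     size = 0;
};

class SnapshotQueue {
public:
    // Assumption: the caller still holds the decompression lock, so `data`
    // cannot be mutated by the runtime while it is being copied.
    bool enqueue_write (const std::string &name, const uint8_t *data, size_t size)
    {
        std::unique_ptr<uint8_t[]> copy (new (std::nothrow) uint8_t[size]);
        if (!copy) {
            return false;  // allocation failed: skip caching, do not abort
        }
        memcpy (copy.get (), data, size);

        std::lock_guard<std::mutex> lock (queue_mutex);
        pending.push_back (WriteRequest { name, std::move (copy), size });
        return true;
    }

    // Used by the writer thread; returns false when nothing is queued.
    bool dequeue (WriteRequest &out)
    {
        std::lock_guard<std::mutex> lock (queue_mutex);
        if (pending.empty ()) {
            return false;
        }
        out = std::move (pending.front ());
        pending.pop_front ();
        return true;
    }

private:
    std::mutex               queue_mutex;
    std::deque<WriteRequest> pending;
};
```

Because each request owns its bytes, later writes to the shared buffer cannot affect what ends up on disk; the cost is the extra copy and the memory held until the writer drains the queue, as the commit message notes.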
9d831a5 to
f1de798
Compare
Tidy up the file-writing path without pulling <fstream>/iostreams into the runtime .so (only the build-time pinvoke-table generator uses those; the runtime deliberately sticks to raw syscalls to keep the library small and startup cheap).

- Lay out the full on-disk image ([payload][8-byte token footer]) in the snapshot buffer at enqueue time, so the writer emits it in a single contiguous write. This drops the separate footer write (and with it a bug: that write didn't handle EINTR/partial writes) and lets WriteRequest lose its token field.
- Extract a write_fully() helper for the EINTR/partial-write retry loop, leaving writer_loop as open -> write_fully -> fsync -> close -> rename.

No behavior change: the cache file format is identical, so existing cache files remain valid.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
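A minimal sketch of the retry loop and the open -> write_fully -> fsync -> close -> rename sequence, using only raw syscalls. The signatures, the `.tmp` suffix, and the file mode are assumptions for illustration; the PR's actual helper may differ.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Writes all `size` bytes, retrying on EINTR and continuing after short writes.
bool write_fully (int fd, const uint8_t *data, size_t size)
{
    while (size > 0) {
        ssize_t n = write (fd, data, size);
        if (n < 0) {
            if (errno == EINTR) {
                continue;  // interrupted before any byte was written: retry
            }
            return false;  // real I/O error
        }
        data += n;                        // partial write: advance past what landed
        size -= static_cast<size_t>(n);
    }
    return true;
}

// Persists `image` (payload plus footer, already laid out contiguously) so
// that readers see either no file or a complete one, never a partial file.
bool write_cache_file (const std::string &final_path, const uint8_t *image, size_t size)
{
    std::string temp_path = final_path + ".tmp";  // hypothetical temp naming
    int fd = open (temp_path.c_str (), O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC, 0600);
    if (fd < 0) {
        return false;
    }

    bool ok = write_fully (fd, image, size) && fsync (fd) == 0;
    if (close (fd) != 0) {
        ok = false;
    }
    // rename() within one directory replaces the target atomically.
    if (ok && rename (temp_path.c_str (), final_path.c_str ()) != 0) {
        ok = false;
    }
    if (!ok) {
        unlink (temp_path.c_str ());  // do not leave a partial temp file behind
    }
    return ok;
}
```

Emitting the payload and footer from one buffer means a single retry loop covers the whole file, which is the simplification the commit describes.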
Move the open/write/fsync/close/rename ceremony into a dedicated write_cache_file() method so writer_loop() only owns the concurrency concerns (waiting on the queue, dequeuing under the lock) and delegates the actual persistence. No behavior change. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
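The split of responsibilities described here, with the loop owning only the queue and lock handling and delegating persistence, might be structured like the sketch below. `CacheWriter`, the `PersistFn` callback, and the shutdown handling are hypothetical; the callback stands in for a method like write_cache_file().

```cpp
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

class CacheWriter {
public:
    // Stand-in for the persistence step (open/write/fsync/close/rename).
    using PersistFn = std::function<void (const std::string&, const std::vector<uint8_t>&)>;

    explicit CacheWriter (PersistFn persist)
        : persist_ (std::move (persist)), thread_ ([this] { writer_loop (); })
    {}

    ~CacheWriter ()
    {
        {
            std::lock_guard<std::mutex> lock (mutex_);
            stopping_ = true;
        }
        cv_.notify_one ();
        thread_.join ();  // the loop drains remaining requests before exiting
    }

    void enqueue (std::string name, std::vector<uint8_t> image)
    {
        {
            std::lock_guard<std::mutex> lock (mutex_);
            queue_.emplace_back (std::move (name), std::move (image));
        }
        cv_.notify_one ();
    }

private:
    using Request = std::pair<std::string, std::vector<uint8_t>>;

    // Concurrency only: wait, dequeue under the lock, then persist unlocked.
    void writer_loop ()
    {
        for (;;) {
            Request req;
            {
                std::unique_lock<std::mutex> lock (mutex_);
                cv_.wait (lock, [this] { return stopping_ || !queue_.empty (); });
                if (queue_.empty ()) {
                    return;  // stopping and nothing left to write
                }
                req = std::move (queue_.front ());
                queue_.pop_front ();
            }
            persist_ (req.first, req.second);  // slow I/O happens outside the lock
        }
    }

    PersistFn               persist_;
    std::mutex              mutex_;
    std::condition_variable cv_;
    std::deque<Request>     queue_;
    bool                    stopping_ = false;
    std::thread             thread_;  // declared last so it starts after the other members exist
};
```

Keeping the I/O outside the lock means enqueuing on the startup path only contends with the brief dequeue, not with fsync.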
Warning
Experimental prototype — not ready to merge. No MSBuild opt-in, no assembly-store version stamp, CoreCLR only. Opening as a draft to gather feedback and run CI.
Builds on top of the merged Zstd AssemblyStore compression (#11730). This prototype explores caching decompressed assemblies on-device so subsequent launches skip Zstd decompression and load assemblies via a file-backed mmap (clean, shareable pages) instead of dirty anonymous memory.

What it does
On-device decompression cache (src/native/clr/host/assembly-store.cc)
- On a decompression cache miss, a single background thread atomically writes the decompressed bytes to <codeCacheDir>/decompressed-assembly-cache/<Assembly.dll> (temp → fsync → rename).
- On the next launch the file is mmap-ed (MAP_PRIVATE, COW) and decompression is skipped.
- Plumbs codeCacheDir (Context.getCodeCacheDir()) through Java initInternal → appDirs[3] → AndroidSystem, so Android auto-wipes a stale cache on app/platform update.
- Runtime A/B toggle via the debug.net.asmcache system property (and the XA_DISABLE_ASSEMBLY_CACHE env var).

Max Zstd compression level (22)
Benchmarks (cache on/off)
Measured on #11730 while prototyping on this branch.
MauiBench (dotnet new maui, blank + --sample-content), Release / CoreCLR / default partial-R2R + PGO / android-arm64, built at max Zstd level (22) with extractNativeLibs=false (what Google Play ships for .aab on API 26+). Device: Samsung Galaxy A16 (SM-A165F), Android 16 (API 36). Settled harness: am start -S -W TotalTime, force-stop + 10 s between every launch, order-balanced interleaving, n=20/cell. Cache toggled via adb shell setprop debug.net.asmcache 0|1.

Warm-start latency, cache OFF vs ON (WARM is the trustworthy metric; see caveats):
[Table rows: maui blank, maui --sample-content; values not preserved]

The cache's warm benefit scales with store size (bigger store → more decompression avoided per launch). An earlier prototype run (Zstd L3, extractNativeLibs=true) showed the same direction: sample-content warm mean 2264 ms (OFF) → 2226 ms (ON), −38 ms (t = −3.3, sig), vs 2204 ms for LZ4 on main. In other words, the cache recovers roughly 2/3 of Zstd's 60 ms decompression penalty vs LZ4 but does not quite reach LZ4 parity (+22 ms).

Store / download size is unaffected by the cache (it only trades on-device disk for warm-start CPU). Level 22 is the actionable size lever, independent of the cache (sample-content app, arm64):
[Size comparison table not preserved]

With extractNativeLibs=false the store is Stored (uncompressed) in the APK, so the APK delta ≈ the store delta: L22 takes ~2.7 MB off the sample app's download vs LZ4 (~1.1 MB vs Zstd L3).

Caveats / honest read
- […] pm clear before each launch), so ON can only do more work; the observed cold swings (both directions across apps in the same session) are a CPU-governor artifact of the background writer thread, SoC-specific and not portable.
- […] mmap pages is likely the stronger justification than latency; that measurement is still outstanding.

Full write-ups and raw numbers: #11730 (comment) and #11730 (comment).
Notes
- […] main after the Zstd compression work merged; the Microsoft.Android.Build.Tasks/CompressAssemblies refactor is now part of main and no longer appears here.