Add stateful-history delta work items to the workflow worker #1777
Draft
JoshVanL wants to merge 1 commit into
The sidecar re-sends a workflow instance's entire committed history to the worker on every turn. This adds the worker half of the "stateful history" optimization so that, once a worker is warm for an instance on a work-item stream, the sidecar sends only the new committed events (the delta) and the worker reconstructs the full history from its own cache. It mirrors the Go (durabletask-go), Python, and .NET SDK implementations and is on by default.

Worker (durabletask-client):

- WorkflowHistoryCache: a per-stream cache of each instance's committed history, bounded by a sliding TTL, an instance-count cap, and a byte budget with LRU eviction. Injectable clock for deterministic tests.
- DurableTaskGrpcWorker: advertise WORKER_CAPABILITY_STATEFUL_HISTORY in GetWorkItemsRequest, reset the cache on every reconnect (the sidecar drops the old stream's warm set), and reclaim idle entries with a daemon janitor stopped on close.
- OrchestratorRunner: before replay, resolve the full committed history (cached prefix + delta on a hit, or a GetInstanceHistory fetch on a miss) instead of using the request's pastEvents directly; after replay, cache the committed history, or drop it once the instance ends (a CompleteWorkflow action, covering completed/failed/terminated/continued-as-new). A TerminateWorkflow action targets a different instance and is deliberately not treated as a reset.

Correctness never depends on the cache: any miss (cold stream, eviction, desync) self-heals via the GetInstanceHistory fallback, so this only changes per-turn bandwidth, not results. A fallback fetch that fails abandons the work item for backend redelivery rather than completing with a partial history.

Configuration (DurableTaskGrpcWorkerBuilder):

- disableStatefulHistory to opt out, plus historyCacheTtl, historyCacheMaxInstances, and historyCacheMaxBytes to tune the bounds.

Signed-off-by: joshvanl <me@joshvanl.dev>
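For readers who want a concrete picture of the cache bounds and the hit/miss resolution described above, here is a minimal sketch. It is not the PR's WorkflowHistoryCache: the class name `HistoryCacheSketch`, its method names, the use of plain strings as events, and string length as the "byte" size are all assumptions made for illustration. How the sidecar signals that a work item carries a delta is not covered here.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical sketch only; events are plain strings and "bytes" are string lengths. */
final class HistoryCacheSketch {
    private static final class Entry {
        final List<String> events;
        final long bytes;
        long lastAccess;

        Entry(List<String> events, long now) {
            this.events = List.copyOf(events); // immutable snapshot
            this.bytes = this.events.stream().mapToLong(String::length).sum();
            this.lastAccess = now;
        }
    }

    // accessOrder=true: iteration starts at the least recently used entry.
    private final LinkedHashMap<String, Entry> entries = new LinkedHashMap<>(16, 0.75f, true);
    private final long ttl;
    private final int maxInstances;
    private final long maxBytes;
    private final LongSupplier clock; // injectable so tests control time
    private long totalBytes;

    HistoryCacheSketch(long ttl, int maxInstances, long maxBytes, LongSupplier clock) {
        this.ttl = ttl;
        this.maxInstances = maxInstances;
        this.maxBytes = maxBytes;
        this.clock = clock;
    }

    /** Returns the cached history, or null on a miss or expiry. A hit slides the TTL. */
    synchronized List<String> get(String instanceId) {
        Entry e = entries.get(instanceId); // also marks the entry most recently used
        if (e == null) return null;
        long now = clock.getAsLong();
        if (now - e.lastAccess > ttl) {
            remove(instanceId);
            return null;
        }
        e.lastAccess = now;
        return e.events;
    }

    /** Stores the committed history, then evicts LRU entries until both bounds hold. */
    synchronized void put(String instanceId, List<String> history) {
        remove(instanceId);
        Entry e = new Entry(history, clock.getAsLong());
        entries.put(instanceId, e);
        totalBytes += e.bytes;
        Iterator<Map.Entry<String, Entry>> it = entries.entrySet().iterator();
        while ((entries.size() > maxInstances || totalBytes > maxBytes) && it.hasNext()) {
            Entry victim = it.next().getValue();
            it.remove();
            totalBytes -= victim.bytes;
        }
    }

    /** Drops an instance, e.g. once it has ended. */
    synchronized void remove(String instanceId) {
        Entry e = entries.remove(instanceId);
        if (e != null) totalBytes -= e.bytes;
    }

    synchronized int size() {
        return entries.size();
    }

    /** Hit: cached prefix + delta. Miss: full fetch through the supplied fallback. */
    synchronized List<String> resolve(String instanceId, List<String> delta,
                                      Function<String, List<String>> fetchFullHistory) {
        List<String> cached = get(instanceId);
        if (cached == null) {
            return new ArrayList<>(fetchFullHistory.apply(instanceId));
        }
        List<String> full = new ArrayList<>(cached);
        full.addAll(delta);
        return full;
    }
}
```

In this sketch a single history larger than the byte budget evicts itself immediately, so it is never cached and every turn for that instance takes the fallback path. Whether the PR makes the same choice is not stated in the description.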
Codecov Report

✅ All modified and coverable lines are covered by tests.

Coverage Diff

| Metric     | master | #1777  |
|------------|--------|--------|
| Coverage   | 76.89% | 76.89% |
| Complexity | 2307   | 2307   |
| Files      | 244    | 244    |
| Lines      | 7163   | 7163   |
| Branches   | 753    | 753    |
| Hits       | 5508   | 5508   |
| Misses     | 1288   | 1288   |
| Partials   | 367    | 367    |