Partition PendingContext into operation state and actual pending state#1880
Open
TedHartMS wants to merge 11 commits into
Open
Partition PendingContext into operation state and actual pending state#1880TedHartMS wants to merge 11 commits into
TedHartMS wants to merge 11 commits into
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This PR refactors Tsavorite’s pending-IO plumbing by splitting the former PendingContext into (1) a small, stack-carried hot-path OperationState and (2) a pending-only PendingState stored on the pooled PendingIoContext. It also updates arithmetic in-place update logic, record-size bookkeeping, and related docs/benchmarks to improve correctness/perf on hot code paths.
Changes:
- Partition pending-operation data into
OperationState(hot-path) +PendingState(pending-only) and update the full pending/complete/re-pend flow accordingly. - Optimize in-place numeric updates (overflow detection) and tighten record-size/optionals consistency checks during in-place updates.
- Update Tsavorite docs and Benchmarks (including adding a larger-than-memory benchmark variant + a benchmark configuration hook).
Reviewed changes
Copilot reviewed 33 out of 33 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| website/docs/dev/tsavorite/reviv.md | Docs update: retry reuse now refers to OperationState rather than the old pending struct. |
| website/docs/dev/tsavorite/logrecord.md | Docs update for PendingState/pending IO terminology and behavior. |
| libs/storage/Tsavorite/cs/src/core/Index/Tsavorite/TsavoriteThread.cs | Updates pending completion to use baseOperationState + pendingState and adjusted transfer/disposal semantics. |
| libs/storage/Tsavorite/cs/src/core/Index/Tsavorite/Tsavorite.cs | Replaces stack-allocated PendingContext with stack-allocated OperationState across sync read/upsert/rmw/delete entrypoints. |
| libs/storage/Tsavorite/cs/src/core/Index/Tsavorite/Implementation/TryCopyToTail.cs | Updates copy-to-tail path to accept/record OperationState fields. |
| libs/storage/Tsavorite/cs/src/core/Index/Tsavorite/Implementation/TryCopyToReadCache.cs | Updates read-cache copy path to use OperationState. |
| libs/storage/Tsavorite/cs/src/core/Index/Tsavorite/Implementation/InternalUpsert.cs | Refactors Upsert hot path to OperationState and updates retry-reuse plumbing. |
| libs/storage/Tsavorite/cs/src/core/Index/Tsavorite/Implementation/InternalRMW.cs | Refactors RMW hot + pending path initialization to populate PendingState via OperationState.pendingOp. |
| libs/storage/Tsavorite/cs/src/core/Index/Tsavorite/Implementation/InternalRead.cs | Refactors Read hot + pending path initialization; adds explicit isFromPending plumbing for ReadInfo. |
| libs/storage/Tsavorite/cs/src/core/Index/Tsavorite/Implementation/InternalDelete.cs | Refactors Delete hot path to OperationState. |
| libs/storage/Tsavorite/cs/src/core/Index/Tsavorite/Implementation/HandleOperationStatus.cs | Centralizes pending op finalization around operationState.pendingOp; assigns IO id on the AsyncIOContext and handles diskLogRecord reuse/cleanup. |
| libs/storage/Tsavorite/cs/src/core/Index/Tsavorite/Implementation/FindRecord.cs | Updates helper signatures/comments to use OperationState for pending-aware searches. |
| libs/storage/Tsavorite/cs/src/core/Index/Tsavorite/Implementation/EpochOperations.cs | Updates SynchronizeEpoch to accept OperationState. |
| libs/storage/Tsavorite/cs/src/core/Index/Tsavorite/Implementation/ContinuePending.cs | Refactors all ContinuePending* methods to operate on OperationState + PendingState; updates re-pend flow. |
| libs/storage/Tsavorite/cs/src/core/Index/Tsavorite/Implementation/ConditionalCopyToTail.cs | Updates conditional copy-to-tail + IO prep to store conditional payload on PendingState and hot-path fields on OperationState. |
| libs/storage/Tsavorite/cs/src/core/Index/Tsavorite/Implementation/BlockAllocate.cs | Moves flush-event + retry allocation tracking from old pending struct onto OperationState. |
| libs/storage/Tsavorite/cs/src/core/Index/Common/PendingState.cs | Renames/refactors the former PendingContext to a pending-only PendingState payload (key/input/output/diskLogRecord, etc.). |
| libs/storage/Tsavorite/cs/src/core/Index/Common/PendingIoContext.cs | Pending IO context now carries baseOperationState + pendingState instead of a single pending struct. |
| libs/storage/Tsavorite/cs/src/core/Index/Common/OperationStatus.cs | Updates docs to reference OperationState in retry-handling remarks. |
| libs/storage/Tsavorite/cs/src/core/Index/Common/OperationState.cs | New: defines the hot-path per-operation state that is carried on the caller stack and snapshotted into pending ops. |
| libs/storage/Tsavorite/cs/src/core/Index/Common/ExecutionContext.cs | Pool reset now clears both baseOperationState and pendingState for safety. |
| libs/storage/Tsavorite/cs/src/core/Index/Common/CompletedOutput.cs | Completed-output transfer now moves key/input/output from PendingState and metadata from OperationState. |
| libs/storage/Tsavorite/cs/src/core/Allocator/LogRecord.cs | Doc update: record reuse mentions OperationState. |
| libs/storage/Tsavorite/cs/src/core/Allocator/ConditionallyHoistedKey.cs | Doc update: references OperationState as the containing hot-path struct for sizing rationale. |
| libs/storage/Tsavorite/cs/src/core/Allocator/AsyncIOContext.cs | Doc update: key shallow-copy comment references operationState. |
| libs/storage/Tsavorite/cs/src/core/Allocator/AllocatorScan.cs | Refactors conditional scan push to use OperationState + explicit keyHash flow and pendingState scan payload. |
| libs/storage/Tsavorite/cs/benchmark/KV.benchmark/README.md | Updates benchmark docs to refer to OperationState instead of PendingContext. |
| libs/server/Storage/Functions/MainStore/RMWMethods.cs | Refactors record size tracking to declare RecordSizeInfo per-branch and assert optionals consistency where applicable. |
| libs/server/Storage/Functions/MainStore/PrivateMethods.cs | Replaces checked+try/catch overflow handling with branchless signed overflow detection in TryInPlaceUpdateNumber. |
| benchmark/BDN.benchmark/Operations/RawStringOperations.cs | Minor benchmark method-body simplifications. |
| benchmark/BDN.benchmark/Operations/OperationsBase.cs | Adds ConfigureServerOptions hook and validates LocalMemory device config for IO-path benchmarking. |
| benchmark/BDN.benchmark/Operations/LTM/RawStringOperations.cs | New larger-than-memory RawString benchmark variant using LocalMemory device tiering + randomized fixed-width keys. |
| benchmark/BDN.benchmark/Network/RawStringOperations.cs | Minor benchmark method-body simplifications (async). |
Comments suppressed due to low confidence (1)
libs/storage/Tsavorite/cs/src/core/Index/Common/PendingState.cs:157
- Typo in XML doc comment: "pending pendingState" has a duplicated word.
Comment on lines
78
to
82
| // now below HeadAddress and there is at least one record below HeadAddress but above InitialLatestLogicalAddress. Reissue the Read(), | ||
| // using the LogicalAddress we just found as minAddress. We will either find an in-memory version of the key that was added after the | ||
| // TryFindRecordInMemory we just did, or do IO and find the record we just found or one above it. Read() updates InitialLatestLogicalAddress, | ||
| // so if we do IO, the next time we come to CompletePendingRead we will only search for a newer version of the key in any records added | ||
| // after our just-completed TryFindRecordInMemory. |
Comment on lines
+115
to
120
| if (internalStatus != OperationStatus.RECORD_ON_DISK) | ||
| { | ||
| pendingState = newOp.pendingState; | ||
| newOp.pendingState = default; | ||
| sessionFunctions.Ctx.ReturnAsyncIOContext(newOp); | ||
| } |
Comment on lines
+317
to
+322
| if (status != OperationStatus.RECORD_ON_DISK) | ||
| { | ||
| pendingState = newOp.pendingState; | ||
| newOp.pendingState = default; | ||
| sessionFunctions.Ctx.ReturnAsyncIOContext(newOp); | ||
| } |
| SkipReadCache(ref stackCtx, out _); // Where this is called, we have no dependency on source addresses so we don't care if it Refreshed | ||
|
|
||
| // We don't have a pendingContext here, so pass the minAddress directly. | ||
| // We don't have a operationState here, so pass the minAddress directly. |
| `PendingState` carries information through the IO process and provides the source record (via its `DiskLogRecord`, which is an `ISourceLogRecord`) for RMW copy updates. | ||
|
|
||
| Previously `PendingContext` had separate `HeapContainers` for keys and values. However, for operations such as conditional insert for Copy-To-Tail or Compaction, we need to carry through the entire log record (including optionals). In the case of records read from disk (e.g. Compaction), it is easiest to pass the `LogRecord` in its entirety, including its `SectorAlignedMemory` buffer, in the `DiskLogRecord`. So now PendingContext will also serialize the Key passed to Upsert or RMW, and the value passed to Upsert, as a `DiskLogRecord`. `PendingContext` still carries the `HeapContainer` for Input, and `CompletedOutputs` must still retain the Key's `HeapContainer`. | ||
| Previously `PendingState` had separate `HeapContainers` for keys and values. However, for operations such as conditional insert for Copy-To-Tail or Compaction, we need to carry through the entire log record (including optionals). In the case of records read from disk (e.g. Compaction), it is easiest to pass the `LogRecord` in its entirety, including its `SectorAlignedMemory` buffer, in the `DiskLogRecord`. So now PendingState will also serialize the Key passed to Upsert or RMW, and the value passed to Upsert, as a `DiskLogRecord`. `PendingState` still carries the `HeapContainer` for Input, and `CompletedOutputs` must still retain the Key's `HeapContainer`. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This pull request introduces several improvements to in-place update logic, overflow detection, and memory management in the storage engine. The most significant changes are improved overflow handling in arithmetic operations, more precise management of record size information, and safer transfer of pending operation data. These changes enhance performance and correctness, especially in hot code paths, while also simplifying and clarifying the codebase.
Arithmetic overflow handling and in-place update logic:
checkedarithmetic and exception handling inTryInPlaceUpdateNumberwith a branchless signed overflow detection technique, eliminating expensive try/catch blocks and improving performance on the hot path. Now, overflows are detected using bitwise operations, and errors are flagged directly without exceptions. [1] [2] [3]Record size and memory management:
InPlaceUpdaterWorkerto declareRecordSizeInfolocally within each relevant switch case, removing the previously sharedsizeInfo2variable and clarifying where size information is needed. Added explicit calls toAssertOptionalsIfSetafter size changes to ensure internal consistency. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]Pending operation and output transfer:
CompletedOutputIteratorandCompletedOutputto transfer data from the pending IO slot rather than the pending context, ensuring that all relevant fields are moved and cleared properly. This change improves the safety and correctness of asynchronous IO handling. [1] [2]Scan and IO preparation logic:
ConditionalScanPushandPrepareIOForConditionalScanto pass and use the key hash explicitly, and to set scan-specific payloads on the pending operation slot, improving clarity and correctness in scan operations. [1] [2] [3]Async IO context cleanup:
basePendingContextandslotfields, ensuring no stale references remain in pooled contexts.