docs(#1445): note at-most-once GC concurrency semantics#189
docs(#1445): note at-most-once GC concurrency semantics#189dimitri-yatsenko wants to merge 2 commits into
Conversation
Spec-first pair for the Renderable Protocol landing in DataJoint 2.3 (per user direction 2026-06-23, bringing T3.2 back into 2.3 scope). New files: - src/reference/specs/renderable.md — normative spec for the Renderable Protocol. Covers signature, return-value shape constraints (primitives / lists / dicts mapping to Spark ArrayType / StructType / MapType), why the contract is a Protocol rather than an abstract method on Codec, eligibility detection via isinstance, out-of-scope items, and two worked example codec implementations (FloatArrayCodec, Image2DCodec, PointWithLabelCodec). - src/explanation/renderable-codecs.md — explainer. Covers the Bronze/Silver layer model (CDC mirror vs typed silver layer), why <blob@> is bronze-only, what typed renderable codecs are, the design rationale for the Protocol pattern (smaller OSS surface, cleaner opt-in, no churn for existing plugins, structural typing), what's out of scope, and a decision guide for choosing codecs in a new pipeline. Nav entries added: - Reference > Specifications > Type System > Renderable Codec Protocol - Concepts > Storage > Renderable Codecs Implementation (against this spec) follows in datajoint-python; the addition is small (~10 lines: a runtime_checkable Protocol declaration in src/datajoint/rendering.py, re-exported as dj.Renderable). Examples use core DataJoint types (float64, int32) per project convention. Cross-links to codec-api.md (the base Codec interface that renderable codecs extend by composition, not inheritance).
T3.1 deferral holding pattern for 2.3: document current behavior so quarantine-based serialization can land in 2.4 without retracting stronger claims.
MilagrosMarin
left a comment
There was a problem hiding this comment.
Small, well-scoped note. Verified against dj.gc.collect() on master (calls scan() internally then deletes — the race window is real) and against issue #1445, which explicitly tracks the two-phase quarantine work the note commits to. Placement (right after "Run garbage collection periodically" and before "Basic Usage") is sensible — a reader gets the caveat before running any commands.
Two small wording thoughts, neither blocking:
The phrase "at-most-once" in distributed-systems parlance usually refers to message delivery guarantees; here it's used to mean "single-pass, best-effort, no retry/confirmation." "Best-effort" or "single-pass" would read more naturally, and the concluding line already uses "best-effort rather than transactionally safe" — leading with that framing might be tighter.
"An object inserted between the scan and the delete may be briefly orphaned" — the described race is real, but "orphaned" typically means "no reference exists"; here the concern is closer to "deleted despite being newly referenced" (i.e., the row references bytes that just vanished). Small terminological blur; the intent reads clearly enough.
Otherwise clean. Approving.
Summary
T3.1 of the 2.3 release plan defers the two-phase transaction-safe GC (#1445) to 2.4 (effort + design-needed). To avoid retracting stronger claims later, add a one-paragraph note to
how-to/garbage-collection.mdclarifying that the current GC is at-most-once: an object inserted between scan and delete may be briefly orphaned, and a future release will add quarantine-based serialization.Test plan
mkdocs serve— note renders in the how-to and reads cleanly