docs(#1445): note at-most-once GC concurrency semantics #189

Open
dimitri-yatsenko wants to merge 2 commits into main from docs/1445-gc-concurrency-note
Conversation

@dimitri-yatsenko

Member

Summary

T3.1 of the 2.3 release plan defers the two-phase transaction-safe GC (#1445) to 2.4 (effort + design-needed). To avoid retracting stronger claims later, add a one-paragraph note to how-to/garbage-collection.md clarifying that the current GC is at-most-once: an object inserted between scan and delete may be briefly orphaned, and a future release will add quarantine-based serialization.
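
The window between scan and delete can be illustrated with a toy model. This is only a sketch: `store`, `referenced`, `scan`, and `collect` here are hypothetical in-memory stand-ins, not the DataJoint implementation, and the concurrent writer is simulated by a callback that runs between the two phases.

```python
# Toy model only: every name below is a hypothetical stand-in,
# not the DataJoint GC implementation.

def scan(store, referenced):
    """Return the keys in the store that nothing references at scan time."""
    return {key for key in store if key not in referenced}


def collect(store, referenced, between_scan_and_delete=None):
    """Single-pass GC: scan once, then delete without re-checking references."""
    candidates = scan(store, referenced)
    if between_scan_and_delete is not None:
        between_scan_and_delete()  # simulates a concurrent writer
    for key in candidates:
        store.pop(key, None)
    return candidates


store = {"a": b"kept", "b": b"unreferenced at scan time"}
referenced = {"a"}


def concurrent_insert():
    # A row starts referencing "b" after the scan has already listed it.
    referenced.add("b")


deleted = collect(store, referenced, concurrent_insert)
# References that now point at bytes the GC removed.
dangling = {key for key in referenced if key not in store}
```

In this model `deleted` and `dangling` both contain `"b"`: the single pass has no second check that would notice the new reference, which is the gap a two-phase, quarantine-based design would be expected to close.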

Test plan

  • mkdocs serve — note renders in the how-to and reads cleanly

Spec-first pair for the Renderable Protocol landing in DataJoint 2.3
(per user direction 2026-06-23, bringing T3.2 back into 2.3 scope).

New files:

- src/reference/specs/renderable.md — normative spec for the Renderable
  Protocol. Covers signature, return-value shape constraints (primitives /
  lists / dicts mapping to Spark ArrayType / StructType / MapType), why
  the contract is a Protocol rather than an abstract method on Codec,
  eligibility detection via isinstance, out-of-scope items, and three
  worked example codec implementations (FloatArrayCodec, Image2DCodec,
  PointWithLabelCodec).

- src/explanation/renderable-codecs.md — explainer. Covers the
  Bronze/Silver layer model (CDC mirror vs typed silver layer), why
  <blob@> is bronze-only, what typed renderable codecs are, the design
  rationale for the Protocol pattern (smaller OSS surface, cleaner
  opt-in, no churn for existing plugins, structural typing), what's
  out of scope, and a decision guide for choosing codecs in a new
  pipeline.

Nav entries added:
- Reference > Specifications > Type System > Renderable Codec Protocol
- Concepts > Storage > Renderable Codecs

Implementation (against this spec) follows in datajoint-python; the
addition is small (~10 lines: a runtime_checkable Protocol declaration
in src/datajoint/rendering.py, re-exported as dj.Renderable).

Examples use core DataJoint types (float64, int32) per project convention.
Cross-links to codec-api.md (the base Codec interface that renderable
codecs extend by composition, not inheritance).
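
A minimal sketch of the Protocol pattern described above, under stated assumptions: the method name `render` and both toy codec classes are hypothetical, since the real signature is defined in renderable.md. It shows how a `runtime_checkable` Protocol lets a codec opt in structurally and be detected with `isinstance`.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Renderable(Protocol):
    # Hypothetical method name; the spec defines the real signature.
    def render(self, value: Any) -> Any:
        ...


class ToyFloatArrayCodec:
    """Opts in structurally: no base class, only a matching method."""

    def render(self, value):
        # Return a plain list of floats (a primitive / list shape).
        return [float(x) for x in value]


class ToyOpaqueCodec:
    """Has no render method, so it is not detected as renderable."""

    def encode(self, value):
        return repr(value).encode()


is_renderable = isinstance(ToyFloatArrayCodec(), Renderable)
is_opaque_renderable = isinstance(ToyOpaqueCodec(), Renderable)
rendered = ToyFloatArrayCodec().render([1, 2.5])
```

Note that `isinstance` against a `runtime_checkable` Protocol only checks that the method exists, not its signature or return type, so any return-shape constraints would still need separate validation.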

T3.1 deferral holding pattern for 2.3: document current behavior so
quarantine-based serialization can land in 2.4 without retracting
stronger claims.

@MilagrosMarin left a comment

Collaborator

Small, well-scoped note. Verified against dj.gc.collect() on master (calls scan() internally then deletes — the race window is real) and against issue #1445, which explicitly tracks the two-phase quarantine work the note commits to. Placement (right after "Run garbage collection periodically" and before "Basic Usage") is sensible — a reader gets the caveat before running any commands.

Two small wording thoughts, neither blocking:

The phrase "at-most-once" in distributed-systems parlance usually refers to message delivery guarantees; here it's used to mean "single-pass, best-effort, no retry/confirmation." "Best-effort" or "single-pass" would read more naturally, and the concluding line already uses "best-effort rather than transactionally safe" — leading with that framing might be tighter.

"An object inserted between the scan and the delete may be briefly orphaned" — the described race is real, but "orphaned" typically means "no reference exists"; here the concern is closer to "deleted despite being newly referenced" (i.e., the row references bytes that just vanished). Small terminological blur; the intent reads clearly enough.

Otherwise clean. Approving.

2 participants