Skip to content

fix(cache): scope Actor definition cache to its owner#985

Merged
MQ37 merged 7 commits into
masterfrom
fix/cache-tenant-isolation
Jun 16, 2026
Merged

fix(cache): scope Actor definition cache to its owner#985
MQ37 merged 7 commits into
masterfrom
fix/cache-tenant-isolation

Conversation

@MQ37

@MQ37 MQ37 commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Mostly tests + comments — the actual logic change is small: a ~10-line ownership gate in getActorDefinitionCached, a simplified getActorMcpUrlCached, and deleting the mcpServerCache global.

What

Gate actorDefinitionCache hits by ownership: public Actors are served to any caller, private Actors only to their owner; everyone else falls through to a fresh fetch under their own token. Drop the separate mcpServerCache and derive the MCP URL from the now-gated definition cache.

Why

The cache is process-wide and was keyed by Actor name only. On a multi-tenant HTTP deployment, token B requesting an Actor that token A already cached got a cache hit that bypassed B's authorization check — leaking a private Actor's definition (input schema, README, run options) across tenants. getActorMCPServerURL does no I/O, so mcpServerCache guarded nothing expensive and re-introduced the same leak for the MCP URL; removing it closes that vector and deletes a global. Owner check reuses the already-cached getUserInfoCached, so no extra API call on the hot path. Found in an internal security review of the public package.

The gate hinges on two invariants (documented in callerMaySeeCachedActor): info.userId is the platform-set owner, and the caller's identity is resolved with the same user('me') identity the platform authorizes with. Verified against the live Apify API: a private Actor returns isPublic: false and userId equal to its owner's user('me').id; a public Actor returns isPublic: true. So a non-owner token resolves to a different user('me').id, the predicate denies, and the cache can never grant more than a bare re-fetch would.

Testing

  • New tests/unit/utils.actor.cache.test.ts (7 cases) pinning the authorization truth table: public→served to all (asserts no user('me') lookup), private→owner served, private→non-owner returns the re-fetched object (never the cached one), private→anonymous denied, and the getActorMcpUrlCached gate (owner→URL, non-owner private→false/re-fetch, missing→false no throw).
  • pnpm type-check, pnpm lint, pnpm format:check, pnpm test:unit (870 pass / 2 skipped) all green.

Residual risk (accepted): an Actor cached while public, then flipped private by its owner, is still served from the public fast-path until the 30-min TTL expires. The data was world-readable at cache time and the window is bounded; closing it would require per-request visibility re-checks that defeat the cache.

Note: a private Actor legitimately shared with another user re-fetches each call (cache miss) rather than serving from cache — safe, slightly less efficient for that uncommon case.

MQ37 added 3 commits June 11, 2026 16:31
The process-wide actorDefinitionCache was keyed by Actor name only, so on a
multi-tenant HTTP server one token could read another tenant's private Actor
definition (input schema, README, run options) straight from cache with no
authorization check. Gate cache hits: public Actors are served to everyone,
private Actors only to the owner (effective userId == owner userId, via the
already-cached getUserInfoCached); everyone else re-fetches under their own
token, which the platform authorizes.

Drop the separate mcpServerCache and derive the MCP URL from the now-gated
definition cache instead — getActorMCPServerURL does no I/O, so the extra
cache guarded nothing and re-introduced the same cross-tenant leak.
Council review nits on the cache-isolation gate:
- Document the two load-bearing invariants (info.userId is the platform-set
  owner; caller identity must match the platform's auth identity, null stays
  the sole sentinel) so a future refactor can't silently invert the gate.
- Add TTLLRUCache.clear() and reset the shared cache in beforeEach so tests
  don't pass by virtue of unique names.
- Cover getActorMcpUrlCached (was untested): owner-visible -> URL, non-owner
  private -> re-fetch + false (no leak of the cached URL), missing -> false
  no throw, non-MCP -> false. Strengthen the non-owner definition test to
  assert the re-fetched object is returned, never the cached one, plus
  fail-closed-on-identity-error and no-poisoning-after-shared-access cases.
- Delete the now-dead MCP_SERVER_CACHE_* constants left in const.ts.
Council review on test minimalism — cut the cases that duplicate another or
only assert the mock, keeping the complete authorization truth table:
- drop the 'fail-closed on identity error' case (byte-identical to the
  anonymous case; the mock can't throw),
- drop the 'no corruption after shared re-fetch' case (no product path writes
  info.userId, so it only re-asserts the mock and duplicates the non-owner case),
- drop the 'non-MCP -> false' case (false-branch already covered by the missing
  -> false case and mcp.actors.test.ts).
Also remove the test-only TTLLRUCache.clear() (every case uses a unique key)
and tighten the gate comments to the two load-bearing invariants.
@github-actions github-actions Bot added t-ai Issues owned by the AI team. tested Temporary label used only programatically for some analytics. labels Jun 12, 2026
MQ37 added 2 commits June 12, 2026 16:58
master's CachedUserInfo gained a required isOrganization field; update the
mocked return values to match after merging master.

@jirispilka jirispilka left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice PR.

As Claude puts it: This is a textbook "delete the layer" move, not a "polish the layer" move

Comment thread src/utils/actor.ts
MQ37 and others added 2 commits June 16, 2026 12:08
Co-authored-by: Jiří Spilka <jiri.spilka@apify.com>
…olation

Resolve src/const.ts conflict: master (#1000) removed the unused cache
constants; this branch defines them locally in state.ts/userid_cache.ts,
so take master's deletion. Strip trailing whitespace in actor.ts comment
(oxfmt).
@MQ37 MQ37 merged commit 2dc6c21 into master Jun 16, 2026
15 checks passed
@MQ37 MQ37 deleted the fix/cache-tenant-isolation branch June 16, 2026 10:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

t-ai Issues owned by the AI team. tested Temporary label used only programatically for some analytics.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants