Add security threat model (THREAT_MODEL.md) + SECURITY.md/AGENTS.md discoverability by potiuk · Pull Request #6535 · apache/hive

potiuk · 2026-06-11T15:54:55Z

This adds a v0 security threat model + discoverability wiring to apache/hive, produced by the ASF Security team for the Hive PMC to review and own — the pre-flight step for the Glasswing security scan the PMC opted into.

What's here

THREAT_MODEL.md — a v0 model (Michael Scovetta rubric, run with Claude Opus) covering the HiveServer2 SQL front door, the Metastore, and the UDF / SerDe / execution layer: trust boundaries, in/out-of-scope adversaries, what Hive upholds vs. what it leaves to the operator (TLS, authorization-model choice, network isolation, UDF vetting), known non-findings, and triage dispositions. Every non-trivial claim is provenance-tagged (documented) / (maintainer) / (inferred); the (inferred) ones are our hypotheses.
SECURITY.md — private reporting via security@hive.apache.org + a pointer to the model.
AGENTS.md — wires AGENTS.md → SECURITY.md → THREAT_MODEL.md so the scan agent (and researchers) can mechanically find the model.
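As an illustration of what "mechanically find the model" could mean, here is a minimal sketch that walks the AGENTS.md → SECURITY.md → THREAT_MODEL.md chain in a checkout. It assumes that a plain mention of the next file name counts as a pointer; that convention, and the helper names `follow_chain` and `demo`, are hypothetical and not taken from the PR or from any scan agent.

```python
import tempfile
from pathlib import Path

# The chain named in the PR description: each file is expected to point at the next.
CHAIN = ["AGENTS.md", "SECURITY.md", "THREAT_MODEL.md"]


def follow_chain(repo_root, chain=CHAIN):
    """Return (ok, message) after checking that each file names its successor."""
    root = Path(repo_root)
    for current, nxt in zip(chain, chain[1:]):
        path = root / current
        if not path.is_file():
            return False, f"{current} is missing"
        text = path.read_text(encoding="utf-8")
        # Assumption: a literal mention of the next file name is the pointer.
        if nxt not in text:
            return False, f"{current} does not mention {nxt}"
    if not (root / chain[-1]).is_file():
        return False, f"{chain[-1]} is missing"
    return True, "chain intact"


def demo():
    """Build a throwaway repo layout and check it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "AGENTS.md").write_text("Security policy: see SECURITY.md\n", encoding="utf-8")
        (root / "SECURITY.md").write_text("Threat model: see THREAT_MODEL.md\n", encoding="utf-8")
        (root / "THREAT_MODEL.md").write_text("# Threat model (v0)\n", encoding="utf-8")
        return follow_chain(root)


if __name__ == "__main__":
    print(demo())
```

A check like this could run in CI so that renaming or moving one of the three files does not silently break the chain.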

How to engage — this is a draft to react to, not a finished artifact. THREAT_MODEL.md §14 collects open questions in waves; answer inline a few at a time, correct anything wrong, and the model becomes the PMC's. Once you're happy, we queue the scan in OSS-criticality order. No deadline pressure with the Mythos 5 window being extended.

Generated-by: Claude Opus 4.8 (1M context)

… discoverability

v0 threat model produced by the ASF Security team via threat-model-producer (Michael Scovetta rubric, run with Claude Opus) for the PMC to review, correct, and own. Wires the AGENTS.md -> SECURITY.md -> THREAT_MODEL.md discoverability chain the scan agent follows. Every non-trivial claim is provenance-tagged; open questions for the PMC are collected in THREAT_MODEL.md section 14.

Generated-by: Claude Opus 4.8 (1M context)

okumin · 2026-06-12T09:19:18Z

Thank you! I will check the draft.

potiuk · 2026-06-14T01:40:31Z

Thanks @okumin — no rush. The most useful read is the §14 "Open questions" section at the end; those are where I inferred a position and would value your confirmation or correction.

okumin

I answered some of the obvious points. I'm still checking the remaining ones.

Incorporates okumin's PR apache#6535 review:
- direct Hive Metastore access in scope (HMS enforces caller authz at the application level; Spark et al. connect directly) -> §3.3/§4/§11a
- UDF/SerDe/TRANSFORM code-execution detail: built-in UDF blacklist (reflect/reflect2/java_method/in_file), custom UDF/SerDe admin trust, TRANSFORM disable via DisallowTransformHook -> §7/§8/§11a
- §14 Q1/Q2 promoted to maintainer; Q7/Q9/Q12 annotated PMC-reviewing

Generated-by: Claude Opus 4.8

potiuk · 2026-06-17T01:36:03Z

Thanks okumin — this is exactly the kind of detail that makes the model useful. Folded your review in and pushed (THREAT_MODEL.md, +75/-23):

Direct Metastore access (your L186): added as in-scope adversary §3.3 — HMS enforces caller authorization at the application level (since Spark and similar talk to it directly), and §4 now frames network isolation as defense-in-depth rather than the primary control. Correspondingly flipped the §11a "Metastore Thrift port has no authorization" entry from out-of-scope to VALID/in-model.
UDF / SerDe / TRANSFORM (your L190): folded the whole breakdown into §7, with the config levers in §8 — built-in code-exec UDFs (reflect, reflect2, java_method, in_file) blocked via hive.server2.builtin.udf.blacklist; custom UDF/SerDe/InputFormat/OutputFormat as admin-trusted jar installs; TRANSFORM disabled via DisallowTransformHook in hive.exec.pre.hooks. Added a §11a non-finding for the built-in-UDF case. (Your gist was very helpful — thanks for the link.)
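To make the two configuration levers above concrete, here is a minimal sketch of an audit over a Hadoop-style configuration file. The property names and the four UDF names come from this thread. Everything else is an assumption for illustration: the helper names, the sample XML, and the `example.hooks` package prefix are hypothetical, and the hook is matched on its simple class name so the sketch does not assert a real package path.

```python
import xml.etree.ElementTree as ET

# UDF names from the review comment; treating them as the full required set is an assumption.
RISKY_UDFS = {"reflect", "reflect2", "java_method", "in_file"}

# Hypothetical sample; the hook's package prefix here is a placeholder.
SAMPLE_XML = (
    "<configuration>"
    "<property><name>hive.server2.builtin.udf.blacklist</name>"
    "<value>reflect,reflect2,java_method,in_file</value></property>"
    "<property><name>hive.exec.pre.hooks</name>"
    "<value>example.hooks.DisallowTransformHook</value></property>"
    "</configuration>"
)


def load_properties(xml_text):
    """Parse <configuration><property><name/><value/> XML into a dict."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def split_csv(value):
    """Split a comma-separated property value, dropping empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def audit(props):
    """Return a list of findings; an empty list means nothing was flagged."""
    findings = []
    blocked = set(split_csv(props.get("hive.server2.builtin.udf.blacklist", "")))
    missing = sorted(RISKY_UDFS - blocked)
    if missing:
        findings.append("UDFs not blacklisted: " + ", ".join(missing))
    hooks = split_csv(props.get("hive.exec.pre.hooks", ""))
    # Match on the simple class name only (see the note above the block).
    if not any(h.split(".")[-1] == "DisallowTransformHook" for h in hooks):
        findings.append("DisallowTransformHook absent from hive.exec.pre.hooks")
    return findings


if __name__ == "__main__":
    print(audit(load_properties(SAMPLE_XML)))
```

This only inspects a static file. It says nothing about the effective runtime configuration, which may be overridden elsewhere (for example by an authorization plugin).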

I've left these as "PMC reviewing" in §14 pending your follow-up, so nothing's prematurely locked:

doAs (L203): I noted hive.server2.enable.doAs=false as the expected posture but flagged it for the second-pair-of-eyes check you asked for.
Ranger-only authz (L209): left §6/§9 open while you weigh whether to treat Ranger as the only supported authorization system.
TLS params (L215): §8 TLS lever left unnamed until you confirm the exact Hive-side parameter names.

On your off-topic question — separate THREAT_MODEL.md for the Metastore: my lean is to keep a single THREAT_MODEL.md but with clearly separated HS2 vs HMS scope/boundary/property subsections, since it's one repo and one discoverability chain (AGENTS.md -> SECURITY.md -> THREAT_MODEL.md) and the triage dispositions are shared. If you'd rather split them, that works too — we'd just point SECURITY.md at both files. Your call; happy to restructure either way.

potiuk · 2026-06-21T19:45:21Z

Thanks @okumin — really useful detail. Your answers are folded into THREAT_MODEL.md:

Direct Metastore access (Q1) — now in-scope: §3.3 + §4 state HMS enforces caller authorization at the application level (since Spark et al. talk to it directly), and §11a flips "Metastore Thrift port has no authz" to VALID rather than out-of-scope.
UDF / SerDe / TRANSFORM (Q2) — captured your full breakdown in §7/§8/§11a: the insecure built-ins (reflect, reflect2, java_method, in_file) blocked via hive.server2.builtin.udf.blacklist (Ranger configures it); custom UDFs/SerDes/InputFormats as admin-trusted code; TRANSFORM prohibited via DisallowTransformHook in hive.exec.pre.hooks.
doAs (Q7) — recorded your expectation that hive.server2.enable.doAs=false is the intended posture (HS2 enforces policy itself), flagged as pending a second PMC member's double-check before we finalize §4/§8.
Metastore protection / Ranger-only (Q9) — app-level (not network-level) folded in; "accept only Ranger as the authz system?" left open since you're still considering it.
TLS params (Q12) — left the §8 TLS lever unnamed pending the exact Hive-side parameter names you're checking.

On your off-topic question — splitting HS2 and HMS into separate threat models: I think that's a good idea given they have genuinely different security models and parameters, and it also reads cleaner for the scan agent. I've noted it as §14 Q15; happy to split this into THREAT_MODEL.md (HS2) + a dedicated HMS model on your nod.

No rush on the rest of your review — ping me when you've worked through the remaining points and I'll fold them in one pass.

okumin · 2026-06-27T09:46:51Z

+   isolation, UDF vetting). Anything mis-assigned?
+10. Confirm the by-design non-guarantees in §7.
+11. Is super-linear resource use / a hang on a pathological query a bug, or is
+    bounding it the operator's job (YARN queues / HS2 limits)?


In general, this is an operator responsibility rather than a Hive bug. Hive accepts arbitrary HiveQL, so operators are expected to use HiveServer2 limits (e.g., hive.query.max.length) and YARN resource pools to bound the impact of pathological queries.
If stronger isolation than HS2 and YARN can provide is required, operators should run separate HS2 instances or separate Hadoop/YARN clusters.
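As a small illustration of bounding statement size on the operator side, the sketch below shows a pre-submission check that a gateway in front of HS2 might apply. It is not Hive's own enforcement of hive.query.max.length. The function name, the byte-based measurement, and the limit value are assumptions for illustration only.

```python
def check_query_length(sql, max_bytes):
    """Return the UTF-8 size of sql, or raise ValueError if it exceeds max_bytes."""
    size = len(sql.encode("utf-8"))
    if size > max_bytes:
        raise ValueError(f"query is {size} bytes, limit is {max_bytes}")
    return size


if __name__ == "__main__":
    # Hypothetical limit chosen for the demo; pick a value that matches the deployment.
    print(check_query_length("SELECT 1", max_bytes=1024))
```

A length cap only bounds the statement text. CPU, memory, and runtime of a pathological query still have to be bounded through YARN queues or separate instances, as described above.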

okumin · 2026-06-27T10:04:27Z

+   code-execution-by-design (not a sandbox), per §7?~~
+3. Confirm the assumed deployment: clustered, behind an operator-controlled
+   perimeter, with Hadoop + a metastore RDBMS + (Ranger or SQL-std auth) + KDC
+   as trusted dependencies.


Yes. We assume a clustered Hive deployment behind an operator-controlled perimeter. Hadoop, the metastore RDBMS, the configured authorization provider, and the Kerberos KDC are trusted dependencies.
The authorization provider is typically Ranger in production deployments, though SQL-standard authorization may also be used.

okumin · 2026-06-27T10:06:24Z

+4. Is the in-scope adversary "a SQL client at the HS2 boundary" (+ a network
+   MITM where TLS is off)? Anything to add?
+5. Confirm operators with storage/metastore-DB/cluster-process access, and
+   trusted admins doing authorized actions, are out of model.


Yes. Operators with direct storage, metastore DB, or cluster-process access are considered trusted and out of scope. Authorized actions by trusted administrators are also out of model. This threat model focuses on behavior via Hive’s supported interfaces, assuming that the underlying infrastructure and its administrators are trusted.

okumin · 2026-06-27T14:56:04Z

+   perimeter, with Hadoop + a metastore RDBMS + (Ranger or SQL-std auth) + KDC
+   as trusted dependencies.
+4. Is the in-scope adversary "a SQL client at the HS2 boundary" (+ a network
+   MITM where TLS is off)? Anything to add?


Yes. The primary in-scope adversaries are untrusted clients at Hive service boundaries: SQL clients submitting statements to HS2, and clients accessing the Hive Metastore through supported APIs.
A network MITM is also in scope when TLS or equivalent transport protection is enabled, and a Hive operator is responsible for setting up TLS properly.

potiuk · 2026-06-27T21:27:27Z

Thanks @okumin — this is exactly the maintainer input the §14 open questions were fishing for, and it sharpens the model a lot. How I'll fold it in:

Answered questions → maintainer-ratified. Your adversary model (untrusted SQL/metastore clients; MITM when TLS is enabled), trusted dependencies (Hadoop, metastore RDBMS, the authz provider, KDC), and trusted-admin-out-of-scope will move from (inferred) to (maintainer) in the next push.
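For readers who want to track how many claims still sit at (inferred), here is a minimal sketch that counts the three provenance tags and promotes selected claims to (maintainer). The tag spellings come from the PR description. The sample text, the helper names, and the idea of selecting claims by a substring are hypothetical.

```python
import re
from collections import Counter

# The three provenance tags named in the PR description.
TAG_PATTERN = re.compile(r"\((documented|maintainer|inferred)\)")

# Hypothetical excerpt; the real THREAT_MODEL.md wording is not reproduced here.
SAMPLE = (
    "Trusted admins are out of model. (inferred)\n"
    "HS2 accepts arbitrary HiveQL. (documented)\n"
    "The KDC is a trusted dependency. (inferred)\n"
)


def count_tags(text):
    """Count occurrences of each provenance tag."""
    return Counter(TAG_PATTERN.findall(text))


def promote(text, needle):
    """Rewrite (inferred) to (maintainer) on every line containing needle."""
    out = []
    for line in text.splitlines():
        if needle in line:
            line = line.replace("(inferred)", "(maintainer)")
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    print(count_tags(SAMPLE))
    print(count_tags(promote(SAMPLE, "KDC")))
```

Comparing the counts before and after a fold-in gives a quick view of how much of the model has been ratified by a maintainer.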

UDF / SerDe / TRANSFORM. The built-in-UDF blacklist (reflect, reflect2, java_method, in_file via hive.server2.builtin.udf.blacklist), the custom-UDF / SerDe / InputFormat trust model, and the TRANSFORM prohibition via DisallowTransformHook are exactly the detail the "properties / downstream responsibilities" sections needed — I'll write them in close to as you stated them.

Metastore direct access (line 186). Agreed it belongs in scope — I'll add direct Hive Metastore access (e.g. from Spark) as an in-scope interface and cite your gist.

On a separate THREAT_MODEL.md for the Metastore: my suggestion is to keep one file but split it into clearly-labelled HiveServer2 and Hive Metastore sections, each with its own scope / adversary / trust-boundary subsection — rather than two files. Automated scanners discover the model by following AGENTS.md → SECURITY.md → THREAT_MODEL.md per repo; since HS2 and the Metastore live in this one repo, a single well-sectioned file keeps that chain intact while still giving each component a distinct model. Happy to split into two files instead if the PMC prefers — your call.

Still open (no rush), left as open questions pending your word:

hive.server2.enable.doAs=false under auth — you wanted a second pair of eyes; flagging for other reviewers here.
Whether to treat Ranger as the only authorization system, or keep SQL-standard authz in the model too.
The TLS parameters you're still checking.

I'll push the fold-in and re-request your review. Thanks again.

okumin · 2026-06-28T03:44:21Z

Thanks.

> On a separate THREAT_MODEL.md for the Metastore: my suggestion is to keep one file but split it into clearly-labelled HiveServer2 and Hive Metastore sections, each with its own scope / adversary / trust-boundary subsection, rather than two files

Sure. We would like to follow the best practice. Let's go with your suggestion now.

I'm still checking the unanswered questions. Please wait a while.

potiuk · 2026-06-28T13:03:34Z

@okumin — pushed the fold-in (c8ca131): your 2026-06-27 answers on the adversary model, trusted dependencies, out-of-scope operators/admins, and query-resource bounding are now in the model (§3/§4/§7), and §14 Q3/Q4/Q5/Q11 are marked answered. Still open whenever you have a moment: the hive.server2.enable.doAs=false double-check (you wanted a second pair of eyes), whether to treat Ranger as the only authorization system, the TLS parameter names, and the HS2-vs-Metastore split (Q15). No rush — thanks for the thorough review.

sonarqubecloud · 2026-06-28T14:06:28Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

asf-ci-hive added labels "tests pending", "tests unstable" and removed label "tests pending" · Jun 11, 2026

okumin reviewed Jun 15, 2026

asf-ci-hive added labels "tests pending", "tests passed" and removed labels "tests unstable", "tests pending" · Jun 17, 2026

Add §14 Q15 — okumin's question on splitting HS2 vs Hive Metastore in…

db30eb6

…to separate threat models (different security models/params)

asf-ci-hive added label "tests pending" and removed label "tests passed" · Jun 21, 2026

asf-ci-hive added label "tests unstable" and removed label "tests pending" · Jun 21, 2026

okumin reviewed Jun 27, 2026

THREAT_MODEL.md: fold in okumin's 2026-06-27 review answers

c8ca131

§3/§4/§7 adversaries, trusted deps, query-resource bounding; §14 Q3/Q4/Q5/Q11 marked answered. Generated-by: Claude Code (Claude Opus 4.7)

asf-ci-hive added label "tests pending" and removed label "tests unstable" · Jun 28, 2026

asf-ci-hive added label "tests unstable" and removed label "tests pending" · Jun 28, 2026
