feat(cli): add GPU count requests by elezar · Pull Request #1812 · NVIDIA/OpenShell

elezar · 2026-06-08T13:10:23Z

Summary

Adds structured GPU resource requirements for sandbox creation and updates the
CLI/API/runtime path so openshell sandbox create --gpu [COUNT] records GPU
intent in ResourceRequirements.gpu.

This is an intentional alpha API break: SandboxSpec.gpu and
DriverSandboxSpec.gpu are replaced by resource_requirements.gpu. The
protobuf field number is reused intentionally for that replacement, changing
field 9 from a bool to a message in both public and driver APIs. Existing live
or persisted legacy GPU intent is not migrated; callers should use a matching
OpenShell CLI/API version and recreate GPU sandboxes when they need the new
typed shape. RFC 0004 is updated to document that decision.

Related Issue

Part of #1444. Related to #1338, #1156, #1360, and #1492. Follow-up GPU support
preflight semantics are tracked in #1807.

Changes

- Add ResourceRequirements.gpu.count to the public and compute-driver protos.
  A present GPU requirement with omitted count means one GPU; count = 0 is
  rejected.
- Replace the older GPU CLI shape with --gpu for one GPU and --gpu COUNT
  for counted requests.
- Pass GPU resource requirements through sandbox create, gateway-to-driver
  translation, provisioning timeout messages, and driver helper APIs.
- Render Kubernetes nvidia.com/gpu limits from GPU requirements.
- Keep exact device selection driver-owned through driver_config: Docker and
  Podman use cdi_devices, and VM uses gpu_device_ids.
- Validate exact device requests consistently: device IDs are opaque, duplicate
  IDs are rejected, a single exact device works with default --gpu, and
  multi-device exact lists require --gpu COUNT matching the list length.
- Add Docker and Podman default CDI selection for counted GPU requests. The
  selector refreshes CDI inventory before validation/create, picks from the
  normalized NVIDIA CDI inventory in round-robin order, fails when count exceeds
  selectable devices, and treats WSL2 nvidia.com/gpu=all fallback as one
  selectable device.
- Keep VM GPU support limited to one GPU and reject VM counts above one.
- Update GPU request docs, RFC 0004, architecture notes, and Docker/Podman/
  Kubernetes driver READMEs.
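
For readers who want the count and exact-device rules above in one place, here is a minimal sketch. It is not the code in this PR: GpuRequirement, GpuRequestError, effective_count, and validate_exact_devices are hypothetical names, and the rule that any non-empty exact device list must match the effective count is my reading of the bullets above.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for the GPU requirement message; the generated type may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
struct GpuRequirement {
    count: Option<u32>,
}

#[derive(Debug, PartialEq)]
enum GpuRequestError {
    ZeroCount,
    DuplicateDevice(String),
    CountMismatch { requested: u32, devices: usize },
}

// A present requirement with no count means one GPU; an explicit zero is rejected.
fn effective_count(gpu: GpuRequirement) -> Result<u32, GpuRequestError> {
    match gpu.count {
        None => Ok(1),
        Some(0) => Err(GpuRequestError::ZeroCount),
        Some(n) => Ok(n),
    }
}

// Device IDs are treated as opaque strings: only uniqueness and list length are checked.
// Assumption: a non-empty exact list must have exactly `count` entries.
fn validate_exact_devices(gpu: GpuRequirement, devices: &[&str]) -> Result<u32, GpuRequestError> {
    let count = effective_count(gpu)?;
    let mut seen = HashSet::new();
    for id in devices {
        if !seen.insert(*id) {
            return Err(GpuRequestError::DuplicateDevice((*id).to_string()));
        }
    }
    if !devices.is_empty() && devices.len() != count as usize {
        return Err(GpuRequestError::CountMismatch { requested: count, devices: devices.len() });
    }
    Ok(count)
}

fn main() {
    let default_gpu = GpuRequirement { count: None };
    // One exact device with the default request, then two devices without a matching count.
    println!("{:?}", validate_exact_devices(default_gpu, &["gpu-a"]));
    println!("{:?}", validate_exact_devices(default_gpu, &["gpu-a", "gpu-b"]));
    println!("{:?}", validate_exact_devices(GpuRequirement { count: Some(2) }, &["gpu-a", "gpu-b"]));
}
```

The device IDs gpu-a and gpu-b are placeholders; the sketch never inspects their contents, which is the point of treating them as opaque.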

Testing

mise run pre-commit
/Users/elezar/.local/bin/mise exec -- cargo check -p openshell-core -p openshell-driver-docker -p openshell-driver-podman -p openshell-driver-vm -p openshell-driver-kubernetes
/Users/elezar/.local/bin/mise exec -- cargo test -p openshell-core -p openshell-driver-docker -p openshell-driver-podman -p openshell-driver-vm -p openshell-driver-kubernetes gpu --lib
/Users/elezar/.local/bin/mise exec -- cargo clippy -p openshell-core -p openshell-driver-docker -p openshell-driver-podman -p openshell-driver-vm -p openshell-driver-kubernetes --all-targets -- -D warnings

CI is running for the rebased head. The PR has the test:e2e-gpu label so the
required Docker GPU E2E gate runs in CI.

Checklist

Follows Conventional Commits
Commits are signed off (DCO)
Unit/integration tests updated
Architecture and user-facing docs updated

github-actions · 2026-06-08T13:10:56Z

🌿 Preview your docs: https://nvidia-preview-pr-1812.docs.buildwithfern.com/openshell

mrunalp · 2026-06-08T20:50:37Z

/ok to test abe5b79

TaylorMutch · 2026-06-08T21:21:38Z

/ok to test abe5b79

copy-pr-bot · 2026-06-10T07:55:15Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


elezar · 2026-06-10T07:55:38Z

Landing #1815 first should simplify the changes here.

github-actions · 2026-06-12T07:46:59Z

Label test:e2e-gpu applied for 06c69dd. Open the existing run and click Re-run all jobs to execute with the label set. The run will execute GPU E2E after building the required supervisor image once. The matching required CI gate status on this PR will flip green automatically once the run finishes.

elezar · 2026-06-12T07:47:06Z

PR Review Status

Validation: this is maintainer-authored, project-valid GPU CLI/API/runtime work that aligns GPU sandbox intent with structured resource requirements and the related resource-requirements RFC direction.
Head SHA: 06c69dddf62dd74b3215bdc8e3dafc95ee2622a2

Review findings:

Blocking: crates/openshell-cli/src/run.rs moves resource_requirements into CreateSandboxRequest and then borrows it later for provisioning timeout messages. This is a Rust use-after-move compile failure; clone or otherwise retain the value before moving it into the request.
Blocking: proto/openshell.proto and proto/compute_driver.proto reuse field number 9 for resource_requirements, replacing the old bool gpu = 9 with a message. Old clients and persisted sandbox records encode field 9 as a varint, while the new schema expects length-delimited data, so prost can silently ignore the old GPU request. Reserve the old field and add resource_requirements on a new field, or keep a transitional deprecated field and map both safely.
Warning: --gpu now accepts an optional COUNT, so default GPU plus a trailing command requires the documented --gpu -- <command> form. The docs/tests cover that; keep the parse error crisp for accidental --gpu <command> usage.
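
To illustrate the optional-COUNT warning, here is a hand-rolled sketch of the parsing rule. It is an assumption-heavy illustration, not the CLI's actual parser: parse_create_args and Parsed are hypothetical names, the error strings are invented, and only --gpu and -- are modeled.

```rust
// Hypothetical parse result: gpu_count is None when --gpu was not passed.
#[derive(Debug, PartialEq)]
struct Parsed {
    gpu_count: Option<u32>,
    command: Vec<String>,
}

// Parses `--gpu [COUNT]` followed by an optional `-- <command>` tail.
fn parse_create_args(args: &[&str]) -> Result<Parsed, String> {
    let mut gpu_count = None;
    let mut i = 0;
    while i < args.len() {
        match args[i] {
            "--" => {
                // Everything after `--` is the trailing command, untouched.
                let command = args[i + 1..].iter().map(|s| s.to_string()).collect();
                return Ok(Parsed { gpu_count, command });
            }
            "--gpu" => match args.get(i + 1) {
                // Bare --gpu (end of input, or next token is a flag or `--`) means one GPU.
                None => gpu_count = Some(1),
                Some(next) if next.starts_with("--") => gpu_count = Some(1),
                Some(next) => {
                    // A non-numeric token here is most likely an accidental `--gpu <command>`.
                    let n: u32 = next.parse().map_err(|_| {
                        format!("invalid GPU count '{next}'; use `--gpu -- <command>` for one GPU")
                    })?;
                    if n == 0 {
                        return Err("GPU count must be at least 1".to_string());
                    }
                    gpu_count = Some(n);
                    i += 1; // consume COUNT
                }
            },
            other => return Err(format!("unexpected argument '{other}'")),
        }
        i += 1;
    }
    Ok(Parsed { gpu_count, command: Vec::new() })
}

fn main() {
    println!("{:?}", parse_create_args(&["--gpu", "--", "nvidia-smi"]));
    println!("{:?}", parse_create_args(&["--gpu", "2"]));
    println!("{:?}", parse_create_args(&["--gpu", "nvidia-smi"]));
}
```

The third call shows the case the warning is about: without the -- separator, the command name lands in the COUNT position and should fail with a message that points at the documented form.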

Docs: Fern docs were updated under docs/; no navigation change appears necessary for this focused CLI/runtime behavior update.
E2E: test:e2e-gpu is applied because this changes GPU runtime behavior and user-visible GPU sandbox creation semantics.
Checks: OpenShell / Branch Checks is still pending for this head; OpenShell / Helm Lint and DCO are passing.

Next state: gator:in-review

elezar · 2026-06-12T08:27:00Z

Re-check After CI Update

I re-evaluated latest head 06c69dddf62dd74b3215bdc8e3dafc95ee2622a2 after OpenShell / Branch Checks completed successfully.

Disposition: partially resolved.

Remaining items:

Resolved: the earlier compile-failure concern in crates/openshell-cli/src/run.rs is cleared by the passing Rust checks and the current diff retaining resource_requirements for the provisioning timeout path.
Still blocking: proto/openshell.proto and proto/compute_driver.proto still replace the old bool gpu = 9 fields with length-delimited ResourceRequirements resource_requirements = 9. That is a protobuf wire-compatibility break for existing clients and stored sandbox records. Please reserve the old field number and add resource_requirements on a new field, or keep a deprecated transitional field and map both safely.
OpenShell / GPU E2E is still pending for this head with test:e2e-gpu applied.

Next state: gator:in-review

elezar · 2026-06-12T09:42:49Z

Re-check After CI Update

I re-evaluated latest head 06c69dddf62dd74b3215bdc8e3dafc95ee2622a2 after the required gates completed successfully.

Disposition: partially resolved.

Remaining items:

Resolved: OpenShell / Branch Checks, OpenShell / Helm Lint, DCO, and OpenShell / GPU E2E are now green for this head with test:e2e-gpu applied.
Still blocking: proto/openshell.proto and proto/compute_driver.proto still replace the old bool gpu = 9 fields with length-delimited ResourceRequirements resource_requirements = 9. That is a protobuf wire-compatibility break for existing clients and stored sandbox records. Please reserve the old field number and add resource_requirements on a new field, or keep a deprecated transitional field and map both safely.
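
For context on the wire-compatibility concern, the sketch below shows only the key byte that precedes field 9 in each encoding. It uses the standard protobuf key layout, (field_number << 3) | wire_type, and does not model how any particular decoder reacts to the mismatch; the helper names are hypothetical.

```rust
// Protobuf wire types relevant to this change.
const WIRE_VARINT: u8 = 0; // used by bool
const WIRE_LEN: u8 = 2; // used by embedded messages

// Key byte for field numbers 1..=15, which fit in a single byte.
fn key(field_number: u8, wire_type: u8) -> u8 {
    (field_number << 3) | wire_type
}

// Splits a single-byte key back into (field_number, wire_type).
fn split_key(byte: u8) -> (u8, u8) {
    (byte >> 3, byte & 0x07)
}

// True when the key byte is field 9 carrying the wire type a message field expects.
fn is_field9_message(byte: u8) -> bool {
    split_key(byte) == (9, WIRE_LEN)
}

fn main() {
    // Legacy `bool gpu = 9` set to true: key byte, then the varint 1.
    let legacy_gpu_true = [key(9, WIRE_VARINT), 0x01];
    // New `resource_requirements = 9` holding an empty message: key byte, then length 0.
    let new_empty_requirements = [key(9, WIRE_LEN), 0x00];

    println!("legacy bytes: {:02x?}", legacy_gpu_true);
    println!("new bytes:    {:02x?}", new_empty_requirements);
    println!("legacy key matches message field: {}", is_field9_message(legacy_gpu_true[0]));
}
```

Both encodings carry field number 9 but differ in the low three bits of the key, which is why a reader built against one schema cannot interpret the payload written under the other.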

Next state: gator:in-review

elezar · 2026-06-12T10:37:40Z

Re-check After Author Update

I re-evaluated latest head 87c9a8c6015bc23d48b11823540b44d85fa24399 after the branch update following the prior protobuf compatibility feedback.

Disposition: not resolved.

Remaining items:

Still blocking: proto/openshell.proto and proto/compute_driver.proto still define resource_requirements = 9, replacing the previous bool gpu = 9 field with a length-delimited message. This remains a protobuf wire-compatibility break for old clients and stored sandbox records. Please keep a deprecated transitional gpu = 9 field and map it when the new field is absent, or reserve field 9 and add resource_requirements on a new field number.
Still blocking: crates/openshell-cli/src/run.rs still moves resource_requirements into CreateSandboxRequest.spec.resource_requirements and later reads resource_requirements.as_ref() for provisioning timeout messages. Please clone or otherwise retain that value before moving it into the request.
Additional warning from the independent re-review: Docker and Podman should also reject count-only GPU requests at the create/build sink, not only during validation, so a missed validation path cannot over-grant all CDI GPUs.
OpenShell / Branch Checks, OpenShell / Helm Lint, and DCO are passing for this head. OpenShell / GPU E2E is still pending with test:e2e-gpu applied.

Next state: gator:in-review

elezar · 2026-06-12T10:55:40Z

Re-check After CI Update

I re-evaluated latest head 87c9a8c6015bc23d48b11823540b44d85fa24399 after the required gates completed successfully and after a focused re-check of the current diff.

Disposition: partially resolved.

Remaining items:

Resolved: the prior crates/openshell-cli/src/run.rs move/borrow compile concern is no longer present in the current diff, and OpenShell / Branch Checks is green for this head.
Resolved: the prior Docker/Podman count-only GPU warning is addressed at the driver validation/build paths. Docker and Podman now reject count-only GPU requests without driver_config.cdi_devices, and they validate exact CDI device requests at the sink.
Still blocking: proto/openshell.proto and proto/compute_driver.proto still define resource_requirements = 9, replacing the previous bool gpu = 9 field with a length-delimited message. That remains a protobuf wire-compatibility break for old clients and stored sandbox records. Please keep a deprecated transitional gpu = 9 field and map it when the new field is absent, or reserve field 9 and add resource_requirements on a new field number.
Checks: OpenShell / Branch Checks, OpenShell / Helm Lint, DCO, and OpenShell / GPU E2E are green for this head with test:e2e-gpu applied.

Next state: gator:in-review

elezar · 2026-06-12T11:33:50Z

The breaking proto change is intentional. However, I will defer to @drew on whether we should instead reserve the previous gpu field and introduce this at a new number.

elezar · 2026-06-12T11:47:50Z

Re-check After Author Update

I re-evaluated latest head 87c9a8c6015bc23d48b11823540b44d85fa24399 after @elezar's 2026-06-12 11:33 UTC comment that the protobuf wire-compatibility break is intentional and should be decided by @drew.

Disposition: not resolved.

Remaining items:

The protobuf compatibility concern remains blocking for gator: proto/openshell.proto and proto/compute_driver.proto still define resource_requirements = 9, replacing the previous bool gpu = 9 field with a length-delimited message. Please either change the field numbering/transitional mapping, or have an appropriate maintainer explicitly accept that compatibility break on this PR.
Checks are green for this head with test:e2e-gpu applied.

Next state: gator:in-review

elezar · 2026-06-12T13:01:42Z

Re-check After Author Update

I re-evaluated latest head da6fbd8752372c86a834d59edb238d61d38dc9b7 after the branch update that followed the prior protobuf compatibility feedback.

Disposition: not resolved.

Remaining items:

Still blocking: crates/openshell-cli/src/run.rs constructs resource_requirements, moves it into CreateSandboxRequest.spec.resource_requirements, and later borrows resource_requirements.as_ref() for provisioning timeout messages. Please clone the value into the request or retain a separate GPU-request flag/value before moving it.
Still blocking: proto/openshell.proto and proto/compute_driver.proto still define resource_requirements = 9, replacing the previous bool gpu = 9 field with a length-delimited message. This remains a protobuf wire-compatibility break for old clients, old drivers, and persisted sandbox records. Please either change the field numbering/transitional mapping, or have an appropriate maintainer explicitly accept that compatibility break on this PR.
Resolved: the prior Docker/Podman count-only GPU warning remains addressed. Docker and Podman reject count-only GPU requests without driver_config.cdi_devices, and they validate exact CDI device requests at the sink.
Checks: OpenShell / Branch Checks and OpenShell / GPU E2E are still pending for this head; OpenShell / Helm Lint, DCO, and the docs preview are passing.

Next state: gator:in-review

elezar · 2026-06-12T13:24:58Z

Re-check After CI and Independent Review

I re-evaluated latest head da6fbd8752372c86a834d59edb238d61d38dc9b7 after the required gates completed successfully and after a bounded independent review of the current diff.

Disposition: not resolved.

Remaining items:

Still blocking: crates/openshell-cli/src/run.rs moves resource_requirements into CreateSandboxRequest.spec.resource_requirements and later borrows resource_requirements.as_ref() for provisioning timeout messages. Please clone the value into the request or retain a separate timeout value before the move.
Still blocking: proto/openshell.proto and proto/compute_driver.proto still define resource_requirements = 9, replacing the previous bool gpu = 9 field with a length-delimited message. This remains a protobuf wire-compatibility break for old clients, old drivers, and persisted sandbox records unless an appropriate maintainer explicitly accepts that break on this PR.
Warning: Docker and Podman reject count-only GPU requests during create validation, but the lower-level device builders can still translate gpu.count = Some(_) without exact devices into the all-GPUs CDI request if reached directly. Please mirror the count-only rejection at the builder sink and add builder-level tests, or explain why no alternate path can reach those builders without prior validation.
Checks are now green for this head: OpenShell / Branch Checks, OpenShell / Helm Lint, DCO, OpenShell / E2E, and OpenShell / GPU E2E are passing with test:e2e-gpu applied.

Next state: gator:in-review

elezar · 2026-06-13T09:04:55Z

Responding to the two blocking findings from gator:

crates/openshell-cli/src/run.rs is not a use-after-move compile failure. The generated prost types for ResourceRequirements and GpuResourceRequirements derive Copy, so Option<ResourceRequirements> is also Copy. I verified this by running cargo check -p openshell-cli -j 1 successfully. I also tested the proposed clone fix, and cargo clippy --workspace --all-targets -- -D warnings rejects it with clippy::clone_on_copy, confirming the original code is the correct form for the generated type.
The protobuf field-number change is an intentional alpha API break. The compute-driver proto is not currently treated as a public compatibility surface: gateway and driver lifetimes are tightly coupled, and local drivers are launched by the gateway at startup. For the public SandboxSpec, direct API use is currently limited to the matching OpenShell CLI. We are also intentionally not preserving live or persisted legacy GPU intent across this transition; GPU sandboxes should be recreated after upgrade if they need the new typed resource-requirements shape.

Given that the API is still alpha, we do not want to carry legacy GPU-specific reserved or transitional fields forward into the proto shape we intend to stabilize. I will update RFC 0004 in this branch to reflect that decision, since the current RFC text still describes reserving the old GPU fields.
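
A small standalone sketch of the Copy point, for anyone following along. The structs below are hand-written stand-ins that derive Copy explicitly; they are not the generated prost types, and build_and_check is a hypothetical function that mirrors the move-then-borrow pattern described in the finding.

```rust
// Hand-written stand-ins; the real types are generated and may carry other derives.
#[derive(Debug, Clone, Copy, PartialEq)]
struct GpuResourceRequirements {
    count: Option<u32>,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct ResourceRequirements {
    gpu: Option<GpuResourceRequirements>,
}

#[derive(Debug)]
struct SandboxSpec {
    resource_requirements: Option<ResourceRequirements>,
}

fn build_and_check(count: Option<u32>) -> (SandboxSpec, bool) {
    let resource_requirements = Some(ResourceRequirements {
        gpu: Some(GpuResourceRequirements { count }),
    });

    // Because Option<ResourceRequirements> is Copy, this copies the value
    // into the struct and leaves the local binding usable.
    let spec = SandboxSpec { resource_requirements };

    // Reading the local after the "move" compiles for Copy types.
    let gpu_requested = resource_requirements
        .as_ref()
        .map(|r| r.gpu.is_some())
        .unwrap_or(false);

    (spec, gpu_requested)
}

fn main() {
    let (spec, gpu_requested) = build_and_check(Some(2));
    println!("{:?} gpu_requested={}", spec, gpu_requested);
}
```

Removing Copy from the two derives turns the later read into a borrow-of-moved-value error, which is the situation the review finding assumed.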

elezar · 2026-06-16T20:25:10Z

Re-check After Author Update

I re-evaluated latest head 60a7ccd17cc6933759d5b134bf0277ae2e9f5acb after the branch updates following the prior GPU API/runtime feedback. I also re-checked the forced-update delta from b6812340 to 60a7ccd1; that delta only changes crates/openshell-driver-kubernetes/src/driver.rs and does not affect the items below.

Disposition: not resolved.

Remaining items:

Still blocking: the Python GPU E2E public API coverage was removed instead of migrated. e2e/python/test_sandbox_gpu.py is deleted and e2e/python/conftest.py no longer provides the GPU sandbox fixture, so this PR no longer proves that the Python/proto API can request a GPU sandbox with the new resource_requirements.gpu shape and execute nvidia-smi. Please restore that marked GPU E2E coverage using ResourceRequirements(gpu=GpuResourceRequirements(...)) rather than the old gpu=True field.
Still blocking: docs/reference/sandbox-compute-drivers.mdx still says Docker and Podman reject count-only GPU selection, but the current implementation resolves count-only requests through refreshed CDI inventory and round-robin selection. docs/sandboxes/manage-sandboxes.mdx documents the new behavior correctly; please update the compute-driver reference to match.
Resolved: the lower-level Docker/Podman CDI/device sinks no longer appear to translate count-only GPU requests directly into all-GPU/default injection without validation. The create/validate paths resolve concrete CDI device IDs before the container specs are built, and the builder sinks only inject provided concrete IDs.
Resolved: CLI parsing and request construction for --gpu, --gpu COUNT, zero rejection, and trailing command parsing are covered, and sandbox creation now sends SandboxSpec.resource_requirements.gpu.

Checks: the latest forced-push head is still early in CI; DCO and required status publication are pending at the time of this re-check. The review findings above must be addressed before gator can move this PR to pipeline watch.

Next state: gator:in-review

elezar · 2026-06-16T20:42:49Z

Re-check After CI Update

I re-evaluated latest head 60a7ccd17cc6933759d5b134bf0277ae2e9f5acb after OpenShell / Branch Checks completed.

Disposition: not resolved.

Remaining items:

Still blocking: OpenShell / Branch Checks failed in cargo fmt --all -- --check. The failed logs show crates/openshell-driver-kubernetes/src/driver.rs needs rustfmt at the GPU limit assertion around line 2694.
Still blocking: the Python GPU E2E public API coverage was removed instead of migrated. Please restore marked GPU E2E coverage using ResourceRequirements(gpu=GpuResourceRequirements(...)) rather than the old gpu=True field.
Still blocking: docs/reference/sandbox-compute-drivers.mdx still says Docker and Podman reject count-only GPU selection, while the current implementation resolves count-only requests through refreshed CDI inventory and round-robin selection. Please update the reference docs to match docs/sandboxes/manage-sandboxes.mdx.
OpenShell / GPU E2E is still pending for this head with test:e2e-gpu applied.
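
For reference when updating the docs, the counted default selection mentioned above (round-robin over the CDI inventory, failing when the count exceeds the selectable devices) could be sketched as follows. This is an illustration under assumptions, not the driver code: RoundRobin and its cursor field are hypothetical, the inventory is passed in already refreshed and normalized, and the device names are examples.

```rust
// Hypothetical selector that remembers where the previous selection stopped.
struct RoundRobin {
    cursor: usize,
}

impl RoundRobin {
    // Picks `count` devices starting at the cursor and wrapping around the inventory.
    fn select(&mut self, inventory: &[&str], count: usize) -> Result<Vec<String>, String> {
        if count == 0 {
            return Err("GPU count must be at least 1".to_string());
        }
        if count > inventory.len() {
            return Err(format!(
                "requested {count} GPUs but only {} selectable devices",
                inventory.len()
            ));
        }
        let start = self.cursor % inventory.len();
        let picked = (0..count)
            .map(|i| inventory[(start + i) % inventory.len()].to_string())
            .collect();
        // The next request starts after the last device handed out.
        self.cursor = (start + count) % inventory.len();
        Ok(picked)
    }
}

fn main() {
    let inventory = ["nvidia.com/gpu=0", "nvidia.com/gpu=1", "nvidia.com/gpu=2"];
    let mut selector = RoundRobin { cursor: 0 };
    println!("{:?}", selector.select(&inventory, 2));
    println!("{:?}", selector.select(&inventory, 2));

    // A WSL2-style inventory with only the `all` fallback counts as one selectable device.
    let wsl = ["nvidia.com/gpu=all"];
    println!("{:?}", RoundRobin { cursor: 0 }.select(&wsl, 2));
}
```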

Next state: gator:in-review

elezar · 2026-06-16T21:05:30Z

Re-check After CI and Independent Review

I re-evaluated latest head 60a7ccd17cc6933759d5b134bf0277ae2e9f5acb after the required GPU E2E gate completed and after a bounded independent review of the current diff.

Disposition: not resolved.

Remaining items:

Still blocking: OpenShell / Branch Checks failed in cargo fmt --all -- --check. The failed logs show crates/openshell-driver-kubernetes/src/driver.rs needs rustfmt at the GPU limit assertion around line 2697.
Still blocking: the independent review re-raised the protobuf wire-compatibility concern. proto/openshell.proto and proto/compute_driver.proto reuse field 9, changing it from bool gpu to ResourceRequirements resource_requirements. Please either keep the legacy gpu = 9 field and put resource_requirements on a new tag with transitional mapping, or get explicit maintainer acceptance for the version-skew behavior on this PR.
Still blocking: Python GPU E2E public API coverage was removed instead of migrated. Please restore coverage using SandboxSpec(resource_requirements=...) so the Python generated-proto/API path for GPU sandbox creation remains exercised.
Still blocking: docs/reference/sandbox-compute-drivers.mdx still says Docker and Podman reject count-only GPU selection, while docs/sandboxes/manage-sandboxes.mdx and the implementation describe counted default CDI selection. Please make the reference doc match the intended behavior.
Resolved: OpenShell / GPU E2E is now green for this head with test:e2e-gpu applied. OpenShell / Helm Lint, DCO, and docs preview are also green.

Independent review summary: no additional code-level blocker was found in this bounded pass beyond the items above.

Next state: gator:in-review

elezar · 2026-06-23T10:10:12Z

Re-check After Author Update

I re-evaluated latest head 82f88d88896e43ccdff8d3023a9edffbff25efda after the author update and a bounded independent re-review of the prior blockers.

Disposition: partially resolved, but not ready to leave review.

Remaining items:

Still blocking: proto/openshell.proto and proto/compute_driver.proto still reuse field 9, changing it from legacy bool gpu to ResourceRequirements. Please reserve tag/name 9 for gpu and move resource_requirements to a new tag, or get explicit maintainer acceptance for this version-skew behavior on the PR.
Still blocking: the Python GPU E2E public API coverage was deleted instead of migrated. The Rust CLI GPU E2E replacement covers CLI/device selection, but not Python SDK/public proto construction. Please restore Python GPU E2E using the new SandboxSpec(resource_requirements=...) shape, or get explicit maintainer acceptance that this coverage is intentionally removed.
Resolved: docs/reference/sandbox-compute-drivers.mdx now describes counted GPU behavior consistently.
Resolved: Kubernetes now maps the GPU count into nvidia.com/gpu limits, and the previous rustfmt/Branch Checks blocker is resolved.
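
As a rough picture of the Kubernetes mapping noted above, the sketch below turns a GPU requirement into a limits map keyed by nvidia.com/gpu. It uses a plain BTreeMap rather than any Kubernetes client types, and GpuRequirement and gpu_limits are hypothetical names; the actual driver code may be structured differently.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for the GPU requirement message.
#[derive(Debug, Clone, Copy)]
struct GpuRequirement {
    count: Option<u32>,
}

// Builds container resource limits for an optional GPU requirement.
// Assumes count = 0 was already rejected by earlier validation.
fn gpu_limits(gpu: Option<GpuRequirement>) -> BTreeMap<String, String> {
    let mut limits = BTreeMap::new();
    if let Some(req) = gpu {
        // A present requirement with no count means one GPU.
        let count = req.count.unwrap_or(1);
        limits.insert("nvidia.com/gpu".to_string(), count.to_string());
    }
    limits
}

fn main() {
    println!("{:?}", gpu_limits(Some(GpuRequirement { count: Some(4) })));
    println!("{:?}", gpu_limits(Some(GpuRequirement { count: None })));
    println!("{:?}", gpu_limits(None));
}
```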

Checks: OpenShell / Branch Checks, OpenShell / Helm Lint, DCO, and the underlying GPU E2E jobs are passing for this head. The aggregate OpenShell / GPU E2E required status still appears pending, so gator cannot promote this PR even after the review blockers are addressed until that gate is green.

Next state: gator:in-review

BREAKING CHANGE: SandboxSpec.gpu and DriverSandboxSpec.gpu were replaced with resource_requirements.gpu. This intentionally reuses protobuf field 9, changing it from a bool to a message for both public and driver APIs.

Signed-off-by: Evan Lezar <elezar@nvidia.com>

elezar · 2026-06-23T12:36:39Z

Re-check After Author Update

I re-evaluated latest head 3a244beff8d7fc84689b19fec8d744893b48cacb after the branch update since the 2026-06-23 10:10 UTC gator review comment, including a bounded independent re-review of the current diff.

Disposition: partially resolved, but not ready to leave review.

Remaining items:

Still blocking: proto/openshell.proto and proto/compute_driver.proto still reuse field 9, changing legacy bool gpu = 9 into ResourceRequirements resource_requirements = 9. Please reserve the legacy tag/name and move resource_requirements to a new tag, keep a deprecated transitional gpu = 9 and map it when resource_requirements is absent, or get explicit maintainer acceptance for this version-skew behavior on this PR.
Still blocking: the Python public proto/API GPU coverage is still absent. The current PR file list has no e2e/python changes, and the current branch has no Python use of ResourceRequirements or GpuResourceRequirements. Please restore marked Python GPU E2E coverage using SandboxSpec(resource_requirements=...), or get explicit maintainer acceptance that this coverage is intentionally removed.
Still pending: aggregate OpenShell / GPU E2E remains pending because gpu-e2e / E2E Docker GPU (wsl-amd64) is still pending.
Resolved: the docs now describe counted Docker/Podman CDI selection consistently, Kubernetes count mapping is present, and the previous rustfmt/Branch Checks blocker is cleared.

Independent review summary: no additional code-level blocker was found in this focused pass beyond the items above.

Next state: gator:in-review

elezar · 2026-06-23T13:28:13Z

Re-check After CI Update

I re-evaluated latest head 3a244beff8d7fc84689b19fec8d744893b48cacb after the required GPU E2E gate completed.

Disposition: partially resolved, but not ready to leave review.

Remaining items:

Resolved: OpenShell / GPU E2E is now green for this head with test:e2e-gpu applied. OpenShell / Branch Checks, OpenShell / Helm Lint, and DCO are also passing.
Still blocking: proto/openshell.proto and proto/compute_driver.proto still reuse field 9, changing legacy bool gpu = 9 into ResourceRequirements resource_requirements = 9. Please reserve the legacy tag/name and move resource_requirements to a new tag, keep a deprecated transitional gpu = 9 and map it when resource_requirements is absent, or get explicit maintainer acceptance for this version-skew behavior on this PR.
Still blocking: the Python public proto/API GPU coverage is still absent. The current PR file list has no e2e/python changes, and the current branch has no Python use of ResourceRequirements or GpuResourceRequirements. Please restore marked Python GPU E2E coverage using SandboxSpec(resource_requirements=...), or get explicit maintainer acceptance that this coverage is intentionally removed.

Next state: gator:in-review

pimlock · 2026-06-24T04:48:50Z

On the 2 items marked by the gator:

I think reusing the tag for the new data structure is okay. It keeps the code simpler than supporting both types of requests. The only question I have is about the behavior on a CLI/gateway version mismatch. From what I can tell, the request will go through, but the GPU request will be ignored. I think that's fine. It would be interesting to see usage of sandboxes with GPUs, but I don't think we have telemetry attributes for that yet.
The Python SDK is being updated as part of rfc-0008: shared SDK rust core and language specific bindings #1764, so no need to manually keep it up to date IMO.

elezar · 2026-06-24T14:27:35Z

Thanks for the review @pimlock. The gator comment kept coming up even though I explicitly stated that the breaking change was acceptable at this stage.

I suppose the behaviour depends on how an on-wire bool is interpreted when the gateway expects a typed message. According to Perplexity, this behaviour is not defined in the standard and depends on the implementation: the field will either cause an error or be skipped. Given that GPU support is still not quite where we want it to be and is not as widely used, and that we're still in alpha, replacing the field seemed simpler than carrying the old field type around and performing a conversion.

On the Python API: thanks for the link.

elezar · 2026-06-25T09:02:11Z

Monitoring Complete

Monitoring is complete because this PR has merged.

Final status: the PR merged with an active gator:in-review label still present.

I removed the active gator:* label because there is nothing left for gator to monitor on this PR.

elezar requested a review from a team as a code owner June 8, 2026 13:10

elezar mentioned this pull request Jun 8, 2026

feat(gpu): move device selection to driver config #1815

Merged

5 tasks

elezar marked this pull request as draft June 10, 2026 07:55

elezar force-pushed the 1444-gpu-cli-count/elezar branch from abe5b79 to 06c69dd Compare June 12, 2026 07:12

elezar marked this pull request as ready for review June 12, 2026 07:29

elezar requested review from derekwaynecarr and mrunalp as code owners June 12, 2026 07:29

elezar added the gator:in-review (Gator is reviewing or awaiting PR review feedback) and test:e2e-gpu (Requires GPU end-to-end coverage) labels Jun 12, 2026

elezar force-pushed the 1444-gpu-cli-count/elezar branch from 06c69dd to 87c9a8c Compare June 12, 2026 10:26

elezar enabled auto-merge (squash) June 12, 2026 12:43

elezar disabled auto-merge June 12, 2026 12:44

elezar force-pushed the 1444-gpu-cli-count/elezar branch from 87c9a8c to da6fbd8 Compare June 12, 2026 12:55

elezar enabled auto-merge (squash) June 12, 2026 13:03

elezar force-pushed the 1444-gpu-cli-count/elezar branch from c7cf9d6 to b681234 Compare June 16, 2026 20:09

elezar requested a review from maxamillion as a code owner June 16, 2026 20:09

elezar force-pushed the 1444-gpu-cli-count/elezar branch from b681234 to 60a7ccd Compare June 16, 2026 20:24

elezar added and removed the gator:in-review label Jun 16, 2026

elezar force-pushed the 1444-gpu-cli-count/elezar branch 3 times, most recently from 35c7ef2 to 8e108bb Compare June 17, 2026 08:22

elezar mentioned this pull request Jun 17, 2026

test(e2e): remove python gpu smoke test #1948

Merged

6 tasks

elezar force-pushed the 1444-gpu-cli-count/elezar branch from 8e108bb to cad3745 Compare June 17, 2026 09:56

elezar mentioned this pull request Jun 17, 2026

feat(gpu): introduce GPU request spec #1156

Closed

3 tasks

elezar force-pushed the 1444-gpu-cli-count/elezar branch from cad3745 to 82f88d8 Compare June 17, 2026 13:19

elezar added and removed the gator:in-review label Jun 23, 2026

elezar added 2 commits June 23, 2026 14:16

feat(gpu): add GPU resource count

3a244be

Signed-off-by: Evan Lezar <elezar@nvidia.com>

elezar force-pushed the 1444-gpu-cli-count/elezar branch from 82f88d8 to 3a244be Compare June 23, 2026 12:18

pimlock approved these changes Jun 24, 2026


elezar merged commit 2c54589 into main Jun 24, 2026
39 checks passed

elezar deleted the 1444-gpu-cli-count/elezar branch June 24, 2026 04:49

elezar removed the gator:in-review label Jun 25, 2026
