Add GPT-OSS 20B recipes by kunal-vaishnavi · Pull Request #507 · microsoft/olive-recipes

kunal-vaishnavi · 2026-06-15T23:30:29Z

Description

This PR adds recipes for OpenAI's GPT-OSS 20B on the CPU EP, CUDA EP, and WebGPU EP.

Motivation and Context

The recipes were originally created and documented here. Recent changes to the QMoE op in ORT now allow block-wise quantization to work for all EPs.
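
For readers unfamiliar with the term, block-wise quantization stores one scale per fixed-size block of weights rather than one scale per tensor. The following is a minimal pure-Python sketch of symmetric block-wise INT4 quantization. It is an illustration only, not the QMoE op's implementation in ORT: the function names, the block size of 32, and the symmetric rounding scheme are all assumptions made for the example.

```python
import random


def quantize_blockwise(row, block_size=32, bits=4):
    """Symmetric block-wise quantization of one row of weights (illustrative).

    Returns (codes, scales): integer codes in [-2**(bits-1), 2**(bits-1) - 1]
    and one float scale per block of `block_size` values.
    """
    if len(row) % block_size != 0:
        raise ValueError("row length must be a multiple of block_size")
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit codes
    codes, scales = [], []
    for start in range(0, len(row), block_size):
        block = row[start:start + block_size]
        max_abs = max(abs(x) for x in block)
        # Avoid dividing by zero for an all-zero block.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        scales.append(scale)
        codes.extend(max(-qmax - 1, min(qmax, round(x / scale))) for x in block)
    return codes, scales


def dequantize_blockwise(codes, scales, block_size=32):
    """Map integer codes back to floats using each block's scale."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


random.seed(0)
row = [random.gauss(0.0, 1.0) for _ in range(128)]  # hypothetical weights
codes, scales = quantize_blockwise(row, block_size=32, bits=4)
restored = dequantize_blockwise(codes, scales, block_size=32)
max_error = max(abs(a - b) for a, b in zip(row, restored))
```

Because each block has its own scale, the rounding error of any value is bounded by half of that block's scale, so an outlier in one block does not degrade the precision of the others.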

Copilot

Pull request overview

Adds a new set of Olive recipe configurations for optimizing OpenAI GPT-OSS 20B across CPU, CUDA, and WebGPU execution providers, replacing an older one-off CUDA graph-capture script layout with a more standardized per-EP recipe structure.

Changes:

Added per-EP (cpu/cuda/webgpu) folders containing Olive ModelBuilder JSON configs for INT4 QMoE variants (including k_quant_mixed and INT8-expert options).
Added per-EP READMEs and requirements files for running the new recipes.
Removed the previous int4_cuda_int4_qmoe script-based flow and the top-level gpt-oss-20b/requirements.txt.

Reviewed changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated 6 comments.

File	Description
gpt-oss-20b/webgpu/requirements.txt	Adds Python deps for WebGPU recipe execution.
gpt-oss-20b/webgpu/README.md	Documents available WebGPU recipes and setup steps.
gpt-oss-20b/webgpu/info.yaml	Adds recipe metadata for WebGPU entrypoints.
gpt-oss-20b/webgpu/gpt-oss-20b_webgpu_int4_int4_qmoe_default.json	Default INT4/INT4 WebGPU ModelBuilder recipe.
gpt-oss-20b/webgpu/gpt-oss-20b_webgpu_int4_int4_qmoe_k_quant_mixed.json	WebGPU INT4/INT4 recipe using `k_quant_mixed`.
gpt-oss-20b/webgpu/gpt-oss-20b_webgpu_int4_int8_qmoe_default.json	WebGPU INT4 with INT8 expert weights recipe.
gpt-oss-20b/webgpu/gpt-oss-20b_webgpu_int4_int8_qmoe_k_quant_mixed.json	WebGPU INT4+INT8-expert with `k_quant_mixed` recipe.
gpt-oss-20b/requirements.txt	Removes old top-level requirements file.
gpt-oss-20b/int4_cuda_int4_qmoe/README.md	Removes legacy CUDA capture-onnx-graph documentation.
gpt-oss-20b/int4_cuda_int4_qmoe/info.yml	Removes legacy scanner metadata for the old script recipe.
gpt-oss-20b/int4_cuda_int4_qmoe/gpt-oss-20b.sh	Removes legacy CUDA capture-onnx-graph script.
gpt-oss-20b/cuda/requirements.txt	Adds Python deps for CUDA recipe execution.
gpt-oss-20b/cuda/README.md	Documents available CUDA recipes and setup steps.
gpt-oss-20b/cuda/info.yaml	Adds recipe metadata for CUDA entrypoints.
gpt-oss-20b/cuda/gpt-oss-20b_cuda_int4_int4_qmoe_k_quant_mixed.json	CUDA INT4/INT4 recipe using `k_quant_mixed`.
gpt-oss-20b/cuda/gpt-oss-20b_cuda_int4_int8_qmoe_k_quant_mixed.json	CUDA INT4+INT8-expert with `k_quant_mixed` recipe.
gpt-oss-20b/cuda/gpt-oss-20b_cuda_int4_int8_qmoe_default.json	CUDA INT4 with INT8 expert weights recipe.
gpt-oss-20b/cuda/gpt-oss-20b_cuda_int4_int4_qmoe_default.json	Default INT4/INT4 CUDA ModelBuilder recipe.
gpt-oss-20b/cpu/requirements.txt	Adds Python deps for CPU recipe execution.
gpt-oss-20b/cpu/README.md	Documents available CPU recipes and setup steps.
gpt-oss-20b/cpu/info.yaml	Adds recipe metadata for CPU entrypoints.
gpt-oss-20b/cpu/gpt-oss-20b_cpu_int4_int8_qmoe_k_quant_mixed.json	CPU INT4+INT8-expert with `k_quant_mixed` recipe.
gpt-oss-20b/cpu/gpt-oss-20b_cpu_int4_int8_qmoe_default.json	CPU INT4 with INT8 expert weights recipe.
gpt-oss-20b/cpu/gpt-oss-20b_cpu_int4_int4_qmoe_k_quant_mixed.json	CPU INT4/INT4 recipe using `k_quant_mixed`.
gpt-oss-20b/cpu/gpt-oss-20b_cpu_int4_int4_qmoe_default.json	Default INT4/INT4 CPU ModelBuilder recipe.
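
The file names in the table follow a regular pattern (execution provider, expert-weight precision, quantization variant). As a rough cross-check, the sketch below regenerates the list from that pattern and confirms it adds up to the 25 changed files reported above. The helper names are hypothetical; the paths themselves are taken from the table.

```python
from itertools import product

EPS = ["cpu", "cuda", "webgpu"]
EXPERT_PRECISIONS = ["int4", "int8"]      # precision of the QMoE expert weights
VARIANTS = ["default", "k_quant_mixed"]   # quantization variant in the file name
ROOT = "gpt-oss-20b"


def recipe_paths():
    """One ModelBuilder JSON recipe per (EP, expert precision, variant)."""
    return [
        f"{ROOT}/{ep}/{ROOT}_{ep}_int4_{expert}_qmoe_{variant}.json"
        for ep, expert, variant in product(EPS, EXPERT_PRECISIONS, VARIANTS)
    ]


def support_paths():
    """Per-EP requirements, README, and metadata files."""
    names = ("requirements.txt", "README.md", "info.yaml")
    return [f"{ROOT}/{ep}/{name}" for ep in EPS for name in names]


# Files removed by the PR, as listed in the table.
REMOVED = [
    f"{ROOT}/requirements.txt",
    f"{ROOT}/int4_cuda_int4_qmoe/README.md",
    f"{ROOT}/int4_cuda_int4_qmoe/info.yml",
    f"{ROOT}/int4_cuda_int4_qmoe/gpt-oss-20b.sh",
]

recipes = recipe_paths()
changed = recipes + support_paths() + REMOVED

for path in sorted(changed):
    print(path)
```

This yields 12 recipe files, 9 supporting files, and 4 removals.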

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

tianleiwu · 2026-06-15T23:54:47Z

+            "precision": "int4",
+            "int4_op_types_to_quantize": [
+                "MatMul",
+                "Gather"
+            ]


This setting needs accuracy testing.
I think there could be an accuracy problem if lm_head is quantized to 4 bits.
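
To illustrate the concern, here is a small, hedged sketch that compares the logit error introduced by 4-bit versus 8-bit weight quantization of a projection matrix. It uses random weights and a simple symmetric per-block scheme with an assumed block size of 32, so it says nothing about GPT-OSS 20B itself; a real check would evaluate the exported model on a benchmark. All names below are hypothetical.

```python
import random


def fake_quantize(row, bits, block_size=32):
    """Quantize then dequantize one row with symmetric per-block scales."""
    qmax = 2 ** (bits - 1) - 1
    out = []
    for start in range(0, len(row), block_size):
        block = row[start:start + block_size]
        max_abs = max(abs(x) for x in block)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        # |x / scale| <= qmax by construction, so no clipping is needed.
        out.extend(round(x / scale) * scale for x in block)
    return out


def logits(weight, hidden):
    """Project one hidden state onto every row of the weight matrix."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in weight]


def mean_abs_logit_error(weight, hiddens, bits):
    """Average |full-precision logit - quantized logit| over all samples."""
    quantized = [fake_quantize(row, bits) for row in weight]
    total, count = 0.0, 0
    for hidden in hiddens:
        ref = logits(weight, hidden)
        approx = logits(quantized, hidden)
        total += sum(abs(a - b) for a, b in zip(ref, approx))
        count += len(ref)
    return total / count


random.seed(0)
VOCAB, HIDDEN, SAMPLES = 64, 64, 16  # toy sizes, far smaller than a real lm_head
weight = [[random.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(VOCAB)]
hiddens = [[random.gauss(0, 1) for _ in range(HIDDEN)] for _ in range(SAMPLES)]

error_int4 = mean_abs_logit_error(weight, hiddens, bits=4)
error_int8 = mean_abs_logit_error(weight, hiddens, bits=8)
print(f"mean |logit error| int4: {error_int4:.4f}, int8: {error_int8:.4f}")
```

The 4-bit error is larger because each step between representable values is much coarser (7 positive levels versus 127). Whether that gap matters for generation quality is exactly what an accuracy test on the real model would have to show.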

tianleiwu · 2026-06-15T23:55:44Z

+            "precision": "int4",
+            "int4_op_types_to_quantize": [
+                "MatMul",
+                "Gather"
+            ],


Same here. There could be an accuracy problem if lm_head is quantized to 4 bits.

kunal-vaishnavi added 2 commits June 15, 2026 23:21

Add GPT-OSS 20B recipes

5345da1

Add QMoE block size

f42fec8

Copilot AI review requested due to automatic review settings June 15, 2026 23:30

Copilot started reviewing on behalf of kunal-vaishnavi June 15, 2026 23:31

Fix newline issue from CI failure

bcb6085

Copilot AI reviewed Jun 15, 2026

Comment thread gpt-oss-20b/cpu/info.yaml

Comment thread gpt-oss-20b/cuda/info.yaml

Comment thread gpt-oss-20b/webgpu/info.yaml

Comment thread gpt-oss-20b/cpu/requirements.txt

Comment thread gpt-oss-20b/cuda/requirements.txt

Comment thread gpt-oss-20b/webgpu/requirements.txt

Apply suggestions from code review

a263b35

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

tianleiwu reviewed Jun 15, 2026

Merge branch 'main' into kvaishnavi/gpt-oss

db726fe

kunal-vaishnavi wants to merge 5 commits into main from kvaishnavi/gpt-oss

3 participants
