Feat/mlx binding #52

Draft
kerthcet wants to merge 30 commits into InftyAI:main from kerthcet:feat/mlx-binding

@kerthcet kerthcet commented Jul 2, 2026

Member

What this PR does / why we need it

Which issue(s) this PR fixes

Fixes #

Special notes for your reviewer

Does this PR introduce a user-facing change?


kerthcet added 30 commits May 16, 2026 19:18
* Add HF downloader support

Signed-off-by: kerthcet <kerthcet@gmail.com>

* add bars

Signed-off-by: kerthcet <kerthcet@gmail.com>

* fix color

Signed-off-by: kerthcet <kerthcet@gmail.com>

* fix color

Signed-off-by: kerthcet <kerthcet@gmail.com>

* add download successfully message

Signed-off-by: kerthcet <kerthcet@gmail.com>

* change the color

Signed-off-by: kerthcet <kerthcet@gmail.com>

* change the rendering shape

Signed-off-by: kerthcet <kerthcet@gmail.com>

---------

Signed-off-by: kerthcet <kerthcet@gmail.com>
* support new cache structure

Signed-off-by: kerthcet <kerthcet@gmail.com>

* support puma rm

Signed-off-by: kerthcet <kerthcet@gmail.com>

* use readable format

Signed-off-by: kerthcet <kerthcet@gmail.com>

* remove requests.rs

Signed-off-by: kerthcet <kerthcet@gmail.com>

* fix lint

Signed-off-by: kerthcet <kerthcet@gmail.com>

---------

Signed-off-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: kerthcet <kerthcet@gmail.com>
* polish the format of the ls command

Signed-off-by: kerthcet <kerthcet@gmail.com>

* Have a progress manager

Signed-off-by: kerthcet <kerthcet@gmail.com>

* Reuse caches

Signed-off-by: kerthcet <kerthcet@gmail.com>

* rename util to utils

Signed-off-by: kerthcet <kerthcet@gmail.com>

* polish the layout of the download progress

Signed-off-by: kerthcet <kerthcet@gmail.com>

* revert change

Signed-off-by: kerthcet <kerthcet@gmail.com>

* add make format

Signed-off-by: kerthcet <kerthcet@gmail.com>

---------

Signed-off-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: kerthcet <kerthcet@gmail.com>
* add speed at the end

Signed-off-by: kerthcet <kerthcet@gmail.com>

* fix lint

Signed-off-by: kerthcet <kerthcet@gmail.com>

---------

Signed-off-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: kerthcet <kerthcet@gmail.com>
* support GPU detect

Signed-off-by: kerthcet <kerthcet@gmail.com>

* fix lint

Signed-off-by: kerthcet <kerthcet@gmail.com>

---------

Signed-off-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: kerthcet <kerthcet@gmail.com>
* add support for inspect

Signed-off-by: kerthcet <kerthcet@gmail.com>

* add support for inspect

Signed-off-by: kerthcet <kerthcet@gmail.com>

* add pull progress bar

Signed-off-by: kerthcet <kerthcet@gmail.com>

* polish the download progress

Signed-off-by: kerthcet <kerthcet@gmail.com>

* reorganize the structure

Signed-off-by: kerthcet <kerthcet@gmail.com>

* optimize the structure

Signed-off-by: kerthcet <kerthcet@gmail.com>

* fix test

Signed-off-by: kerthcet <kerthcet@gmail.com>

* fix lint

Signed-off-by: kerthcet <kerthcet@gmail.com>

---------

Signed-off-by: kerthcet <kerthcet@gmail.com>
* Support HF downloading models (InftyAI#16)

* Add HF downloader support

Signed-off-by: kerthcet <kerthcet@gmail.com>

* add bars

Signed-off-by: kerthcet <kerthcet@gmail.com>

* fix color

Signed-off-by: kerthcet <kerthcet@gmail.com>

* fix color

Signed-off-by: kerthcet <kerthcet@gmail.com>

* add download successfully message

Signed-off-by: kerthcet <kerthcet@gmail.com>

* change the color

Signed-off-by: kerthcet <kerthcet@gmail.com>

* change the rendering shape

Signed-off-by: kerthcet <kerthcet@gmail.com>

---------

Signed-off-by: kerthcet <kerthcet@gmail.com>

* Support `puma rm <model>` (InftyAI#17)

* support new cache structure

Signed-off-by: kerthcet <kerthcet@gmail.com>

* support puma rm

Signed-off-by: kerthcet <kerthcet@gmail.com>

* use readable format

Signed-off-by: kerthcet <kerthcet@gmail.com>

* remove requests.rs

Signed-off-by: kerthcet <kerthcet@gmail.com>

* fix lint

Signed-off-by: kerthcet <kerthcet@gmail.com>

---------

Signed-off-by: kerthcet <kerthcet@gmail.com>

* support puma info (InftyAI#18)

Signed-off-by: kerthcet <kerthcet@gmail.com>

* Reuse the model cache to avoid duplicate download (InftyAI#19)

* polish the format of the ls command

Signed-off-by: kerthcet <kerthcet@gmail.com>

* Have a progress manager

Signed-off-by: kerthcet <kerthcet@gmail.com>

* Reuse caches

Signed-off-by: kerthcet <kerthcet@gmail.com>

* rename util to utils

Signed-off-by: kerthcet <kerthcet@gmail.com>

* polish the layout of the download progress

Signed-off-by: kerthcet <kerthcet@gmail.com>

* revert change

Signed-off-by: kerthcet <kerthcet@gmail.com>

* add make format

Signed-off-by: kerthcet <kerthcet@gmail.com>

---------

Signed-off-by: kerthcet <kerthcet@gmail.com>

* remove available mem (InftyAI#22)

Signed-off-by: kerthcet <kerthcet@gmail.com>

* add speed at the end (InftyAI#23)

* add speed at the end

Signed-off-by: kerthcet <kerthcet@gmail.com>

* fix lint

Signed-off-by: kerthcet <kerthcet@gmail.com>

---------

Signed-off-by: kerthcet <kerthcet@gmail.com>

* fix: do not register model once cached (InftyAI#26)

Signed-off-by: kerthcet <kerthcet@gmail.com>

* Support GPU detect (InftyAI#27)

* support GPU detect

Signed-off-by: kerthcet <kerthcet@gmail.com>

* fix lint

Signed-off-by: kerthcet <kerthcet@gmail.com>

---------

Signed-off-by: kerthcet <kerthcet@gmail.com>

* update readme.md (InftyAI#28)

Signed-off-by: kerthcet <kerthcet@gmail.com>

* Support inspect command (InftyAI#29)

* add support for inspect

Signed-off-by: kerthcet <kerthcet@gmail.com>

* add support for inspect

Signed-off-by: kerthcet <kerthcet@gmail.com>

* add pull progress bar

Signed-off-by: kerthcet <kerthcet@gmail.com>

* polish the download progress

Signed-off-by: kerthcet <kerthcet@gmail.com>

* reorganize the structure

Signed-off-by: kerthcet <kerthcet@gmail.com>

* optimize the structure

Signed-off-by: kerthcet <kerthcet@gmail.com>

* fix test

Signed-off-by: kerthcet <kerthcet@gmail.com>

* fix lint

Signed-off-by: kerthcet <kerthcet@gmail.com>

---------

Signed-off-by: kerthcet <kerthcet@gmail.com>

* add metadata

Signed-off-by: kerthcet <kerthcet@gmail.com>

---------

Signed-off-by: kerthcet <kerthcet@gmail.com>
* Optimize the commands

Signed-off-by: kerthcet <kerthcet@gmail.com>

* fix lint

Signed-off-by: kerthcet <kerthcet@gmail.com>

---------

Signed-off-by: kerthcet <kerthcet@gmail.com>
* support SQLite

Signed-off-by: kerthcet <kerthcet@gmail.com>

* fix tests

Signed-off-by: kerthcet <kerthcet@gmail.com>

* address comments

Signed-off-by: kerthcet <kerthcet@gmail.com>

---------

Signed-off-by: kerthcet <kerthcet@gmail.com>
* support label filtering

Signed-off-by: kerthcet <kerthcet@gmail.com>

* add tests

Signed-off-by: kerthcet <kerthcet@gmail.com>

* better organize the structure

Signed-off-by: kerthcet <kerthcet@gmail.com>

* fix task type error

Signed-off-by: kerthcet <kerthcet@gmail.com>

* support all

Signed-off-by: kerthcet <kerthcet@gmail.com>

* fix lint

Signed-off-by: kerthcet <kerthcet@gmail.com>

---------

Signed-off-by: kerthcet <kerthcet@gmail.com>
* optimize readme.md

Signed-off-by: kerthcet <kerthcet@gmail.com>

* update readme.md

Signed-off-by: kerthcet <kerthcet@gmail.com>

---------

Signed-off-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: kerthcet <kerthcet@gmail.com>
* update the logo

Signed-off-by: kerthcet <kerthcet@gmail.com>

* update with new logo

Signed-off-by: kerthcet <kerthcet@gmail.com>

---------

Signed-off-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: kerthcet <kerthcet@gmail.com>
* add server support

Signed-off-by: kerthcet <kerthcet@gmail.com>

* Add logo to serve

Signed-off-by: kerthcet <kerthcet@gmail.com>

* fix tests

Signed-off-by: kerthcet <kerthcet@gmail.com>

* fix lint

Signed-off-by: kerthcet <kerthcet@gmail.com>

* fix tests

Signed-off-by: kerthcet <kerthcet@gmail.com>

* fix tests

Signed-off-by: kerthcet <kerthcet@gmail.com>

* fix test

Signed-off-by: kerthcet <kerthcet@gmail.com>

* fix lint

Signed-off-by: kerthcet <kerthcet@gmail.com>

* remove libs

Signed-off-by: kerthcet <kerthcet@gmail.com>

* fix tests

Signed-off-by: kerthcet <kerthcet@gmail.com>

* change log lib

Signed-off-by: kerthcet <kerthcet@gmail.com>

* change the request log level

Signed-off-by: kerthcet <kerthcet@gmail.com>

---------

Signed-off-by: kerthcet <kerthcet@gmail.com>
* optimize the modelInfo

Signed-off-by: kerthcet <kerthcet@gmail.com>

* optimize the layout

Signed-off-by: kerthcet <kerthcet@gmail.com>

* add alias to providers

Signed-off-by: kerthcet <kerthcet@gmail.com>

* polish comments

Signed-off-by: kerthcet <kerthcet@gmail.com>

* polish comments

Signed-off-by: kerthcet <kerthcet@gmail.com>

---------

Signed-off-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: kerthcet <kerthcet@gmail.com>
* Add model to serve command

Signed-off-by: kerthcet <kerthcet@gmail.com>

* fix test

Signed-off-by: kerthcet <kerthcet@gmail.com>

---------

Signed-off-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: kerthcet <kerthcet@gmail.com>
Signed-off-by: kerthcet <kerthcet@gmail.com>
Copilot AI review requested due to automatic review settings July 2, 2026 06:55
@InftyAI-Agent InftyAI-Agent added needs-triage Indicates an issue or PR lacks a label and requires one. needs-priority Indicates a PR lacks a label and requires one. do-not-merge/needs-kind Indicates a PR lacks a label and requires one. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jul 2, 2026
@kerthcet kerthcet marked this pull request as draft July 2, 2026 06:55

Copilot AI left a comment

Pull request overview

This PR significantly expands PUMA from a basic CLI into a more complete local-model runtime by adding: (1) a SQLite-backed model registry and caching utilities, (2) Hugging Face download support with progress reporting, (3) an OpenAI-compatible HTTP API server, and (4) an early placeholder MLX backend, gated behind macOS and the mlx feature.

Changes:

  • Introduces a new model registry/storage layer (SQLite) and CLI commands (pull, ls, inspect, rm, serve, info) that operate on it.
  • Adds an Axum-based OpenAI-compatible API server (chat + legacy completions + models + health) with streaming support for chat.
  • Adds MLX backend scaffolding (feature-gated) plus documentation/examples, along with CI/lint/test targets and refreshed docs.
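
To make the feature gating concrete, here is a minimal sketch of how a backend could be chosen at compile time, falling back to a mock engine when the MLX path is unavailable. `InferenceEngine` and `GenerateResponse` are names taken from the diff, but the method signatures, the struct fields, `MockEngine`, `MlxEngine`, and `select_engine` are assumptions for illustration and not the PR's actual code.

```rust
/// Response returned by an engine. The fields are illustrative assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct GenerateResponse {
    pub text: String,
    pub backend: &'static str,
}

/// Minimal engine abstraction (hypothetical signature).
pub trait InferenceEngine {
    fn name(&self) -> &'static str;
    fn generate(&self, prompt: &str) -> GenerateResponse;
}

/// Fallback engine that echoes the prompt; useful for tests.
pub struct MockEngine;

impl InferenceEngine for MockEngine {
    fn name(&self) -> &'static str {
        "mock"
    }

    fn generate(&self, prompt: &str) -> GenerateResponse {
        GenerateResponse {
            text: format!("[mock] {}", prompt),
            backend: self.name(),
        }
    }
}

/// Placeholder MLX engine, compiled only on macOS with the `mlx` feature.
#[cfg(all(target_os = "macos", feature = "mlx"))]
pub struct MlxEngine;

#[cfg(all(target_os = "macos", feature = "mlx"))]
impl InferenceEngine for MlxEngine {
    fn name(&self) -> &'static str {
        "mlx"
    }

    fn generate(&self, prompt: &str) -> GenerateResponse {
        // A real engine would run the model here; this stays a stub.
        GenerateResponse {
            text: format!("[mlx placeholder] {}", prompt),
            backend: self.name(),
        }
    }
}

/// Picks the MLX engine when it was compiled in.
#[cfg(all(target_os = "macos", feature = "mlx"))]
pub fn select_engine() -> Box<dyn InferenceEngine> {
    Box::new(MlxEngine)
}

/// Picks the mock engine on every other platform or feature set.
#[cfg(not(all(target_os = "macos", feature = "mlx")))]
pub fn select_engine() -> Box<dyn InferenceEngine> {
    Box::new(MockEngine)
}
```

Defining `select_engine` twice under complementary `cfg` attributes keeps callers free of platform checks: exactly one definition survives compilation, so a caller only ever sees a `Box<dyn InferenceEngine>`.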

Reviewed changes

Copilot reviewed 48 out of 52 changed files in this pull request and generated 15 comments.

| File | Description |
| --- | --- |
| `tests/cli_test.rs` | Adds CLI integration tests that execute the built binary end-to-end. |
| `src/utils/mod.rs` | Introduces utils module root. |
| `src/utils/format.rs` | Adds formatting helpers for sizes, parameters, and relative timestamps (with unit tests). |
| `src/utils/file.rs` | Adds cache/home path helpers (including PUMA_HOME override) and file utilities. |
| `src/util/request.rs` | Removes legacy chunked downloader implementation. |
| `src/util/mod.rs` | Removes old util module root. |
| `src/util/file.rs` | Removes old home-dir helper (migrated into utils). |
| `src/system/system_info.rs` | Adds system info collection (OS/arch/mem/GPU/cache stats) and display output. |
| `src/system/mod.rs` | Introduces system module root. |
| `src/storage/storage_trait.rs` | Adds ModelStorage trait abstraction for registry backends. |
| `src/storage/sqlite.rs` | Implements SQLite-backed model storage with migrations + filtering + tests. |
| `src/storage/mod.rs` | Exposes storage modules and re-exports storage types. |
| `src/registry/model_registry.rs` | Adds the model registry facade (register/load/get/remove + cache cleanup) and tests. |
| `src/registry/mod.rs` | Introduces registry module root. |
| `src/main.rs` | Wires in new modules and switches logging to tracing_subscriber. |
| `src/downloader/progress.rs` | Adds shared multi-progress management for multi-file downloads. |
| `src/downloader/mod.rs` | Exposes downloader modules (huggingface + progress). |
| `src/downloader/huggingface.rs` | Implements Hugging Face snapshot download + metadata fetch + registry registration + tests. |
| `src/downloader/downloader.rs` | Refactors download errors and introduces Downloader trait. |
| `src/cli/serve.rs` | Adds serve command to start the HTTP API with backend selection (MLX/mock). |
| `src/cli/rm.rs` | Adds remove-model CLI logic backed by registry removal. |
| `src/cli/mod.rs` | Exposes new CLI submodules (inspect/ls/rm/serve). |
| `src/cli/ls.rs` | Adds list-models logic with regex + SQL-like filter parsing. |
| `src/cli/inspect.rs` | Adds inspect-model logic and pretty-printing of model metadata. |
| `src/cli/commands.rs` | Extends clap CLI: args structs, serve command, wired registry-backed commands. |
| `src/backend/mod.rs` | Introduces backend module root with feature-gated MLX backend. |
| `src/backend/mock.rs` | Adds mock inference engine implementation for testing/fallback. |
| `src/backend/mlx/README.md` | Documents MLX backend requirements, usage, and roadmap. |
| `src/backend/mlx/mod.rs` | Adds MLX backend module with exports. |
| `src/backend/mlx/engine.rs` | Adds placeholder MLX inference engine implementation + tests. |
| `src/backend/engine.rs` | Defines the inference engine trait + response type. |
| `src/api/types/response.rs` | Adds OpenAI-compatible response types (chat + completions + models + errors). |
| `src/api/types/request.rs` | Adds OpenAI-compatible request types (chat + completions). |
| `src/api/types/mod.rs` | Re-exports API request/response types. |
| `src/api/tests.rs` | Adds integration tests for API endpoints including streaming chat SSE. |
| `src/api/routes.rs` | Creates Axum router, shared state, tracing, and permissive CORS. |
| `src/api/models.rs` | Implements /v1/models and /v1/models/:model endpoints. |
| `src/api/mod.rs` | Adds API module root. |
| `src/api/completions.rs` | Implements legacy /v1/completions endpoint (non-streaming). |
| `src/api/chat.rs` | Implements /v1/chat/completions (stream + non-stream) handler. |
| `site/images/logo-light.svg` | Adds a light-theme logo asset. |
| `README.md` | Major documentation refresh: features, CLI/API usage, project structure, etc. |
| `Makefile` | Adds test, lint, and format targets. |
| `hack/scripts/test_api.sh` | Adds manual API testing script using curl/jq. |
| `hack/README.md` | Documents hack/ scripts usage. |
| `examples/mlx_inference.rs` | Adds an example program for MLX inference (feature gated). |
| `docs/MLX_INTEGRATION.md` | Adds detailed MLX integration guide and usage instructions. |
| `Cargo.toml` | Bumps version and adds new dependencies/features for API/registry/downloader/MLX. |
| `.github/workflows/rust-ci.yaml` | Adds GitHub Actions workflow for linting and tests. |
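
The storage abstraction in the file summary can be pictured with a small sketch: a trait that the registry talks to, plus an in-memory implementation standing in for the SQLite backend. Only the name `ModelStorage` comes from the summary. The methods, `ModelRecord`, and `InMemoryStorage` are hypothetical and chosen for illustration, so the real trait in `src/storage/storage_trait.rs` may look quite different.

```rust
use std::collections::BTreeMap;

/// Illustrative model record; the real registry likely stores more metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct ModelRecord {
    pub name: String,
    pub size_bytes: u64,
}

/// Hypothetical storage abstraction for registry backends.
pub trait ModelStorage {
    /// Inserts a record, replacing any record with the same name.
    fn register(&mut self, record: ModelRecord);
    /// Looks up a record by model name.
    fn get(&self, name: &str) -> Option<&ModelRecord>;
    /// Removes a record; returns true if something was removed.
    fn remove(&mut self, name: &str) -> bool;
    /// Lists all records, ordered by name.
    fn list(&self) -> Vec<&ModelRecord>;
}

/// In-memory backend, standing in for a SQLite-backed one in this sketch.
#[derive(Default)]
pub struct InMemoryStorage {
    models: BTreeMap<String, ModelRecord>,
}

impl ModelStorage for InMemoryStorage {
    fn register(&mut self, record: ModelRecord) {
        self.models.insert(record.name.clone(), record);
    }

    fn get(&self, name: &str) -> Option<&ModelRecord> {
        self.models.get(name)
    }

    fn remove(&mut self, name: &str) -> bool {
        self.models.remove(name).is_some()
    }

    fn list(&self) -> Vec<&ModelRecord> {
        // BTreeMap iterates in key order, so the result is sorted by name.
        self.models.values().collect()
    }
}
```

Keeping the registry behind a trait like this would let tests run against the in-memory backend while the CLI uses the persistent one.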

Comment thread src/utils/format.rs
Comment on lines +67 to +73
if seconds < 0 {
    "just now".to_string()
} else if seconds < 60 {
    format!("{} seconds ago", seconds)
} else if seconds < 3600 {
    let minutes = seconds / 60;
    format!(
Comment thread src/utils/format.rs
assert_eq!(format_time_ago(&timestamp), "30 seconds ago");

let timestamp = (now - Duration::seconds(1)).to_rfc3339();
assert_eq!(format_time_ago(&timestamp), "1 seconds ago");
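
The quoted test asserts "1 seconds ago", so the helper as written does not special-case the singular. A minimal sketch of a singular-aware variant follows, assuming the helper works from an elapsed-seconds count. The names `pluralize` and `time_ago_from_seconds` are hypothetical, and the hour and day branches are assumptions, since the quoted diff is cut off after the minutes branch.

```rust
/// Formats "<n> <unit>(s) ago", using the singular form only when n == 1.
fn pluralize(n: i64, unit: &str) -> String {
    if n == 1 {
        format!("1 {} ago", unit)
    } else {
        format!("{} {}s ago", n, unit)
    }
}

/// Converts an elapsed-seconds count into a relative-time string.
/// Negative values (timestamps in the future) map to "just now",
/// matching the first branch of the quoted diff.
pub fn time_ago_from_seconds(seconds: i64) -> String {
    if seconds < 0 {
        "just now".to_string()
    } else if seconds < 60 {
        pluralize(seconds, "second")
    } else if seconds < 3600 {
        pluralize(seconds / 60, "minute")
    } else if seconds < 86_400 {
        // Assumed branch: not visible in the quoted diff.
        pluralize(seconds / 3600, "hour")
    } else {
        // Assumed branch: not visible in the quoted diff.
        pluralize(seconds / 86_400, "day")
    }
}
```

If this variant were adopted, the existing assertion for the one-second case would need to expect "1 second ago" instead.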
Comment thread tests/cli_test.rs
stdout.contains(text) || stderr.contains(text)
}

#[test]
Comment thread tests/cli_test.rs
assert!(stdout.contains("MODEL"));
}

#[test]
assert!(result.is_err());
}

#[tokio::test]
Comment thread src/system/system_info.rs
Comment on lines +1 to +11
use serde::{Deserialize, Serialize};
use std::fs;
use std::os::unix::fs::MetadataExt;
use std::path::PathBuf;
use std::process::Command;
use sysinfo::System;

use crate::registry::model_registry::ModelRegistry;
use crate::utils::file;
use crate::utils::format::format_size;

Comment thread src/system/system_info.rs
if let Ok(metadata) = entry.metadata() {
    if metadata.is_file() {
        // Use blocks * 512 to get actual disk usage (handles sparse files)
        total_size += metadata.blocks() * 512;
Comment thread src/system/system_info.rs
if let Ok(metadata) = entry.metadata() {
    if metadata.is_file() {
        // Use blocks * 512 to get actual disk usage (handles sparse files)
        total_size += metadata.blocks() * 512;
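
`metadata.blocks()` comes from `std::os::unix::fs::MetadataExt`, which exists only on Unix targets, so the quoted code would not compile elsewhere. One possible way to keep the sparse-file handling on Unix while still building on other platforms is sketched below. The helper names `disk_usage` and `dir_size` are hypothetical, and falling back to the logical file length on non-Unix targets is an assumption about acceptable behavior, not something the PR states.

```rust
use std::fs::{self, Metadata};
use std::io;
use std::path::Path;

/// On Unix, report allocated size: the number of 512-byte blocks times 512.
/// This reflects actual disk usage, including for sparse files.
#[cfg(unix)]
fn disk_usage(metadata: &Metadata) -> u64 {
    use std::os::unix::fs::MetadataExt;
    metadata.blocks() * 512
}

/// Elsewhere, fall back to the logical file length (an assumption).
#[cfg(not(unix))]
fn disk_usage(metadata: &Metadata) -> u64 {
    metadata.len()
}

/// Recursively sums the disk usage of all regular files under `dir`.
/// Symlinks are skipped because `DirEntry::metadata` does not follow them.
pub fn dir_size(dir: &Path) -> io::Result<u64> {
    let mut total = 0u64;
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let metadata = entry.metadata()?;
        if metadata.is_file() {
            total += disk_usage(&metadata);
        } else if metadata.is_dir() {
            total += dir_size(&entry.path())?;
        }
    }
    Ok(total)
}
```

Scoping the `MetadataExt` import inside the Unix-only function keeps the platform-specific dependency out of the module's top-level imports.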
Comment thread src/backend/mlx/engine.rs
@@ -0,0 +1,185 @@
use crate::backend::engine::{GenerateResponse, InferenceEngine};
use mlx_rs::{Array, Device, Dtype};
Comment on lines +251 to +252
// Register the model only if not totally cached
if !model_totally_cached {
Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • do-not-merge/needs-kind: Indicates a PR lacks a label and requires one.
  • needs-priority: Indicates a PR lacks a label and requires one.
  • needs-triage: Indicates an issue or PR lacks a label and requires one.

3 participants