Feat/mlx binding #52
Draft
kerthcet wants to merge 30 commits into
* Add HF downloader support
* add bars
* fix color
* fix color
* add download successfully message
* change the color
* change the rending shape

Signed-off-by: kerthcet <kerthcet@gmail.com>
* support new cache structure
* support puma rm
* use readable format
* remove requests.rs
* fix lint

Signed-off-by: kerthcet <kerthcet@gmail.com>
* polish the format of the ls command
* Have a progress manager
* Reuse caches
* rename util to utils
* polish the layout of the download progress
* revert change
* add make format

Signed-off-by: kerthcet <kerthcet@gmail.com>
* add speed at the end
* fix lint

Signed-off-by: kerthcet <kerthcet@gmail.com>
* support GPU detect
* fix lint

Signed-off-by: kerthcet <kerthcet@gmail.com>
* add support for inspect
* add support for inspect
* add pull progress bar
* polish the download progress
* reorganize the structure
* optimize the structure
* fix test
* fix lint

Signed-off-by: kerthcet <kerthcet@gmail.com>
* Support HF downloading models (InftyAI#16)
* Support `puma rm <model>` (InftyAI#17)
* support puma info (InftyAI#18)
* Reuse the model cache to avoid duplicate download (InftyAI#19)
* remove available mem (InftyAI#22)
* add speed at the end (InftyAI#23)
* fix: do no register model once cached (InftyAI#26)
* Support GPU detect (InftyAI#27)
* update readme.md (InftyAI#28)
* Support inspect command (InftyAI#29)
* add metadata

Signed-off-by: kerthcet <kerthcet@gmail.com>
* Optimize the commands
* fix lint

Signed-off-by: kerthcet <kerthcet@gmail.com>
* support sqllite
* fix tests
* address comments

Signed-off-by: kerthcet <kerthcet@gmail.com>
* support label filtering
* add tests
* better organize the structure
* fix task type error
* support all
* fix lint

Signed-off-by: kerthcet <kerthcet@gmail.com>
* optimize readme.md
* update readme.md

Signed-off-by: kerthcet <kerthcet@gmail.com>
* update the logo
* update with new logo

Signed-off-by: kerthcet <kerthcet@gmail.com>
* add server support
* Add logo to serve
* fix tests
* fix lint
* fix tests
* fix tests
* fix test
* fix lint
* remove libs
* fix tests
* change log lib
* change the request log level

Signed-off-by: kerthcet <kerthcet@gmail.com>
* optimize the modelInfo
* optimize the layout
* add alias to providers
* polish comments
* polish comments

Signed-off-by: kerthcet <kerthcet@gmail.com>
* Add model to serve command
* fix test

Signed-off-by: kerthcet <kerthcet@gmail.com>
Pull request overview
This PR significantly expands PUMA from a basic CLI into a more complete local-model runtime by adding: (1) a SQLite-backed model registry and caching utilities, (2) Hugging Face download support with progress reporting, (3) an OpenAI-compatible HTTP API server, and (4) an early, placeholder MLX backend that is compiled only on macOS with the `mlx` feature enabled.
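The macOS-plus-`mlx` gating can be expressed with `cfg` attributes so the MLX code path is never compiled on other platforms. This is a minimal sketch, not PUMA's actual code: the `Backend` enum and `select_backend` function are hypothetical, and it assumes the fallback in every other configuration is the mock engine.

```rust
/// Hypothetical backend identifiers; not taken from the PUMA codebase.
#[derive(Debug, PartialEq, Eq)]
pub enum Backend {
    Mlx,
    Mock,
}

/// Compiled only for macOS builds with the `mlx` Cargo feature enabled.
#[cfg(all(target_os = "macos", feature = "mlx"))]
pub fn select_backend() -> Backend {
    Backend::Mlx
}

/// Compiled for every other target/feature combination (assumed mock fallback).
#[cfg(not(all(target_os = "macos", feature = "mlx")))]
pub fn select_backend() -> Backend {
    Backend::Mock
}
```

With `cargo build --features mlx` on macOS the first definition is compiled; in every other configuration only the second one exists, so no MLX symbols are referenced.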
Changes:
- Introduces a new model registry/storage layer (SQLite) and CLI commands (`pull`, `ls`, `inspect`, `rm`, `serve`, `info`) that operate on it.
- Adds an Axum-based OpenAI-compatible API server (chat, legacy completions, models, health) with streaming support for chat.
- Adds MLX backend scaffolding (feature-gated) plus documentation/examples, along with CI/lint/test targets and refreshed docs.
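For the streaming chat support, OpenAI-compatible clients expect Server-Sent Events: each chunk arrives as a `data: <json>` line followed by a blank line, and the stream ends with `data: [DONE]`. The std-only sketch below just makes that wire format visible. The helper names are hypothetical, the JSON is hand-built with most `chat.completion.chunk` fields (such as `created`) omitted, and an Axum handler would more likely use `axum::response::sse::Sse` together with a JSON serializer.

```rust
/// Minimal JSON string escaping (std-only, enough for this illustration).
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Frames one token delta as an SSE `data:` event with a trimmed
/// `chat.completion.chunk`-shaped payload.
fn sse_chunk(id: &str, model: &str, delta: &str) -> String {
    format!(
        "data: {{\"id\":\"{}\",\"object\":\"chat.completion.chunk\",\"model\":\"{}\",\"choices\":[{{\"index\":0,\"delta\":{{\"content\":\"{}\"}},\"finish_reason\":null}}]}}\n\n",
        json_escape(id),
        json_escape(model),
        json_escape(delta)
    )
}

/// Terminating event that OpenAI-compatible clients wait for.
fn sse_done() -> &'static str {
    "data: [DONE]\n\n"
}

/// Builds the whole event-stream body for a sequence of token deltas.
fn sse_stream(id: &str, model: &str, deltas: &[&str]) -> String {
    let mut body = String::new();
    for d in deltas {
        body.push_str(&sse_chunk(id, model, d));
    }
    body.push_str(sse_done());
    body
}
```

A real handler would emit each event as the engine produces tokens rather than building the body up front; the framing per event is the same.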
Reviewed changes
Copilot reviewed 48 out of 52 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| tests/cli_test.rs | Adds CLI integration tests that execute the built binary end-to-end. |
| src/utils/mod.rs | Introduces utils module root. |
| src/utils/format.rs | Adds formatting helpers for sizes, parameters, and relative timestamps (with unit tests). |
| src/utils/file.rs | Adds cache/home path helpers (including PUMA_HOME override) and file utilities. |
| src/util/request.rs | Removes legacy chunked downloader implementation. |
| src/util/mod.rs | Removes old util module root. |
| src/util/file.rs | Removes old home-dir helper (migrated into utils). |
| src/system/system_info.rs | Adds system info collection (OS/arch/mem/GPU/cache stats) and display output. |
| src/system/mod.rs | Introduces system module root. |
| src/storage/storage_trait.rs | Adds ModelStorage trait abstraction for registry backends. |
| src/storage/sqlite.rs | Implements SQLite-backed model storage with migrations + filtering + tests. |
| src/storage/mod.rs | Exposes storage modules and re-exports storage types. |
| src/registry/model_registry.rs | Adds the model registry facade (register/load/get/remove + cache cleanup) and tests. |
| src/registry/mod.rs | Introduces registry module root. |
| src/main.rs | Wires in new modules and switches logging to tracing_subscriber. |
| src/downloader/progress.rs | Adds shared multi-progress management for multi-file downloads. |
| src/downloader/mod.rs | Exposes downloader modules (huggingface + progress). |
| src/downloader/huggingface.rs | Implements Hugging Face snapshot download + metadata fetch + registry registration + tests. |
| src/downloader/downloader.rs | Refactors download errors and introduces Downloader trait. |
| src/cli/serve.rs | Adds serve command to start the HTTP API with backend selection (MLX/mock). |
| src/cli/rm.rs | Adds remove-model CLI logic backed by registry removal. |
| src/cli/mod.rs | Exposes new CLI submodules (inspect/ls/rm/serve). |
| src/cli/ls.rs | Adds list-models logic with regex + SQL-like filter parsing. |
| src/cli/inspect.rs | Adds inspect-model logic and pretty-printing of model metadata. |
| src/cli/commands.rs | Extends clap CLI: args structs, serve command, wired registry-backed commands. |
| src/backend/mod.rs | Introduces backend module root with feature-gated MLX backend. |
| src/backend/mock.rs | Adds mock inference engine implementation for testing/fallback. |
| src/backend/mlx/README.md | Documents MLX backend requirements, usage, and roadmap. |
| src/backend/mlx/mod.rs | Adds MLX backend module with exports. |
| src/backend/mlx/engine.rs | Adds placeholder MLX inference engine implementation + tests. |
| src/backend/engine.rs | Defines the inference engine trait + response type. |
| src/api/types/response.rs | Adds OpenAI-compatible response types (chat + completions + models + errors). |
| src/api/types/request.rs | Adds OpenAI-compatible request types (chat + completions). |
| src/api/types/mod.rs | Re-exports API request/response types. |
| src/api/tests.rs | Adds integration tests for API endpoints including streaming chat SSE. |
| src/api/routes.rs | Creates Axum router, shared state, tracing, and permissive CORS. |
| src/api/models.rs | Implements /v1/models and /v1/models/:model endpoints. |
| src/api/mod.rs | Adds API module root. |
| src/api/completions.rs | Implements legacy /v1/completions endpoint (non-streaming). |
| src/api/chat.rs | Implements /v1/chat/completions (stream + non-stream) handler. |
| site/images/logo-light.svg | Adds a light-theme logo asset. |
| README.md | Major documentation refresh: features, CLI/API usage, project structure, etc. |
| Makefile | Adds test, lint, and format targets. |
| hack/scripts/test_api.sh | Adds manual API testing script using curl/jq. |
| hack/README.md | Documents hack/ scripts usage. |
| examples/mlx_inference.rs | Adds an example program for MLX inference (feature gated). |
| docs/MLX_INTEGRATION.md | Adds detailed MLX integration guide and usage instructions. |
| Cargo.toml | Bumps version and adds new dependencies/features for API/registry/downloader/MLX. |
| .github/workflows/rust-ci.yaml | Adds GitHub Actions workflow for linting and tests. |
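The `ModelStorage` trait listed above decouples the registry from SQLite. A minimal sketch of that shape, assuming a simple register/get/remove/list surface: `ModelRecord`, its fields, `MemoryStorage`, and the method signatures are all hypothetical rather than the PR's definitions, and the `BTreeMap` backend stands in for the kind of in-memory test double such a trait makes possible.

```rust
use std::collections::BTreeMap;

/// Hypothetical model record; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct ModelRecord {
    pub name: String,
    pub provider: String,
    pub size_bytes: u64,
}

/// Hypothetical storage abstraction: a SQLite backend and an
/// in-memory backend could both implement it.
pub trait ModelStorage {
    fn register(&mut self, model: ModelRecord) -> Result<(), String>;
    fn get(&self, name: &str) -> Option<ModelRecord>;
    fn remove(&mut self, name: &str) -> Result<(), String>;
    fn list(&self) -> Vec<ModelRecord>;
}

/// In-memory backend; `BTreeMap` keeps `list` sorted by model name.
#[derive(Default)]
pub struct MemoryStorage {
    models: BTreeMap<String, ModelRecord>,
}

impl ModelStorage for MemoryStorage {
    fn register(&mut self, model: ModelRecord) -> Result<(), String> {
        // Upsert: registering an existing name replaces the old record.
        self.models.insert(model.name.clone(), model);
        Ok(())
    }

    fn get(&self, name: &str) -> Option<ModelRecord> {
        self.models.get(name).cloned()
    }

    fn remove(&mut self, name: &str) -> Result<(), String> {
        self.models
            .remove(name)
            .map(|_| ())
            .ok_or_else(|| format!("model {name} not found"))
    }

    fn list(&self) -> Vec<ModelRecord> {
        self.models.values().cloned().collect()
    }
}
```

Keeping the registry facade generic over such a trait is what lets unit tests run without touching an on-disk database.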
Comment on lines +67 to +73

```rust
if seconds < 0 {
    "just now".to_string()
} else if seconds < 60 {
    format!("{} seconds ago", seconds)
} else if seconds < 3600 {
    let minutes = seconds / 60;
    format!(
```
```rust
assert_eq!(format_time_ago(&timestamp), "30 seconds ago");

let timestamp = (now - Duration::seconds(1)).to_rfc3339();
assert_eq!(format_time_ago(&timestamp), "1 seconds ago");
```
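The excerpt above asserts `"1 seconds ago"`, which is ungrammatical; a singular/plural split fixes it. This sketch assumes a std-only variant, the hypothetical `format_seconds_ago`, that receives elapsed seconds directly instead of parsing an RFC 3339 timestamp, and it assumes hour and day buckets beyond what the excerpt shows.

```rust
/// Returns "1 <unit> ago" for exactly one, "<n> <unit>s ago" otherwise.
fn plural_ago(n: i64, unit: &str) -> String {
    if n == 1 {
        format!("1 {unit} ago")
    } else {
        format!("{n} {unit}s ago")
    }
}

/// Hypothetical std-only counterpart of `format_time_ago`.
fn format_seconds_ago(seconds: i64) -> String {
    if seconds < 0 {
        // Future timestamps (e.g. clock skew) keep the original "just now".
        "just now".to_string()
    } else if seconds < 60 {
        plural_ago(seconds, "second")
    } else if seconds < 3_600 {
        plural_ago(seconds / 60, "minute")
    } else if seconds < 86_400 {
        plural_ago(seconds / 3_600, "hour")
    } else {
        plural_ago(seconds / 86_400, "day")
    }
}
```

Routing every bucket through one helper keeps the pluralization rule in a single place, so the minute, hour, and day branches cannot drift from the seconds branch.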
```rust
    stdout.contains(text) || stderr.contains(text)
}

#[test]
```
```rust
    assert!(stdout.contains("MODEL"));
}

#[test]
```
```rust
    assert!(result.is_err());
}

#[tokio::test]
```
Comment on lines +1 to +11

```rust
use serde::{Deserialize, Serialize};
use std::fs;
use std::os::unix::fs::MetadataExt;
use std::path::PathBuf;
use std::process::Command;
use sysinfo::System;

use crate::registry::model_registry::ModelRegistry;
use crate::utils::file;
use crate::utils::format::format_size;
```
```rust
if let Ok(metadata) = entry.metadata() {
    if metadata.is_file() {
        // Use blocks * 512 to get actual disk usage (handles sparse files)
        total_size += metadata.blocks() * 512;
```
```rust
if let Ok(metadata) = entry.metadata() {
    if metadata.is_file() {
        // Use blocks * 512 to get actual disk usage (handles sparse files)
        total_size += metadata.blocks() * 512;
```
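`std::os::unix::fs::MetadataExt` and its `blocks()` method exist only on Unix targets, so the import above would stop the crate from building on non-Unix platforms unless the module is gated some other way. One portable arrangement is sketched below, assuming that falling back to the apparent length (`Metadata::len`) is acceptable off Unix; `on_disk_size` and `dir_disk_usage` are hypothetical helper names.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// On Unix, st_blocks is reported in 512-byte units and counts only
/// allocated blocks, so sparse files are not over-reported.
#[cfg(unix)]
fn on_disk_size(metadata: &fs::Metadata) -> u64 {
    use std::os::unix::fs::MetadataExt;
    metadata.blocks() * 512
}

/// Elsewhere, fall back to the apparent file length.
#[cfg(not(unix))]
fn on_disk_size(metadata: &fs::Metadata) -> u64 {
    metadata.len()
}

/// Recursively sums the on-disk size of regular files under `dir`.
/// `symlink_metadata` does not follow symlinks, which avoids cycles.
fn dir_disk_usage(dir: &Path) -> io::Result<u64> {
    let mut total = 0;
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        let metadata = fs::symlink_metadata(&path)?;
        if metadata.is_dir() {
            total += dir_disk_usage(&path)?;
        } else if metadata.is_file() {
            total += on_disk_size(&metadata);
        }
    }
    Ok(total)
}
```

Propagating `io::Result` instead of skipping entries on error is a design choice; the excerpt's `if let Ok(...)` style silently ignores unreadable entries, which may be preferable for a best-effort `info` display.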
```diff
@@ -0,0 +1,185 @@
+use crate::backend::engine::{GenerateResponse, InferenceEngine};
+use mlx_rs::{Array, Device, Dtype};
```
Comment on lines +251 to +252

```rust
// Register the model only if not totally cached
if !model_totally_cached {
```
What this PR does / why we need it
Which issue(s) this PR fixes
Fixes #
Special notes for your reviewer
Does this PR introduce a user-facing change?