Description
Epic Summary
This epic aims to establish a high-performance, resource-efficient framework for fine-tuning, Reinforcement Learning from Human Feedback (RLHF), and executing large, open-source quantized models on standard CPUs. The core strategy involves converting the Python-based Unsloth and C++-based llama.cpp projects into Rust and compiling them to WebAssembly (WASM). This will leverage the performance and safety of Rust alongside the portability of WASM. The project will build upon the conversion methodology previously used for kimi-k2 (as seen in issue #10), with the goal of significantly reducing the memory and processing footprint of state-of-the-art models, making them accessible without specialized GPU hardware.
Objectives
[ ] Develop a Rust-based conversion of the Unsloth library to enable efficient fine-tuning and memory optimization of LLMs.
[ ] Convert the core functionalities of llama.cpp into a Rust library to create a high-performance inference engine for GGUF models.
[ ] Compile both the Unsloth and llama.cpp Rust conversions into WASM modules for cross-platform and browser-based execution.
[ ] Benchmark the performance and resource consumption of the resulting Rust/WASM modules against the original Python and C++ implementations.
[ ] Validate the quality and integrity of the models after quantization and execution through the new framework, ensuring no significant degradation in performance.
Acceptance Criteria
[ ] The Unsloth Rust/WASM module can successfully load a compatible model and apply Unsloth's Dynamic 2.0 GGUF quantizations to it.
[ ] The llama.cpp Rust/WASM module can load and run inference on a quantized model (e.g., Llama 3, Mistral).
[ ] The combined framework demonstrates a measurable reduction in RAM and CPU usage (targeting a >10% improvement over the already-optimized Unsloth baseline) when running a standard benchmark model.
[ ] The final WASM modules can be successfully executed in a web browser and a standalone server environment (e.g., Node.js).
[ ] Comprehensive documentation is created for using the new Rust/WASM libraries and their APIs.
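Since several acceptance criteria hinge on loading quantized GGUF models, a header sanity check is a natural first validation step. The sketch below is a hypothetical helper (not an actual llama-cpp-rs API); it relies only on the documented GGUF layout: the ASCII magic "GGUF" followed by a little-endian u32 format version.

```rust
/// Minimal GGUF header check: verify the magic bytes and read the
/// format version. Returns the version on success.
fn check_gguf_header(bytes: &[u8]) -> Result<u32, String> {
    // The header starts with 4 magic bytes plus a 4-byte version field.
    if bytes.len() < 8 {
        return Err("file too short for a GGUF header".into());
    }
    if &bytes[0..4] != b"GGUF" {
        return Err("missing GGUF magic".into());
    }
    // Version is stored little-endian immediately after the magic.
    let version = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    Ok(version)
}
```

A check like this gives the loader a fast, descriptive failure mode before any tensor data is parsed.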
User Stories
As a developer, I want to fine-tune and run powerful LLMs on my local machine's CPU so that I can develop and experiment with AI applications without incurring high costs for GPU infrastructure.
As a researcher, I want a lightweight and portable framework to test new quantization and optimization techniques so that I can accelerate my research cycles.
As a web developer, I want to integrate LLM functionality directly into client-side web applications, so that I can build more interactive and private user experiences.
Technical Requirements
[ ] Utilize the ruv-FANN library and the established Rust-WASM conversion pipeline.
[ ] Ensure the Rust code is idiomatic, well-documented, and maintains memory safety.
[ ] The project must use stable versions of Rust and associated tooling (Cargo, wasm-pack).
[ ] Create bindings and a clear API for the WASM modules to be easily used from JavaScript.
[ ] Set up a CI/CD pipeline for automated testing and building of the Rust and WASM artifacts.
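The JavaScript-facing API could take a shape like the sketch below. All names here (`InferenceSession`, `load_model`, `generate`) are illustrative assumptions, not a settled API; in the actual WASM build, the struct and its methods would carry `#[wasm_bindgen]` attributes so that wasm-pack generates the JS bindings. The sketch is plain Rust so the surface can be reviewed and tested natively.

```rust
/// Illustrative JS-facing session object. In the WASM module this would
/// be annotated with #[wasm_bindgen]; here it is plain Rust.
pub struct InferenceSession {
    model_loaded: bool,
}

impl InferenceSession {
    pub fn new() -> Self {
        Self { model_loaded: false }
    }

    /// Load a GGUF model from raw bytes handed over from JavaScript
    /// (e.g. a Uint8Array). This stub only records that a load happened.
    pub fn load_model(&mut self, bytes: &[u8]) -> Result<(), String> {
        if bytes.is_empty() {
            return Err("empty model buffer".into());
        }
        self.model_loaded = true;
        Ok(())
    }

    /// Run inference on a prompt; stubbed to echo until the real
    /// engine is wired in.
    pub fn generate(&self, prompt: &str) -> Result<String, String> {
        if !self.model_loaded {
            return Err("no model loaded".into());
        }
        Ok(format!("echo: {prompt}"))
    }
}
```

Keeping the exported surface to a small number of methods with `Result` return types maps cleanly onto JS promises/exceptions on the binding side.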
Dependencies
[ ] [Synaptic-Mesh/issues/10]: Relies on the learnings and architecture from the kimi-k2 Rust-WASM conversion.
[ ] UnslothAI/unsloth: The source project for memory optimization and fine-tuning logic.
[ ] ggml-org/llama.cpp: The source project for the C++ inference engine.
Implementation Plan
Phase 1: Core Conversion and Initial Testing
[ ] Task 1: Set up the Rust workspace and project structure for both unsloth-rs and llama-cpp-rs.
[ ] Task 2: Begin the line-by-line translation of the core data structures and algorithms from Unsloth's Python code to Rust.
[ ] Task 3: Translate the essential model loading and inference logic from llama.cpp to Rust.
[ ] Task 4: Develop initial unit tests for the converted Rust components to ensure functional parity with the originals.
[ ] Task 5: Compile the initial Rust libraries to native binaries and perform preliminary benchmarks.
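For Task 1, the workspace could be laid out as a single Cargo workspace containing both crates. The crate names come from the plan (unsloth-rs, llama-cpp-rs); the directory layout, edition, and everything else in this fragment are assumptions for illustration.

```toml
# Cargo.toml at the workspace root -- a minimal sketch, not a final layout.
[workspace]
resolver = "2"
members = [
    "crates/unsloth-rs",
    "crates/llama-cpp-rs",
]

[workspace.package]
edition = "2021"
```

A shared workspace keeps the two crates on identical dependency versions, which matters once they are compiled together into WASM in Phase 2.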
Phase 2: WASM Compilation and Integration
[ ] Task 1: Configure wasm-pack to compile the Rust libraries into WASM modules.
[ ] Task 2: Develop JavaScript bindings and a simple web-based application to test the WASM modules.
[ ] Task 3: Implement the logic to pass model data and prompts between JavaScript and the WASM modules.
[ ] Task 4: Integrate the unsloth-rs and llama-cpp-rs modules so they can work in concert.
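One way Task 4's "work in concert" could be structured is a trait boundary between the optimization pass and the inference engine, so either side can be swapped or mocked. Every trait and type name below is illustrative (the toy clamp pass stands in for a real quantization/optimization step), not a real crate API.

```rust
/// Rewrites model weights in place, standing in for an
/// Unsloth-style memory-optimization pass.
trait Optimizer {
    fn optimize(&self, weights: &mut Vec<f32>);
}

/// Runs inference over a weight buffer, standing in for the
/// llama.cpp-derived engine.
trait Engine {
    fn infer(&self, weights: &[f32], prompt: &str) -> String;
}

/// Toy optimizer: clamp weights to a range, as a placeholder
/// for a quantization pass.
struct ClampOptimizer { limit: f32 }

impl Optimizer for ClampOptimizer {
    fn optimize(&self, weights: &mut Vec<f32>) {
        for w in weights.iter_mut() {
            *w = w.clamp(-self.limit, self.limit);
        }
    }
}

/// Toy engine that just reports what it was given.
struct EchoEngine;

impl Engine for EchoEngine {
    fn infer(&self, weights: &[f32], prompt: &str) -> String {
        format!("{} ({} weights)", prompt, weights.len())
    }
}

/// Pipeline: apply the optimizer once, then serve inference.
fn run_pipeline<O: Optimizer, E: Engine>(
    opt: &O, engine: &E, weights: &mut Vec<f32>, prompt: &str,
) -> String {
    opt.optimize(weights);
    engine.infer(weights, prompt)
}
```

The trait boundary also makes the Phase 3 benchmarks easier: each side can be measured in isolation with a no-op implementation of the other.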
Phase 3: Benchmarking, Optimization, and Documentation
[ ] Task 1: Design and execute a comprehensive benchmarking suite to compare performance (speed, memory) against the original libraries.
[ ] Task 2: Analyze benchmark results and identify bottlenecks for further optimization in the Rust/WASM code.
[ ] Task 3: Create detailed API documentation, tutorials, and usage examples for the new framework.
[ ] Task 4: Write a final report summarizing the project's findings, including the final resource reduction metrics.
Testing Strategy
[ ] Unit Tests: Each Rust module and function will have corresponding unit tests to validate its correctness in isolation.
[ ] Integration Tests: Tests will be created to ensure that the unsloth-rs and llama-cpp-rs libraries work together as expected.
[ ] End-to-End (E2E) Tests: A full workflow will be tested: loading a model, applying Unsloth optimizations, and running inference to check the final output against an expected result. This will be done in both a native Rust environment and a browser-based WASM environment.
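A functional-parity unit test might look like the sketch below: a simplified Q8_0-style block dequantization (scale times int8 value) checked against hand-computed reference values. The function is a hypothetical stand-in for the converted kernel, not the real implementation.

```rust
/// Simplified Q8_0-style dequantization: each quantized value is an i8
/// multiplied by a per-block f32 scale.
fn dequantize_q8(scale: f32, qs: &[i8]) -> Vec<f32> {
    qs.iter().map(|&q| scale * q as f32).collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn matches_reference_values() {
        // Reference values computed by hand: 0.5 * {-2, 0, 3}.
        let out = dequantize_q8(0.5, &[-2, 0, 3]);
        assert_eq!(out, vec![-1.0, 0.0, 1.5]);
    }
}
```

For real parity tests, the reference vectors would instead be captured from the original Python/C++ implementations and compared within a numeric tolerance.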
Definition of Done
[ ] All acceptance criteria have been met and verified.
[ ] The full suite of unit, integration, and E2E tests is passing consistently.
[ ] The project documentation is complete and has been published.
[ ] The code has been thoroughly reviewed and approved by at least two other contributors.
[ ] The final Rust libraries and WASM modules are published to an appropriate registry (e.g., Crates.io, npm).