Scope a migration toward a Rust-native runtime (Python stays the model layer)

**Difficulty:** 🟣 Research / open-ended — **this issue is to produce a scoping doc / RFC (Request for Comments — a written proposal circulated for feedback before implementation), not the port itself.**

**Scope:** Very large as a program of work; this issue's deliverable is a written, reviewed plan with a recommended ordering and the key design decisions called out.

**Subsystems (in likely migration order):** [graph/](https://github.com/mstar-project/mstar/blob/main/mstar/graph/) · [communication/](https://github.com/mstar-project/mstar/blob/main/mstar/communication/) · [conductor/](https://github.com/mstar-project/mstar/blob/main/mstar/conductor/conductor.py) · [api_server/](https://github.com/mstar-project/mstar/blob/main/mstar/api_server/) · [worker/](https://github.com/mstar-project/mstar/blob/main/mstar/worker/worker.py)

**Prerequisites:** Rust; a Python↔Rust interop story (e.g., PyO3 built with maturin or equivalent); comfort designing a language-neutral wire format.

### Goal

We want **as much of the M\* runtime as possible running in Rust** — for raw
performance and, just as importantly, to get the scheduling and communication
hot paths off the Python GIL. As a hard constraint, **model code stays Python.**
Model authors will still write their submodules, model code, and `forward`
passes in Python with torch, so every Rust boundary we introduce has to preserve
the existing Python model-authoring API, allowing only modest, well-justified
changes where a boundary requires them.

This issue is to **scope that migration**: take the rough phasing below, validate
it against the code, and turn it into a concrete RFC with recommended ordering, the
wire-format decision, the in-process-vs-separate-process call per component, and
an assessment of what realistically *can't* move.

### Rough phasing to react to (maintainer's initial thinking)

This is a starting point, not a spec — the point of the issue is to pressure-test
and refine it.

1. **Graph layer first.** The graph logic, ready-queues, and edge/ingest
   bookkeeping ([mstar/graph/](https://github.com/mstar-project/mstar/blob/main/mstar/graph/) — `GraphNode`, `Sequential`,
   `Parallel`, `Loop`, `GraphEdge`) are self-contained dataflow with no GPU or
   Python-model dependency. Together they're a strong first Rust candidate and can be
   "plugged into" the existing worker behind the current Python API. Validate
   against the existing graph tests
   ([test/modular/test_graph.py](https://github.com/mstar-project/mstar/blob/main/test/modular/test_graph.py)).
2. **ZMQ communication stack** ([mstar/communication/communicator.py](https://github.com/mstar-project/mstar/blob/main/mstar/communication/communicator.py),
   the `BaseCommunicator` interface). Natural Rust boundary; Rust has solid ZMQ
   bindings. The **tensor** communication stack
   ([mstar/communication/tensors.py](https://github.com/mstar-project/mstar/blob/main/mstar/communication/tensors.py)) could
   follow *if it makes sense*, but it's harder (see notes below).
3. **Conductor loop as a Rust process.** The conductor's main loop
   (`Conductor.run()` in [conductor.py](https://github.com/mstar-project/mstar/blob/main/mstar/conductor/conductor.py)) is much
   smaller and simpler than the worker's because it's essentially a poll-messages →
   dispatch → sleep loop. Low risk, modest reward: a good vehicle for learning
   how to stand up a Rust component in the live system before tackling anything
   hot.
4. **API server.** Also relatively self-contained
   ([mstar/api_server/](https://github.com/mstar-project/mstar/blob/main/mstar/api_server/)). It's a Python process with two
   threads sitting directly on the request path, which makes it potentially
   latency-critical, especially under load: when a lot of data is flowing in and
   out, or when individual requests are very fast, per-request Python overhead
   can dominate. A Rust HTTP front-end could be a real win there.
5. **Worker loop — scope only, don't commit.** `Worker.run()`
   ([worker.py](https://github.com/mstar-project/mstar/blob/main/mstar/worker/worker.py)) is the hard one: async scheduling, the
   dedicated GPU thread, speculative scheduling, FlashInfer attention planning,
   CUDA-graph capture/replay. It's unclear how much of this can run in Rust at
   all (it's tangled with torch/flashinfer/CUDA), but individual components
   almost certainly can. This needs very careful carve-out and is its own
   sub-investigation.
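As a rough illustration of why phase 1 is a good first port, the core of a graph ready-queue fits in a few dozen lines of std-only Rust. This is a minimal sketch, not a proposed design: the names (`ReadyGraph`, `add_edge`, `start`, `complete`) are hypothetical and do not mirror the real `GraphNode`/`GraphEdge` API. It uses Kahn-style dependency counting over a static DAG: each node tracks how many upstream inputs are still outstanding and is queued once that count reaches zero. It deliberately ignores `Loop` (which would need to re-arm nodes per iteration), per-request state, and the PyO3 wrapper that would expose it behind the existing Python interface.

```rust
use std::collections::VecDeque;

/// Hypothetical ready-queue over a static DAG of node indices.
/// Not the real mstar graph API; illustration only.
struct ReadyGraph {
    successors: Vec<Vec<usize>>, // outgoing edges per node
    in_degree: Vec<usize>,       // total incoming edges per node
    pending: Vec<usize>,         // inputs still outstanding in the current run
    ready: VecDeque<usize>,      // nodes whose inputs have all arrived
}

impl ReadyGraph {
    fn new(num_nodes: usize) -> Self {
        Self {
            successors: vec![Vec::new(); num_nodes],
            in_degree: vec![0; num_nodes],
            pending: vec![0; num_nodes],
            ready: VecDeque::new(),
        }
    }

    /// Record that `to` consumes an output of `from`.
    fn add_edge(&mut self, from: usize, to: usize) {
        self.successors[from].push(to);
        self.in_degree[to] += 1;
    }

    /// Reset outstanding-input counts and seed the queue with source nodes.
    fn start(&mut self) {
        self.pending.clone_from(&self.in_degree);
        self.ready.clear();
        for (node, &deg) in self.in_degree.iter().enumerate() {
            if deg == 0 {
                self.ready.push_back(node);
            }
        }
    }

    /// Next node that can run, if any.
    fn pop_ready(&mut self) -> Option<usize> {
        self.ready.pop_front()
    }

    /// Mark `node` finished; enqueue each successor whose last input just arrived.
    fn complete(&mut self, node: usize) {
        for &next in &self.successors[node] {
            self.pending[next] -= 1;
            if self.pending[next] == 0 {
                self.ready.push_back(next);
            }
        }
    }
}
```

In a diamond `0 → {1, 2} → 3`, node `3` only becomes ready after both `1` and `2` complete, which is the kind of invariant the existing graph tests would need to keep checking once the logic moves to Rust.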

### Observations from the current code (fold these into the RFC)

- **Pick the wire/message format early; it's a prerequisite for most things.** Today,
  messages are rich Python objects (e.g. `WorkerMessage` / `ConductorMessage` in
  `mstar/utils/ipc_format`) sent over ZMQ. The moment one end is Rust, the
  payload needs a language-neutral, schema'd encoding (msgpack / protobuf /
  flatbuffers / bincode-with-shared-schema). This decision blocks points 2-4 above and
  should be made first, ideally so Python-to-Python keeps working unchanged
  during the transition.
- **Decide in-process (PyO3) vs separate-process (IPC) per component.** They're
  different tools: the conductor and API server are *already* separate processes
  talking over the bus, so rewriting them as standalone Rust processes avoids
  PyO3 entirely. The graph layer, by contrast, lives *inside* the worker, so it
  wants in-process PyO3 bindings. Name this fork explicitly for each phase.
- **The GIL is the real prize.** GIL contention between scheduling/planning and
  GPU submission is a big reason the worker today carries extra Python threads
  (e.g. the plan-executor thread) and switch-interval tuning
  (`MSTAR_PY_SWITCH_INTERVAL_SEC`). Rust components that release the GIL while
  they work are where the wins come from.
- **Tensor comm is genuinely harder than ZMQ comm.** The tensor transport
  ([tensors.py](https://github.com/mstar-project/mstar/blob/main/mstar/communication/tensors.py)) is *already* partly native:
  RDMA/TCP go through Mooncake's `TransferEngine` (C++), wrapped behind the
  `TensorTransferEngine` ABC, and the orchestration touches torch tensors, CUDA
  streams, and raw `data_ptr`s. Porting the orchestration to Rust means careful
  torch/CUDA FFI, so this is reasonable to defer or leave partly/mostly Python.
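The properties the wire format needs can be shown with a deliberately hand-rolled encoding. The sketch below is std-only Rust and purely illustrative: `ControlMsg`, its variants, and the byte layout are hypothetical stand-ins, not the real `WorkerMessage`/`ConductorMessage` schema, and the actual choice should be one of the formats listed above rather than custom bytes. What it shows is what any candidate must provide: an explicit version byte so either side can reject messages it doesn't understand, an explicit type tag, fixed-width little-endian integers, and length-prefixed UTF-8 strings, so a Python decoder can be written from the schema alone without sharing Python object definitions.

```rust
/// Hypothetical control-plane message; NOT the real mstar schema.
#[derive(Debug, Clone, PartialEq)]
enum ControlMsg {
    Heartbeat { worker_id: u32 },
    Dispatch { request_id: u64, node: String },
}

/// Bumped whenever the layout changes; decoders reject unknown versions.
const WIRE_VERSION: u8 = 1;

/// Layout: [version][tag][fields...], integers little-endian, strings u32-length-prefixed.
fn encode(msg: &ControlMsg) -> Vec<u8> {
    let mut out = vec![WIRE_VERSION];
    match msg {
        ControlMsg::Heartbeat { worker_id } => {
            out.push(0); // type tag
            out.extend_from_slice(&worker_id.to_le_bytes());
        }
        ControlMsg::Dispatch { request_id, node } => {
            out.push(1); // type tag
            out.extend_from_slice(&request_id.to_le_bytes());
            let bytes = node.as_bytes();
            out.extend_from_slice(&(bytes.len() as u32).to_le_bytes());
            out.extend_from_slice(bytes);
        }
    }
    out
}

/// Read `n` bytes at `*pos`, advancing it; error instead of panicking on truncation.
fn take<'a>(buf: &'a [u8], pos: &mut usize, n: usize) -> Result<&'a [u8], String> {
    let end = pos.checked_add(n).ok_or("length overflow")?;
    let slice = buf
        .get(*pos..end)
        .ok_or_else(|| format!("truncated at byte {}", *pos))?;
    *pos = end;
    Ok(slice)
}

fn decode(buf: &[u8]) -> Result<ControlMsg, String> {
    let mut pos = 0;
    let version = take(buf, &mut pos, 1)?[0];
    if version != WIRE_VERSION {
        return Err(format!("unsupported wire version {version}"));
    }
    let tag = take(buf, &mut pos, 1)?[0];
    let msg = match tag {
        0 => {
            let worker_id = u32::from_le_bytes(take(buf, &mut pos, 4)?.try_into().unwrap());
            ControlMsg::Heartbeat { worker_id }
        }
        1 => {
            let request_id = u64::from_le_bytes(take(buf, &mut pos, 8)?.try_into().unwrap());
            let len = u32::from_le_bytes(take(buf, &mut pos, 4)?.try_into().unwrap()) as usize;
            let node = String::from_utf8(take(buf, &mut pos, len)?.to_vec())
                .map_err(|e| e.to_string())?;
            ControlMsg::Dispatch { request_id, node }
        }
        other => return Err(format!("unknown message tag {other}")),
    };
    if pos != buf.len() {
        return Err("trailing bytes after message".to_string());
    }
    Ok(msg)
}
```

A real format such as msgpack or protobuf gives these properties without custom code (protobuf also adds field-numbered schema evolution); the version check is the part worth settling early so Python-to-Python traffic keeps working while Rust endpoints come online.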

### Open questions for the RFC

- For each component: in-process PyO3 or separate Rust process?
- One wire format for everything, or different formats for control messages vs.
  tensor metadata?
- How do we keep `main` releasable throughout — i.e. land each phase behind the
  existing Python interfaces, with a Python fallback, rather than a flag day?
- What's explicitly staying Python forever (model `forward`, tokenization, media
  decode, anything torch/CUDA-bound)?

### Deliverable / acceptance criteria

- A reviewed RFC that: confirms or revises the phase ordering above; picks the
  wire format; makes the in-process-vs-process call per component; lists what
  stays Python; and covers build integration (maturin alongside the existing
  [pyproject.toml](https://github.com/mstar-project/mstar/blob/main/pyproject.toml)).
- Agreement that **graph** and the **ZMQ communication** layer are the first two
  ports, each shippable behind its current Python interface.

> _New to M\*? Skim [How it works](https://github.com/mstar-project/mstar/blob/main/README.md#how-it-works) and the [Contributing guide](https://github.com/mstar-project/mstar/blob/main/CONTRIBUTING.md) first._

Scope a migration toward a Rust-native runtime (Python stays the model layer) #130