Migration worker metadata-service discovery#4222

Open
vpetrovicTT wants to merge 7 commits into main from vpetrovic/feat/migration-worker-discovery
@vpetrovicTT vpetrovicTT commented Jun 15, 2026

Collaborator

Setup (per host, before running the binary)

On each host (I used c02u02 and c02u08), export the runtime libs first:

export TT_METAL_HOME=/data/vpetrovic/tt-metal
export LD_LIBRARY_PATH=/data/vpetrovic/.local/lib:$TT_METAL_HOME/build/lib:$TT_METAL_HOME/build_Release/lib:$LD_LIBRARY_PATH
cd /data/vpetrovic/tt-inference-server/tt-media-server/cpp_server
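
A quick way to sanity-check the exports above before launching anything. This is only a sketch: `missing_lib_dirs` is a hypothetical helper, not something in this PR. It prints every entry of a colon-separated path list that is not an existing directory. A missing entry is not necessarily an error, since the path above lists both `build/lib` and `build_Release/lib` and presumably only one of them needs to exist.

```shell
# Hypothetical helper (not part of the PR): print each entry of a
# colon-separated path list that is not an existing directory.
missing_lib_dirs() {
  local path_list="$1" entry
  local IFS=':'
  for entry in $path_list; do
    # Skip empty entries, e.g. from "::" when LD_LIBRARY_PATH was unset.
    [ -z "$entry" ] && continue
    [ -d "$entry" ] || printf '%s\n' "$entry"
  done
}

# Usage (run after the exports above):
#   missing_lib_dirs "$LD_LIBRARY_PATH"
```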

Step 1 — metadata service (background, on the metadata host c02u02 / 10.32.26.193)

Start the metadata service in the background:

HTTP_PORT=18080 BIND_HOST=0.0.0.0 \
  tests/integration/run_mooncake_metadata_server.sh &
# serves http://10.32.26.193:18080/metadata
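
Because the service is started in the background, the receiver and sender can race it. A minimal sketch of waiting for the port before moving on to Step 2, assuming bash (for `/dev/tcp`) and a metadata URL with an explicit port. Both function names are hypothetical. This only checks that something accepts TCP connections on the port; it does not verify that `/metadata` answers correctly.

```shell
# Hypothetical helpers (not part of the PR).

# Print "host port" for a URL of the form http://host:port/path.
# Assumes the URL carries an explicit port.
metadata_host_port() {
  local url="$1" hostport
  hostport="${url#*://}"      # drop the scheme
  hostport="${hostport%%/*}"  # drop the path
  printf '%s %s\n' "${hostport%%:*}" "${hostport##*:}"
}

# Return 0 once a TCP connection to host:port succeeds,
# or 1 after `tries` attempts spaced one second apart.
wait_for_port() {
  local host="$1" port="$2" tries="${3:-30}" i=0
  while [ "$i" -lt "$tries" ]; do
    # /dev/tcp is a bash feature; the subshell closes the socket on exit.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  return 1
}

# Usage:
#   set -- $(metadata_host_port http://10.32.26.193:18080/metadata)
#   wait_for_port "$1" "$2" 30 || echo "metadata service not reachable" >&2
```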

Step 2 — receiver (on c02u08 / 10.32.26.225)

Advertises its own IP, registers kv-receiver-0, then waits:

MC_TCP_BIND_ADDRESS=10.32.26.225 build/migration_worker_discovery --role receiver \
  --metadata http://10.32.26.193:18080/metadata --name kv-receiver-0 \
  --bytes 1048576 --timeout-sec 300

Step 3 — sender (on c02u02 / 10.32.26.193)

Looks up kv-receiver-0, then ships the tensor:

build/migration_worker_discovery --role sender \
  --metadata http://10.32.26.193:18080/metadata --name kv-sender-0 \
  --peer kv-receiver-0 --bytes 1048576 --timeout-sec 300
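
The two invocations have to agree: the sender's `--peer` must be the receiver's `--name`, and both point at the same `--metadata` URL (I am assuming `--bytes` should match too, since the receiver verifies the payload). A sketch that keeps those values in one place, using only the flags shown above. `discovery_cmd` and the variable names are hypothetical.

```shell
# Shared settings, copied from the commands above.
METADATA_URL="http://10.32.26.193:18080/metadata"
RECEIVER_NAME="kv-receiver-0"
SENDER_NAME="kv-sender-0"
BYTES=1048576
TIMEOUT_SEC=300

# Hypothetical helper (not part of the PR): print the command line for a role.
discovery_cmd() {
  case "$1" in
    receiver)
      printf 'build/migration_worker_discovery --role receiver --metadata %s --name %s --bytes %s --timeout-sec %s\n' \
        "$METADATA_URL" "$RECEIVER_NAME" "$BYTES" "$TIMEOUT_SEC"
      ;;
    sender)
      # The sender looks up the receiver by the same name the receiver registered.
      printf 'build/migration_worker_discovery --role sender --metadata %s --name %s --peer %s --bytes %s --timeout-sec %s\n' \
        "$METADATA_URL" "$SENDER_NAME" "$RECEIVER_NAME" "$BYTES" "$TIMEOUT_SEC"
      ;;
    *)
      echo "unknown role: $1" >&2
      return 1
      ;;
  esac
}
```

On the sender host, `$(discovery_cmd sender)` runs the printed command. On the receiver host, export `MC_TCP_BIND_ADDRESS=10.32.26.225` first, then run `$(discovery_cmd receiver)`.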

What you're looking for

| Process | Host | Success line |
| --- | --- | --- |
| metadata service | c02u02 | (just stays up, serving :18080) |
| receiver | c02u08 | verifyTensorOnReceiver(... 1048576 ...) -> MATCH; verification: PASS |
| sender | c02u02 | discovered peer 'kv-receiver-0'; done: transferred 1048576 bytes |
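
If the output of each process is captured to a file (for example with `tee`), the success lines can be checked mechanically. A rough sketch, assuming the log text matches the strings quoted above exactly. `check_discovery_log` is a hypothetical helper, not part of this PR.

```shell
# Hypothetical helper (not part of the PR): return 0 if the captured log
# contains every success line expected for the given role.
check_discovery_log() {
  local role="$1" log="$2"
  case "$role" in
    receiver)
      # "--" stops option parsing so the pattern may start with "-".
      grep -qF -- '-> MATCH' "$log" &&
        grep -qF 'verification: PASS' "$log"
      ;;
    sender)
      grep -qF "discovered peer 'kv-receiver-0'" "$log" &&
        grep -qF 'done: transferred 1048576 bytes' "$log"
      ;;
    *)
      return 2
      ;;
  esac
}

# Usage:
#   check_discovery_log receiver receiver.log && echo "receiver OK"
#   check_discovery_log sender sender.log && echo "sender OK"
```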

@vpetrovicTT vpetrovicTT changed the title Add metadata-service discovery PoC Migration worker metadata-service discovery Jun 15, 2026
@vpetrovicTT vpetrovicTT linked an issue Jun 16, 2026 that may be closed by this pull request
@vpetrovicTT

Collaborator Author

Sender

Screenshot 2026-06-16 at 11 00 25

Receiver

Screenshot 2026-06-16 at 10 57 30

Development

Successfully merging this pull request may close these issues.

Migration worker discovery mechanism

4 participants