feat(docker): add Thor (Jetson) Docker image with inference + inference-ros targets by kingb · Pull Request #106 · amazon-far/holosoma

kingb · 2026-04-28T23:04:48Z

Summary

Add docker/thor/ with two multi-stage parallel targets for running holosoma_inference on Jetson Thor (JetPack 7.1, Ubuntu 24.04 Noble, CUDA 13, aarch64 SBSA):

Target	Includes	Use case
`inference`	policy + unitree_sdk2, no ROS	Joystick / keyboard input, self-contained inference
`inference-ros`	policy + unitree_sdk2 + ROS 2 Jazzy	`Ros2Input` — subscribe to `/cmd_vel` from any ROS publisher.

Also adds docker/thor/compose.yaml (with runtime flags + env preset), docker/thor/.env.example, docker/thor/README.md, docker/thor/Makefile (scoped: make inference, make inference-ros, make run-inference ARGS=...), and docker/thor/scripts/run_*.sh launch helpers for common input-mode combinations.

Layer strategy

Stable → volatile, so code edits only rebuild the last layer:

l4t-cuda (CUDA 13 devel, Ubuntu 24.04)           ← never
 └─ python-base (python3.12, build tools, uv)    ← ~never
     │
     ├─ long-deps (NVPL, cuDSS, TensorRT libs)   ← ~never
     │   └─ common-deps (pydantic, scipy, pin…)  ← weekly-ish
     │       └─ app-deps (unitree_sdk2 + src)    ← every commit
     │           └─ inference                    ← terminal
     │
     └─ ros-jazzy (ros-base + FastDDS + CycloneDDS RMWs)  ← ~never
         └─ long-deps-ros (same as long-deps)             ← ~never
             └─ common-deps-ros (same as common-deps)     ← weekly-ish
                 └─ app-deps-ros (unitree_sdk2 + src)     ← every commit
                     └─ inference-ros                     ← terminal

The two branches duplicate the long-deps → app-deps chain because the ROS 2 install mutates apt state. The heavy layers still cache independently per branch, so day-to-day code edits don't rebuild the TensorRT or ROS install layers.
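To make the cache behaviour concrete, here is a rough Python model of the stage graph above. The stage names come from the diagram; the helpers `chain` and `stages_to_rebuild` are hypothetical and exist only for illustration. It treats each stage as one cache unit, which is a simplification, since Docker actually caches per instruction.

```python
# Hypothetical model of the stage graph in the diagram above.
# Stage names are from the PR; the helper names are illustrative only.
PARENT = {
    "l4t-cuda": None,
    "python-base": "l4t-cuda",
    "long-deps": "python-base",
    "common-deps": "long-deps",
    "app-deps": "common-deps",
    "inference": "app-deps",
    "ros-jazzy": "python-base",
    "long-deps-ros": "ros-jazzy",
    "common-deps-ros": "long-deps-ros",
    "app-deps-ros": "common-deps-ros",
    "inference-ros": "app-deps-ros",
}


def chain(target):
    """Return the stages a target is built from, base image first."""
    stages = []
    stage = target
    while stage is not None:
        stages.append(stage)
        stage = PARENT[stage]
    return list(reversed(stages))


def stages_to_rebuild(target, changed):
    """Stages of `target` that lose their cache when `changed` is edited.

    A changed stage invalidates itself and everything built on top of it.
    A change on the other branch does not affect this target at all.
    """
    stages = chain(target)
    if changed not in stages:
        return []
    return stages[stages.index(changed):]


if __name__ == "__main__":
    print(stages_to_rebuild("inference", "app-deps"))
    print(stages_to_rebuild("inference-ros", "common-deps-ros"))
    print(stages_to_rebuild("inference", "ros-jazzy"))
```

Under this model a source edit touches only `app-deps` and the terminal stage on each branch, and a change to `ros-jazzy` leaves the `inference` target fully cached.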

DDS coexistence (`inference-ros` target)

unitree_sdk2's pybind11 wheel bundles CycloneDDS 0.10.2, which is ABI-incompatible with Jazzy's CycloneDDS 0.10.5 (C++ template signatures changed — free(): invalid pointer on participant init). Fix:

rclpy uses FastDDS (RMW_IMPLEMENTATION=rmw_fastrtps_cpp, set as image ENV default).
unitree_sdk2 keeps its bundled CycloneDDS 0.10.2.
FastDDS and CycloneDDS have disjoint binary symbol spaces → coexist in one process.
Entrypoint prepends /opt/venv/.../unitree_interface/ to LD_LIBRARY_PATH after sourcing ROS so Jazzy's libddsc doesn't hijack unitree's runtime lookup.
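The environment ordering in the list above can be sketched in Python as follows. This is not the entrypoint script from the PR, only an illustration of the ordering it describes. `build_env` is a hypothetical helper, and the unitree library directory is passed in by the caller because its full path is elided in this description.

```python
def build_env(base_env, unitree_lib_dir):
    """Return a copy of base_env arranged as the list above describes.

    base_env is assumed to be the environment after ROS has been sourced.
    unitree_lib_dir is supplied by the caller; the real path is elided in
    the PR text, so nothing here guesses it.
    """
    env = dict(base_env)

    # Image-level default only: an explicit RMW_IMPLEMENTATION still wins.
    env.setdefault("RMW_IMPLEMENTATION", "rmw_fastrtps_cpp")

    # Put the unitree directory first so the dynamic loader resolves its
    # bundled CycloneDDS before any copy on the ROS library path.
    existing = [
        path
        for path in env.get("LD_LIBRARY_PATH", "").split(":")
        if path and path != unitree_lib_dir
    ]
    env["LD_LIBRARY_PATH"] = ":".join([unitree_lib_dir] + existing)
    return env


if __name__ == "__main__":
    # Both paths below are placeholders for illustration.
    sourced = {"LD_LIBRARY_PATH": "/opt/ros/jazzy/lib"}
    print(build_env(sourced, "/hypothetical/unitree_interface"))
```

The ordering matters because the loader searches `LD_LIBRARY_PATH` left to right, so whichever directory comes first supplies the library when two directories provide the same soname.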

Cross-vendor DDS interop (CycloneDDS publisher → FastDDS subscriber) is hardware-validated for TwistStamped on /cmd_vel when both sides are on the same ROS 2 distro (Jazzy-to-Jazzy tested). See scripts/run_shuttle_publisher_cyclonedds.sh for the test.

Base image / pinning

Base: nvcr.io/nvidia/cuda:13.0.2-devel-ubuntu24.04 (matches JetPack 7.1).
Python: 3.12 (Noble native).
ROS 2: Jazzy from packages.ros.org.
unitree_sdk2: 0.1.3 via ARG UNITREE_SDK2_VERSION=0.1.3, fetched from github.com/amazon-far/unitree_sdk2 release assets.
Related: native installs depend on the setup.py cp tag fix PR. The Docker build fetches the wheel directly, so that PR does not block this one.

Scope

Not included in this first pass:

ZED SDK / pyzed — deferred; not needed for the blind-locomotion policy target.
PyTorch — ONNX Runtime handles policy inference; torch would add ~3 GB for no benefit here.
Any depth-perception / image-server pipeline — out of scope for this policy target.

Known tradeoff: Python deps duplicated between Dockerfile and setup.py

The common-deps stage duplicates the install_requires list from src/holosoma_inference/setup.py verbatim. Intentional — buys layered caching so day-to-day code edits only invalidate the final app-deps layer (~seconds) instead of re-running pip install (~30-60 s on aarch64). setup.py remains the source of truth; the Dockerfile comment calls this out as drift risk.

Alternatives considered and rejected for v1:

uv pip install -e .[unitree,booster] in a single stage — simpler, but every code change re-installs ~20 packages.
Call scripts/setup_inference_via_uv.sh inside the build — same single-layer cache cost, plus the script does laptop-oriented things (Ubuntu detection, sudo nvpmodel, etc.) that don't apply in a container.

Happy to switch if reviewers prefer single-source-of-truth over cache speed.
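If the duplication stays, the drift risk could be checked mechanically. The sketch below is one possible approach and is not part of this PR: `requirement_name` and `find_drift` are hypothetical helpers, and the two lists in the usage example are made up for illustration (a real check would extract them from setup.py and the Dockerfile). It compares package names only, so version-specifier drift would go unnoticed.

```python
import re


def requirement_name(spec):
    """Normalized project name from a requirement string like 'scipy>=1.10'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", spec)
    if match is None:
        raise ValueError(f"cannot parse requirement: {spec!r}")
    # PEP 503 normalization: runs of '-', '_' and '.' become one hyphen.
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def find_drift(setup_requires, docker_requires):
    """Report package names present in one list but not the other."""
    setup_names = {requirement_name(spec) for spec in setup_requires}
    docker_names = {requirement_name(spec) for spec in docker_requires}
    return {
        "missing_from_dockerfile": sorted(setup_names - docker_names),
        "extra_in_dockerfile": sorted(docker_names - setup_names),
    }


if __name__ == "__main__":
    # Illustrative lists only, not the project's real dependencies.
    setup_list = ["pydantic>=2", "scipy", "pin"]
    docker_list = ["pydantic>=2", "SciPy==1.13"]
    print(find_drift(setup_list, docker_list))
```

Run in CI, a check along these lines would fail the build when the two lists diverge while leaving the layered caching untouched.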

Test plan

docker build --target inference on Jetson Thor (aarch64 native) — builds cleanly.
docker build --target inference-ros — builds cleanly.
docker compose run --rm inference --help and the same command for inference-ros: both start run_policy.py and register the policy configs with no import errors.
run_joystick.sh (joystick+joystick) on real G1 — robot walks responsively.
run_ros2_joystick.sh + run_shuttle_publisher.sh (both FastDDS) on real G1 — robot shuttles forward/back correctly.
run_ros2_joystick.sh + CycloneDDS-based shuttle publisher on real G1 — robot shuttles. Cross-vendor DDS works.

Image sizes

inference: ~17.9 GB uncompressed (~6.85 GB compressed on disk).
inference-ros: ~18.4 GB uncompressed (~6.95 GB compressed).

Size is dominated by NVIDIA TensorRT runtime libs (~2.3 GB) and CUDA devel base — unavoidable for the platform.
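For a quick sense of scale, the figures above can be turned into a few derived numbers. The snippet uses only the sizes quoted in this section, all of which are approximate; the helper names are hypothetical.

```python
# Approximate sizes in GB, copied from the figures quoted above.
SIZES_GB = {
    "inference": {"uncompressed": 17.9, "compressed": 6.85},
    "inference-ros": {"uncompressed": 18.4, "compressed": 6.95},
}
TENSORRT_GB = 2.3


def ros_overhead_gb(kind):
    """Extra size of inference-ros over inference for the given size kind."""
    delta = SIZES_GB["inference-ros"][kind] - SIZES_GB["inference"][kind]
    return round(delta, 2)


def compression_ratio(target):
    """Compressed size as a fraction of uncompressed size."""
    sizes = SIZES_GB[target]
    return round(sizes["compressed"] / sizes["uncompressed"], 2)


def tensorrt_share(target):
    """Fraction of the uncompressed image taken by the TensorRT libs."""
    return round(TENSORRT_GB / SIZES_GB[target]["uncompressed"], 2)


if __name__ == "__main__":
    print("ROS overhead (uncompressed):", ros_overhead_gb("uncompressed"), "GB")
    print("ROS overhead (compressed):", ros_overhead_gb("compressed"), "GB")
    print("compression ratio (inference):", compression_ratio("inference"))
    print("TensorRT share (inference):", tensorrt_share("inference"))
```

On these numbers the ROS branch adds about 0.5 GB uncompressed and about 0.1 GB compressed, and TensorRT alone is roughly 13% of the uncompressed `inference` image.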

…ce-ros targets

Adds docker/thor/Dockerfile with two multi-stage parallel branches:
- `inference`: no-ROS policy + unitree_sdk2. Smaller image for joystick/keyboard input or users bringing their own velocity source.
- `inference-ros`: adds ros-jazzy-ros-base + rmw-cyclonedds-cpp for Ros2Input (subscribing to /cmd_vel from Nav2, for example).

Layer ordering optimizes for cache hits on code edits: CUDA base → python-base → long-deps (NVPL/cuDSS/TensorRT) → common-deps (pinocchio/scipy/etc.) → app-deps (unitree wheel + COPY src) → terminal. Source is COPY'd last so day-to-day changes only rebuild the final layers.

Platform: Jetson Thor, JetPack 7.1, Ubuntu 24.04 aarch64, CUDA 13.
Base image: nvcr.io/nvidia/cuda:13.0.2-devel-ubuntu24.04.

Includes:
- docker/thor/compose.yaml: Docker Compose with runtime flags preset (--runtime nvidia, host net/ipc, --privileged, CycloneDDS env for the ROS target)
- docker/thor/.env.example: MODEL_PATH override
- docker/thor/README.md: build/run commands, layer diagram, troubleshooting
- docker/thor/Makefile: scoped shortcuts (`make inference`, `make run-inference ARGS='...'`)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

kingb requested a review from tomasz-lewicki April 28, 2026 23:04

kingb changed the title ~~feat(docker): add Thor (Jetson AGX) Docker image with inference + inference-ros targets~~ feat(docker): add Thor (Jetson) Docker image with inference + inference-ros targets Apr 28, 2026

kingb force-pushed the dev/kingbrnd/thor-docker-inference branch from 5a7033c to 56c70a2 Compare April 28, 2026 23:17

kingb wants to merge 1 commit into amazon-far:main from kingb:dev/kingbrnd/thor-docker-inference
