Commit 772ff95

feat: 5505 Harden Python custom logic block sandbox & extend libraries
* feat: 5504 Add scikit-learn, xarray, geopandas to Python custom logic block

Extend supported Python libraries per issue #5504.

Added libraries:
- scikit-learn: machine learning (classification, regression, clustering)
- xarray: labeled multi-dimensional arrays for climate/environmental data
- geopandas: geospatial DataFrames with geometry and spatial operations

Already available (no install needed):
- calendar, datetime, collections, math, copy: Python built-ins
- dateutil, six: transitive dependencies of pandas

Not available in Pyodide (WASM):
- rasterio: depends on GDAL (C/C++ library not compiled to WASM)
- rioxarray: depends on rasterio

Workaround: pre-process raster data outside the block (e.g. convert GeoTIFF to CSV/JSON) and pass as input documents.

* feat: 5505 Harden Python custom logic block sandbox

- Replace js module with restricted stub (blocks from js import fetch and all JS bridge access, survives re-import attempts)
- Replace pyodide.http with restricted stub (prevents re-import)
- Block all os.exec*/os.spawn*/os.system/os.popen functions
- Block subprocess.run/call/check_call/check_output/Popen (module remains importable for library compatibility)
- Install sys.meta_path import hook to prevent bypassing module restrictions via __import__ or importlib
- Remove unnecessary libraries: duckdb, sqlalchemy, bokeh, altair, cartopy, seaborn (matplotlib remains as transitive dep of networkx)

* feat: experimental Docker container isolation for Python custom logic block

Add Docker-based sandbox for Python code execution in custom logic blocks. Set PYTHON_SANDBOX_MODE=docker to enable (default is Pyodide worker for backward compatibility).

Container security (no resource limits, matching develop):
- --network=none, --cap-drop=ALL, --security-opt=no-new-privileges
- --read-only, --user=1001:1001 (non-root)
- --name=python-sandbox-<uuid> (named for cleanup)
- --log-driver=none, --pull=never
- --tmpfs /tmp:rw,noexec,nosuid,size=64m
- Image name validation (regex)

Defense-in-depth Python sandbox (both paths):
- js/pyodide.http stubs + import hook
- builtins.__import__ guarded via closure (hides _original_import)
- os.system/exec*/spawn*/popen, subprocess.run/call/Popen blocked
- os.environ cleared, importlib.reload blocked
- ctypes/cffi/_posixsubprocess import blocked
- processLine checks settled before firing callbacks

Pyodide worker improvements:
- Timeout (PYTHON_SANDBOX_TIMEOUT_MS, default 120s)
- worker.on('exit') rejects on non-zero exit code
- safeResolve/safeReject prevent double settlement
- disposeTables() called on all exit/error paths

Docker worker: promise-only errors, settled guards in all callback paths, processLine helper, stdin error handling, done(final=true) tracking, package load failure reporting, non-blocking cleanup.

Bug fixes: debug field (data.message->data.result), command injection, disposeTables in error paths, __globals__ bypass, pint removed.

* docs: update Python implementation guide with new libraries and sandbox security

Update supported libraries list: add scikit-learn, xarray, geopandas. Document removed libraries (duckdb, sqlalchemy, pint, bokeh, altair, cartopy, seaborn) with reasons. Add sandbox security section covering blocked operations and execution modes (Pyodide default, Docker experimental). Document built-in modules and transitive dependencies.

* feat: replace Pyodide/WASM with CPython in Docker sandbox, harden Pyodide worker

Docker sandbox: replace Node.js + Pyodide (WASM) with native CPython 3.12. Same JSON stdin/stdout protocol, so zero changes in the host-side Docker worker.

Benefits: <1s startup (was 30-60s), ~300MB memory (was 2-4GB), native speed, rasterio/rioxarray now available.

Pyodide worker hardening:
- Block socket networking functions (socket.socket, create_connection, getaddrinfo, gethostbyname, etc.)
- Update import hook to PEP 451 API (find_spec)
- Extract shared package list to python-packages.json

Both paths:
- Accumulate pendingDone promises via array + Promise.all (fixes race where multiple done() calls could lose in-flight work)
- Smart JSON serializer for numpy/pandas/datetime types in CPython
- Fix DockerCallbacks.onDone type to Promise<void> | void
- Remove unused traceback import

* docs: comprehensive Python implementation guide with Docker mode and security

Update python-implementation-in-guardian.md with:
- Library versions for all installed packages
- Docker-only libraries (rasterio, rioxarray)
- Full Docker mode documentation (setup, benefits, security flags)
- Execution modes comparison (Pyodide vs Docker)
- Sandbox security details for both modes
- Vulnerability comparison table
- Configuration reference

* fix: prevent unhandled promise rejection when done() throws

Add .catch(safeReject) to pendingDones promises so that errors from done() (e.g. invalid output schema) are caught instead of becoming unhandled promise rejections that crash the process. In develop, these errors are caught by the worker message handler's try/catch → reject(). With the pendingDones pattern, the promise could reject after the exit handler already resolved, causing a crash.

* feat: add python-sandbox config to all docker-compose files, fix Pyodide warmup permissions

- Add commented-out python-sandbox service and docker.sock volume to docker-compose.yml, docker-compose-production.yml, docker-compose-production-build.yml, docker-compose-quickstart.yml
- Comment out python-sandbox in docker-compose-build.yml for consistency (default mode is Pyodide, docker.sock should not be mounted by default)
- Fix EACCES permission denied in Pyodide warmup: chown pyodide dir to node user in policy-service Dockerfile
- Update python-implementation-in-guardian.md with all compose files

* docs: clarify python-sandbox is an image build, not a service

---------

Signed-off-by: nikolay-zezin <Nikolay.Zezin@waveaccess.global>
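The commit message mentions a "smart JSON serializer for numpy/pandas/datetime types" on the CPython path but does not show it. The following is only a minimal sketch of how such a fallback could look; `to_jsonable` and `dump_result` are hypothetical names, and the duck-typed `tolist()`/`item()` handling is an assumption chosen so the example runs without numpy or pandas installed.

```python
import json
from datetime import date, datetime
from decimal import Decimal


def to_jsonable(value):
    """Fallback for json.dumps: convert common non-JSON types to plain Python."""
    # datetime is a subclass of date; pandas.Timestamp is a subclass of datetime.
    if isinstance(value, (date, datetime)):
        return value.isoformat()
    if isinstance(value, Decimal):
        return float(value)
    if isinstance(value, (set, frozenset)):
        return sorted(value)
    # numpy arrays and pandas Series expose tolist(); numpy scalars expose item().
    # Duck typing is an assumption of this sketch, not the commit's actual code.
    for method in ("tolist", "item"):
        convert = getattr(value, method, None)
        if callable(convert):
            return convert()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


def dump_result(result):
    """Serialize a block result to one JSON line (hypothetical helper)."""
    return json.dumps(result, default=to_jsonable)
```

Because `json.dumps` re-encodes whatever the `default` callback returns, nested values (for example a list of dates produced by `tolist()`) pass through the same fallback again.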
1 parent 8b7b711 commit 772ff95

17 files changed

Lines changed: 914 additions & 73 deletions

docker-compose-build.yml

Lines changed: 9 additions & 0 deletions
@@ -211,10 +211,19 @@ services:
211211
volumes:
212212
- ./policy-service/tls:/usr/local/app/tls:ro
213213
- ./policy-service/configs:/usr/local/app/configs:ro
214+
# Uncomment for PYTHON_SANDBOX_MODE=docker:
215+
# - /var/run/docker.sock:/var/run/docker.sock:ro
214216
<<: *service-template
215217
expose:
216218
- '5006'
217219

220+
# Uncomment for PYTHON_SANDBOX_MODE=docker:
221+
# python-sandbox:
222+
# build:
223+
# context: ./policy-service/docker/python-sandbox
224+
# dockerfile: Dockerfile
225+
# image: guardian/python-sandbox:latest
226+
218227
prometheus:
219228
image: prom/prometheus:v2.44.0
220229
volumes:

docker-compose-production-build.yml

Lines changed: 9 additions & 0 deletions
@@ -194,10 +194,19 @@ services:
194194
volumes:
195195
- ./policy-service/tls:/usr/local/app/tls:ro
196196
- ./policy-service/configs:/usr/local/app/configs:ro
197+
# Uncomment for PYTHON_SANDBOX_MODE=docker:
198+
# - /var/run/docker.sock:/var/run/docker.sock:ro
197199
<<: *service-template
198200
expose:
199201
- '5006'
200202

203+
# Uncomment for PYTHON_SANDBOX_MODE=docker:
204+
# python-sandbox:
205+
# build:
206+
# context: ./policy-service/docker/python-sandbox
207+
# dockerfile: Dockerfile
208+
# image: guardian/python-sandbox:latest
209+
201210
prometheus:
202211
image: prom/prometheus:v2.44.0
203212
volumes:

docker-compose-production.yml

Lines changed: 9 additions & 0 deletions
@@ -177,10 +177,19 @@ services:
177177
volumes:
178178
- ./policy-service/tls:/usr/local/app/tls:ro
179179
- ./policy-service/configs:/usr/local/app/configs:ro
180+
# Uncomment for PYTHON_SANDBOX_MODE=docker:
181+
# - /var/run/docker.sock:/var/run/docker.sock:ro
180182
<<: *service-template
181183
expose:
182184
- '5006'
183185

186+
# Uncomment for PYTHON_SANDBOX_MODE=docker:
187+
# python-sandbox:
188+
# build:
189+
# context: ./policy-service/docker/python-sandbox
190+
# dockerfile: Dockerfile
191+
# image: guardian/python-sandbox:latest
192+
184193
prometheus:
185194
image: prom/prometheus:v2.44.0
186195
volumes:

docker-compose-quickstart.yml

Lines changed: 9 additions & 0 deletions
@@ -133,10 +133,19 @@ services:
133133
volumes:
134134
- ./policy-service/tls:/usr/local/app/tls:ro
135135
- ./policy-service/configs:/usr/local/app/configs:ro
136+
# Uncomment for PYTHON_SANDBOX_MODE=docker:
137+
# - /var/run/docker.sock:/var/run/docker.sock:ro
136138
<<: *service-template
137139
expose:
138140
- '5006'
139141

142+
# Uncomment for PYTHON_SANDBOX_MODE=docker:
143+
# python-sandbox:
144+
# build:
145+
# context: ./policy-service/docker/python-sandbox
146+
# dockerfile: Dockerfile
147+
# image: guardian/python-sandbox:latest
148+
140149
queue-service:
141150
image: gcr.io/hedera-registry/queue-service:${GUARDIAN_VERSION:-latest}
142151
depends_on:

docker-compose.yml

Lines changed: 9 additions & 0 deletions
@@ -198,10 +198,19 @@ services:
198198
volumes:
199199
- ./policy-service/tls:/usr/local/app/tls:ro
200200
- ./policy-service/configs:/usr/local/app/configs:ro
201+
# Uncomment for PYTHON_SANDBOX_MODE=docker:
202+
# - /var/run/docker.sock:/var/run/docker.sock:ro
201203
<<: *service-template
202204
expose:
203205
- '5006'
204206

207+
# Uncomment for PYTHON_SANDBOX_MODE=docker:
208+
# python-sandbox:
209+
# build:
210+
# context: ./policy-service/docker/python-sandbox
211+
# dockerfile: Dockerfile
212+
# image: guardian/python-sandbox:latest
213+
205214
prometheus:
206215
image: prom/prometheus:v2.44.0
207216
volumes:

docs/guardian/standard-registry/policies/python-implementation-in-guardian.md

Lines changed: 170 additions & 21 deletions
@@ -25,7 +25,7 @@ A new dropdown setting has been added to the Custom Logic block in the Policy Ed
2525

2626
#### Use Case
2727

28-
Choose "Python" when you want to leverage Pythons expressive syntax and advanced computation libraries for policy logic.
28+
Choose "Python" when you want to leverage Python's expressive syntax and advanced computation libraries for policy logic.
2929

3030
### 2. Python Scripting Support
3131

@@ -69,23 +69,172 @@ This field helps track the Guardian system version that was used to generate or
6969
* Python execution is subject to the limitations and security constraints defined in Guardian's runtime.
7070
{% endhint %}
7171

72-
### 4. Supported Python Libraries and its Versions
73-
74-
| Library Name | Version |
75-
| :----------: | :-----: |
76-
| numpy | 1.26.4 |
77-
| scipy | 1.12.0 |
78-
| sympy | 1.12 |
79-
| pandas | 2.2.0 |
80-
| pint | 0.25.1 |
81-
| duckdb | 1.0.0 |
82-
| sqlalchemy | 2.0.29 |
83-
| cftime | 1.6.3 |
84-
| matplotlib | 3.5.2 |
85-
| seaborn | 0.13.2 |
86-
| bokeh | 3.4.1 |
87-
| altair | 5.3.0 |
88-
| cartopy | 0.23.0 |
89-
| astropy | 6.0.1 |
90-
| statsmodels | 0.14.2 |
91-
| networkx | 3.3 |
72+
### 4. Supported Python Libraries
73+
74+
#### Installed Libraries
75+
76+
| Library Name | Import Name | Version |
77+
| :----------: | :---------: | :-----: |
78+
| numpy | `numpy` | 1.26.4 |
79+
| scipy | `scipy` | 1.12.0 |
80+
| sympy | `sympy` | 1.12 |
81+
| pandas | `pandas` | 2.2.0 |
82+
| pint | `pint` | 0.25.3 |
83+
| cftime | `cftime` | 1.6.3 |
84+
| astropy | `astropy` | 6.0.1 |
85+
| statsmodels | `statsmodels` | 0.14.2 |
86+
| networkx | `networkx` | 3.3 |
87+
| scikit-learn | `sklearn` | 1.4.2 |
88+
| xarray | `xarray` | 2024.3.0 |
89+
| geopandas | `geopandas` | 0.14.3 |
90+
91+
{% hint style="info" %}
92+
Library versions listed are for the default Pyodide mode. Docker mode may have newer versions as it uses native CPython with pip.
93+
{% endhint %}
94+
95+
#### Docker-Only Libraries
96+
97+
These libraries require native C/C++ dependencies (GDAL) and are only available in Docker mode:
98+
99+
| Library Name | Import Name | Purpose |
100+
| :----------: | :---------: | :------ |
101+
| rasterio | `rasterio` | Read/write raster geospatial data (GeoTIFF, satellite imagery) |
102+
| rioxarray | `rioxarray` | Bridge between xarray and rasterio — CRS management, reprojection |
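
Since rasterio and rioxarray exist only in Docker mode, a block that must run in both modes may want to detect them before use. The sketch below is illustrative only: `has_module` and `raster_backend` are hypothetical helpers, and the fallback label reflects the workaround from the commit message (pre-process raster data into CSV/JSON input documents).

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


def raster_backend():
    """Pick a raster strategy based on what the current execution mode provides."""
    if has_module("rasterio"):
        # Docker mode: GeoTIFF files can be read directly.
        return "rasterio"
    # Pyodide mode: expect raster data already converted to CSV/JSON documents.
    return "preprocessed-input"
```

`importlib.util.find_spec` is used here only for top-level names; for dotted names it imports the parent package first, which may not be desirable inside a sandbox.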
103+
104+
#### Python Built-in Modules (always available)
105+
106+
| Module | Purpose |
107+
| :----: | :------ |
108+
| `calendar` | Calendar rendering, weekday calculations |
109+
| `datetime` | Date/time types and arithmetic |
110+
| `collections` | OrderedDict, Counter, defaultdict, namedtuple |
111+
| `math` | Basic math functions (sin, log, sqrt, pi) |
112+
| `copy` | Deep/shallow copy of objects |
113+
114+
#### Available as Transitive Dependencies (no explicit install needed)
115+
116+
| Library | Import Name | Purpose | Installed via |
117+
| :-----: | :---------: | :------ | :------------ |
118+
| python-dateutil | `dateutil` | Smart date parsing, relative deltas | pandas |
119+
| six | `six` | Python 2/3 compatibility | pandas → python-dateutil |
120+
| matplotlib | `matplotlib` | Data visualization | networkx (transitive) |
121+
122+
#### Removed Libraries (Issue #5505)
123+
124+
The following libraries were removed as part of sandbox hardening. They are unnecessary for computation — their data processing features are covered by pandas, and they were designed to work with external resources (databases, networks, web servers) that are not available in the sandbox.
125+
126+
| Library | Reason for Removal |
127+
| :-----: | :----------------- |
128+
| duckdb | SQL database engine; covered by pandas |
129+
| sqlalchemy | SQL toolkit/ORM; covered by pandas |
130+
| bokeh | Visualization; unnecessary for computation |
131+
| altair | Visualization; unnecessary for computation |
132+
| cartopy | Map visualization; unnecessary for computation |
133+
| seaborn | Visualization; unnecessary for computation |
134+
135+
### 5. Execution Modes
136+
137+
Guardian supports two execution modes for Python custom logic blocks, controlled by the `PYTHON_SANDBOX_MODE` environment variable.
138+
139+
#### Pyodide Mode (default)
140+
141+
The default mode runs Python code using Pyodide (CPython compiled to WebAssembly) inside a Node.js Worker Thread.
142+
143+
* **No additional infrastructure required** — works out of the box
144+
* **Startup:** packages are pre-cached at policy-service startup for faster execution
145+
* **Limitation:** some C-extension packages (rasterio, rioxarray) are unavailable in WASM
146+
147+
**Configuration:** No env var needed (default), or explicitly set `PYTHON_SANDBOX_MODE=pyodide`
148+
149+
#### Docker Mode (experimental)
150+
151+
Runs Python code in an ephemeral Docker container using native CPython 3.12. Provides OS-level isolation.
152+
153+
**Container security flags:**
154+
155+
| Flag | Purpose |
156+
| :--- | :------ |
157+
| `--network=none` | All network access blocked |
158+
| `--cap-drop=ALL` | No Linux capabilities |
159+
| `--security-opt=no-new-privileges` | Prevent privilege escalation |
160+
| `--read-only` | Read-only root filesystem |
161+
| `--user=1001:1001` | Non-root execution |
162+
| `--log-driver=none` | No container log storage |
163+
| `--pull=never` | Never pull untrusted images |
164+
| `--tmpfs /tmp` | Writable scratch space (noexec, destroyed on exit) |
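
To make the flags in the table above concrete, here is a hedged sketch of assembling the `docker run` argument list. It is written in Python for illustration (the host-side worker in this commit is not shown here), `build_docker_args` and `IMAGE_RE` are hypothetical, the exact validation regex is an assumption, and `-i` is assumed because the sandbox speaks a JSON stdin/stdout protocol.

```python
import re
import uuid

# Assumed pattern for illustration; the commit only states that image names
# are validated with a regex.
IMAGE_RE = re.compile(r"^[a-z0-9][a-z0-9._/-]*(:[A-Za-z0-9._-]+)?$")


def build_docker_args(image="guardian/python-sandbox:latest"):
    """Assemble a `docker run` argument list with the documented security flags."""
    if not IMAGE_RE.fullmatch(image):
        raise ValueError(f"Invalid image name: {image!r}")
    return [
        "docker", "run", "--rm", "-i",
        f"--name=python-sandbox-{uuid.uuid4()}",  # named so it can be cleaned up
        "--network=none",
        "--cap-drop=ALL",
        "--security-opt=no-new-privileges",
        "--read-only",
        "--user=1001:1001",
        "--log-driver=none",
        "--pull=never",
        "--tmpfs", "/tmp:rw,noexec,nosuid,size=64m",
        image,
    ]
```

Passing the arguments as a list (rather than a single shell string) means the image name is never interpreted by a shell, which is one common way to avoid command injection.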
165+
166+
**Setup:**
167+
168+
1. Build the sandbox image:
169+
```bash
170+
docker buildx build -t guardian/python-sandbox:latest policy-service/docker/python-sandbox
171+
```
172+
Or via docker-compose:
173+
```bash
174+
docker compose -f docker-compose-build.yml build python-sandbox
175+
```
176+
177+
2. Set the environment variable in policy-service configuration:
178+
```
179+
PYTHON_SANDBOX_MODE=docker
180+
```
181+
182+
3. Ensure the policy-service container has Docker socket access. For docker-compose deployments, uncomment the Docker socket volume mount and the `python-sandbox` image build definition in the relevant compose file:
183+
- `docker-compose-build.yml`, `docker-compose.yml`, `docker-compose-production.yml`, `docker-compose-production-build.yml`, `docker-compose-quickstart.yml` — uncomment the Docker socket volume and `python-sandbox` image build
184+
185+
{% hint style="warning" %}
186+
Docker mode requires the Docker daemon to be available. The policy-service needs access to the Docker socket to spawn sandbox containers. For production deployments, consider using a Docker API proxy to restrict operations to sandbox container management only.
187+
{% endhint %}
188+
189+
### 6. Sandbox Security
190+
191+
Python code in custom logic blocks runs in a sandboxed environment. The following restrictions are enforced:
192+
193+
#### Pyodide Mode Restrictions
194+
195+
| Restriction | Details |
196+
| :---------- | :----- |
197+
| JavaScript bridge (`from js import ...`) | Blocked via module stub + import hook |
198+
| `pyodide.http` network access | Blocked via module stub + import hook |
199+
| `os.system`, `os.popen`, `os.exec*`, `os.spawn*` | All replaced with blocked function |
200+
| `subprocess.run`, `subprocess.Popen` | All execution functions replaced |
201+
| `socket.socket`, `socket.connect` | All networking functions replaced |
202+
| `os.environ` (secrets) | Cleared on startup (only HOME/PATH kept) |
203+
| `importlib.reload` | Blocked to prevent undoing patches |
204+
| `builtins.__import__` | Guarded via closure to prevent bypass |
205+
| Execution timeout | Configurable via `PYTHON_SANDBOX_TIMEOUT_MS` (default 120s) |
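
Two of the mechanisms in the table above, the import hook and the replacement of process-spawning functions, can be sketched as follows. This is a simplified illustration, not the code from this commit: `BlockedImportFinder`, `_blocked`, and `harden` are hypothetical names, and the real sandbox additionally installs module stubs and guards `builtins.__import__`.

```python
import importlib.abc
import os
import sys


class BlockedImportFinder(importlib.abc.MetaPathFinder):
    """PEP 451 finder that refuses to resolve restricted modules."""

    def __init__(self, blocked):
        self._blocked = frozenset(blocked)

    def find_spec(self, fullname, path=None, target=None):
        # Block the exact name and anything under a blocked top-level package.
        if fullname in self._blocked or fullname.split(".")[0] in self._blocked:
            raise ImportError(f"Import of '{fullname}' is blocked in the sandbox")
        return None  # not ours: defer to the remaining finders


def _blocked(*args, **kwargs):
    raise PermissionError("This operation is blocked in the sandbox")


def harden(blocked_modules=("js", "pyodide.http")):
    """Install the import hook and replace two process-spawning functions."""
    # Inserting at position 0 makes this finder run before the default ones.
    sys.meta_path.insert(0, BlockedImportFinder(blocked_modules))
    for name in ("system", "popen"):
        if hasattr(os, name):
            setattr(os, name, _blocked)
```

Note that `sys.meta_path` is consulted only for modules that are not already in `sys.modules`, which is presumably why the table pairs the hook with a module stub.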
206+
207+
#### Docker Mode Restrictions
208+
209+
All restrictions above are provided by Docker container isolation:
210+
211+
* **Network:** `--network=none` blocks all connections (verified: HTTP requests fail)
212+
* **File system:** `--read-only` + no host mounts — container sees only its own minimal filesystem
213+
* **Processes:** commands run inside isolated container only, destroyed after execution
214+
* **Environment:** `os.environ` cleared before user code runs
215+
* **Resources:** container destroyed with `--rm` after each execution
216+
217+
#### Vulnerability Comparison
218+
219+
| Attack Vector | Pyodide Mode | Docker Mode |
220+
| :------------ | :----------- | :---------- |
221+
| Network requests | Blocked (Python-level) | Blocked (OS-level `--network=none`) |
222+
| Host filesystem access | Blocked (WASM virtual FS) | Blocked (`--read-only`, no mounts) |
223+
| Process execution | Blocked (functions replaced) | Runs inside isolated container |
224+
| `os.environ` secrets | Cleared | Cleared + container has own env |
225+
| `ctypes` C function calls | Not blocked (needed by pandas, harmless in WASM) | Runs inside isolated container |
226+
| Python introspection bypass | Possible (known limitation) | Irrelevant — container is isolated |
227+
| Memory/CPU exhaustion | Timeout only | Timeout + container destroyed |
228+
229+
{% hint style="info" %}
230+
* **Pyodide mode** is suitable when users are trusted or semi-trusted. It blocks common attack vectors but is vulnerable to sophisticated Python introspection attacks.
231+
* **Docker mode** is suitable for untrusted code. OS-level isolation makes Python-level bypasses irrelevant — the container has no network, no host access, and is destroyed after execution.
232+
{% endhint %}
233+
234+
### 7. Configuration Reference
235+
236+
| Environment Variable | Default | Description |
237+
| :------------------- | :------ | :---------- |
238+
| `PYTHON_SANDBOX_MODE` | `pyodide` | Execution mode: `pyodide` (default) or `docker` |
239+
| `PYTHON_SANDBOX_TIMEOUT_MS` | `120000` | Execution timeout in milliseconds (both modes) |
240+
| `PYTHON_SANDBOX_IMAGE` | `guardian/python-sandbox:latest` | Docker sandbox image name (Docker mode only) |
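
As a rough illustration of how the three variables and their defaults fit together, the sketch below resolves them from an environment mapping. It is an assumption-laden example: `load_sandbox_config` is a hypothetical helper, and rejecting unknown modes or non-positive timeouts is this sketch's choice, not documented Guardian behavior.

```python
import os

DEFAULTS = {
    "PYTHON_SANDBOX_MODE": "pyodide",
    "PYTHON_SANDBOX_TIMEOUT_MS": "120000",
    "PYTHON_SANDBOX_IMAGE": "guardian/python-sandbox:latest",
}


def load_sandbox_config(env=None):
    """Resolve sandbox settings from an environment mapping, applying defaults."""
    env = os.environ if env is None else env

    def get(key):
        return env.get(key, DEFAULTS[key])

    mode = get("PYTHON_SANDBOX_MODE").strip().lower()
    if mode not in ("pyodide", "docker"):
        # Assumption: fail loudly rather than silently falling back.
        raise ValueError(f"Unsupported PYTHON_SANDBOX_MODE: {mode!r}")

    timeout_ms = int(get("PYTHON_SANDBOX_TIMEOUT_MS"))
    if timeout_ms <= 0:
        raise ValueError("PYTHON_SANDBOX_TIMEOUT_MS must be positive")

    return {
        "mode": mode,
        "timeout_ms": timeout_ms,
        "image": get("PYTHON_SANDBOX_IMAGE"),  # only used in Docker mode
    }
```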

policy-service/Dockerfile

Lines changed: 3 additions & 0 deletions
@@ -54,6 +54,9 @@ COPY --link --from=deps /usr/local/app/node_modules node_modules/
5454
COPY --link --from=deps /usr/local/app/package.json ./
5555
COPY --link --from=build /usr/local/app/dist dist/
5656

57+
# Allow node user to write Pyodide package cache (warmup downloads wheels at startup)
58+
RUN chown -R node:node node_modules/pyodide/ 2>/dev/null || true
59+
5760
# Change the user to node
5861
USER node
5962

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
1+
node_modules
2+
npm-debug.log
3+
.git
4+
.gitignore
5+
README.md
6+
*.md
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
FROM python:3.12.10-slim
2+
3+
WORKDIR /sandbox
4+
5+
# Install system dependencies for geospatial libraries
6+
RUN apt-get update && apt-get install -y --no-install-recommends \
7+
gdal-bin libgdal-dev \
8+
&& rm -rf /var/lib/apt/lists/*
9+
10+
# Install Python packages (pinned to major.minor for reproducibility)
11+
COPY requirements.txt ./
12+
RUN pip install --no-cache-dir -r requirements.txt
13+
14+
# Copy entrypoint
15+
COPY entrypoint.py ./
16+
17+
# Create non-root user and fix permissions
18+
RUN adduser --disabled-password --uid 1001 sandbox && chown -R sandbox:sandbox /sandbox
19+
USER sandbox
20+
21+
ENTRYPOINT ["python3", "entrypoint.py"]
