feat: 5505 Harden Python custom logic block sandbox & extend libraries #5873
Open
nikolay-zezin wants to merge 7 commits into hashgraph:develop
… block
Extend supported Python libraries per issue hashgraph#5504.
Added libraries:
- scikit-learn: machine learning (classification, regression, clustering)
- xarray: labeled multi-dimensional arrays for climate/environmental data
- geopandas: geospatial DataFrames with geometry and spatial operations
Already available (no install needed):
- calendar, datetime, collections, math, copy: Python built-ins
- dateutil, six: transitive dependencies of pandas
Not available in Pyodide (WASM):
- rasterio: depends on GDAL (C/C++ library not compiled to WASM)
- rioxarray: depends on rasterio
Workaround: pre-process raster data outside the block (e.g. convert
GeoTIFF to CSV/JSON) and pass as input documents.
Signed-off-by: nikolay-zezin <Nikolay.Zezin@waveaccess.global>
- Replace js module with restricted stub (blocks from js import fetch and all JS bridge access, survives re-import attempts)
- Replace pyodide.http with restricted stub (prevents re-import)
- Block all os.exec*/os.spawn*/os.system/os.popen functions
- Block subprocess.run/call/check_call/check_output/Popen (module remains importable for library compatibility)
- Install sys.meta_path import hook to prevent bypassing module restrictions via __import__ or importlib
- Remove unnecessary libraries: duckdb, sqlalchemy, bokeh, altair, cartopy, seaborn (matplotlib remains as transitive dep of networkx)
Signed-off-by: nikolay-zezin <Nikolay.Zezin@waveaccess.global>
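As an illustration of the sys.meta_path hook mentioned in this commit, here is a minimal sketch, not the PR's actual code. It uses the PEP 451 find_spec API that a later commit in this PR moves to. The names BLOCKED_MODULES, BlockedModuleFinder, and install_import_hook are hypothetical, and the blocklist contains only the two modules named above.

```python
import sys
from importlib.abc import MetaPathFinder

# Assumption: only the two modules named in the commit message are listed;
# the PR's real blocklist may be longer.
BLOCKED_MODULES = ("js", "pyodide.http")


class BlockedModuleFinder(MetaPathFinder):
    """Hypothetical finder that refuses blocked modules and their submodules."""

    def find_spec(self, fullname, path=None, target=None):
        for blocked in BLOCKED_MODULES:
            if fullname == blocked or fullname.startswith(blocked + "."):
                raise ImportError(
                    f"import of '{fullname}' is blocked in the sandbox"
                )
        # Returning None lets the remaining finders handle other modules.
        return None


def install_import_hook():
    # Insert at position 0 so this finder is consulted before the defaults.
    sys.meta_path.insert(0, BlockedModuleFinder())
```

A finder is only consulted when the module is not already in sys.modules, which is why a stub entry and a hook complement each other: the stub serves ordinary imports, and the hook rejects attempts to re-import after the stub has been removed.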
… block
Add Docker-based sandbox for Python code execution in custom logic
blocks. Set PYTHON_SANDBOX_MODE=docker to enable (default is Pyodide
worker for backward compatibility).
Container security (no resource limits — matching develop):
- --network=none, --cap-drop=ALL, --security-opt=no-new-privileges
- --read-only, --user=1001:1001 (non-root)
- --name=python-sandbox-<uuid> (named for cleanup)
- --log-driver=none, --pull=never
- --tmpfs /tmp:rw,noexec,nosuid,size=64m
- Image name validation (regex)
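The flags listed above can be assembled roughly as follows. This is a hedged sketch in Python rather than the host-side code from the PR: build_docker_args and IMAGE_NAME_RE are hypothetical names, the regex is an assumed pattern (the PR's actual one is not shown here), and the -i flag is an assumption based on the JSON stdin/stdout protocol described elsewhere in this PR.

```python
import re
import uuid

# Assumption: a conservative image-name pattern, not the PR's actual regex.
IMAGE_NAME_RE = re.compile(
    r"[a-z0-9][a-z0-9._/-]*(:[A-Za-z0-9_][A-Za-z0-9._-]*)?"
)


def build_docker_args(image: str) -> list[str]:
    """Return an argv list for `docker run` with the hardening flags above."""
    # fullmatch rejects spaces, semicolons, and other shell metacharacters.
    if not IMAGE_NAME_RE.fullmatch(image):
        raise ValueError(f"invalid image name: {image!r}")
    return [
        "docker", "run",
        "-i",  # assumption: keep stdin open for the JSON protocol
        "--network=none",
        "--cap-drop=ALL",
        "--security-opt=no-new-privileges",
        "--read-only",
        "--user=1001:1001",
        f"--name=python-sandbox-{uuid.uuid4()}",
        "--log-driver=none",
        "--pull=never",
        "--tmpfs", "/tmp:rw,noexec,nosuid,size=64m",
        image,
    ]
```

Passing the result as an argument list to a process spawner, instead of joining it into a shell string, means the image name is never interpreted by a shell, which complements the regex check.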
Defense-in-depth Python sandbox (both paths):
- js/pyodide.http stubs + import hook
- builtins.__import__ guarded via closure (hides _original_import)
- os.system/exec*/spawn*/popen, subprocess.run/call/Popen blocked
- os.environ cleared, importlib.reload blocked
- ctypes/cffi/_posixsubprocess import blocked
- processLine checks settled before firing callbacks
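One way to guard builtins.__import__ through a closure, as the list above describes, might look like the sketch below. The function names and the blocklist are hypothetical; the blocked names are taken from the ctypes/cffi/_posixsubprocess item above.

```python
import builtins

# Assumption: blocklist limited to the modules named in the list above.
BLOCKED = frozenset({"ctypes", "cffi", "_posixsubprocess"})


def make_guarded_import(original_import, blocked):
    """Wrap an import function; the original lives only in the closure."""

    def guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
        # Only absolute imports are checked; relative ones resolve in-package.
        if level == 0 and name.partition(".")[0] in blocked:
            raise ImportError(f"import of '{name}' is blocked in the sandbox")
        return original_import(name, globals, locals, fromlist, level)

    return guarded_import


def install_guard():
    # After this call, no module-level name refers to the original import.
    builtins.__import__ = make_guarded_import(builtins.__import__, BLOCKED)
```

Keeping the original import function in a closure keeps it out of the wrapper's __globals__, but it can still be reached through the wrapper's __closure__ cells, so this is a defense-in-depth layer and not an isolation boundary by itself.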
Pyodide worker improvements:
- Timeout (PYTHON_SANDBOX_TIMEOUT_MS, default 120s)
- worker.on('exit') rejects on non-zero exit code
- safeResolve/safeReject prevent double settlement
- disposeTables() called on all exit/error paths
Docker worker: promise-only errors, settled guards in all callback
paths, processLine helper, stdin error handling, done(final=true)
tracking, package load failure reporting, non-blocking cleanup.
Bug fixes: debug field (data.message->data.result), command injection,
disposeTables in error paths, __globals__ bypass, pint removed.
Signed-off-by: nikolay-zezin <Nikolay.Zezin@waveaccess.global>
…ox security
Update supported libraries list: add scikit-learn, xarray, geopandas.
Document removed libraries (duckdb, sqlalchemy, pint, bokeh, altair,
cartopy, seaborn) with reasons. Add sandbox security section covering
blocked operations and execution modes (Pyodide default, Docker
experimental). Document built-in modules and transitive dependencies.
Signed-off-by: nikolay-zezin <Nikolay.Zezin@waveaccess.global>
…dide worker
Docker sandbox: replace Node.js + Pyodide (WASM) with native CPython 3.12.
Same JSON stdin/stdout protocol: zero changes in host-side Docker worker.
Benefits: <1s startup (was 30-60s), ~300MB memory (was 2-4GB), native
speed, rasterio/rioxarray now available.
Pyodide worker hardening:
- Block socket networking functions (socket.socket, create_connection, getaddrinfo, gethostbyname, etc.)
- Update import hook to PEP 451 API (find_spec)
- Extract shared package list to python-packages.json
Both paths:
- Accumulate pendingDone promises via array + Promise.all (fixes race where multiple done() calls could lose in-flight work)
- Smart JSON serializer for numpy/pandas/datetime types in CPython
- Fix DockerCallbacks.onDone type to Promise<void> | void
- Remove unused traceback import
Signed-off-by: nikolay-zezin <Nikolay.Zezin@waveaccess.global>
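To illustrate the kind of JSON serializer mentioned for the CPython path, here is a minimal sketch under stated assumptions. It is not the PR's implementation: to_jsonable is a hypothetical name, and the sketch relies on duck typing (tolist() for numpy arrays and pandas Series, item() for numpy scalars) so that it runs without those libraries installed.

```python
import datetime as dt
import json


def to_jsonable(obj):
    """Fallback for json.dumps(default=...); called only for unknown types."""
    # pandas.Timestamp subclasses datetime, so it is covered here as well.
    if isinstance(obj, (dt.datetime, dt.date, dt.time)):
        return obj.isoformat()
    # numpy arrays and pandas Series expose tolist(); numpy scalars expose
    # item(). Both return plain Python values.
    for method in ("tolist", "item"):
        converter = getattr(obj, method, None)
        if callable(converter):
            return converter()
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")


def dump_result(result) -> str:
    # One JSON document per line suits a stdin/stdout protocol.
    return json.dumps(result, default=to_jsonable)
```

For example, `dump_result({"when": dt.date(2024, 1, 2)})` returns `{"when": "2024-01-02"}`. Types that match none of the branches still raise TypeError, so unsupported values fail loudly instead of being silently dropped.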
…security
Update python-implementation-in-guardian.md with:
- Library versions for all installed packages
- Docker-only libraries (rasterio, rioxarray)
- Full Docker mode documentation (setup, benefits, security flags)
- Execution modes comparison (Pyodide vs Docker)
- Sandbox security details for both modes
- Vulnerability comparison table
- Configuration reference
Signed-off-by: nikolay-zezin <Nikolay.Zezin@waveaccess.global>
Add .catch(safeReject) to pendingDones promises so that errors from
done() (e.g. invalid output schema) are caught instead of becoming
unhandled promise rejections that crash the process.
In develop, these errors are caught by the worker message handler's
try/catch → reject(). With pendingDones pattern, the promise could
reject after the exit handler already resolved, causing a crash.
Signed-off-by: nikolay-zezin <Nikolay.Zezin@waveaccess.global>
This PR hardens the Python custom logic block sandbox and extends supported Python libraries. It enforces input-data-only operations, prevents process initiation, restricts output to block channels, and adds an experimental Docker container isolation mode with native CPython.
networking functions in Pyodide worker
--security-opt=no-new-privileges
Related issue(s):
Fixes #5505
Fixes #5504
Notes for reviewer:
Two execution modes are controlled by the PYTHON_SANDBOX_MODE env var: the Pyodide worker (default) and Docker (PYTHON_SANDBOX_MODE=docker, experimental).
Docker mode requires building the sandbox image:
docker buildx build -t guardian/python-sandbox:latest policy-service/docker/python-sandbox
Known Pyodide-mode limitations (documented, mitigated by Docker mode):
application-level sandboxing