Merged
4 changes: 2 additions & 2 deletions CHANGELOG.md
@@ -9,7 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ### Added
 - nsjail-based sandboxing for code execution (replaces Docker socket-based approach)
-- Single unified Docker image with all 12 language runtimes
+- Single unified Docker image with all 13 language runtimes
 - Hour and day periods for execution heatmap visualizations
 - MyPy type checking integration with comprehensive type hints
 - Dynamic Content Security Policy headers based on request path
@@ -33,7 +33,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added

 #### Core Features
-- Multi-language code execution supporting 12 languages: Python, JavaScript, TypeScript, Go, Java, C, C++, PHP, Rust, R, Fortran, and D
+- Multi-language code execution supporting 13 languages: Python, JavaScript, TypeScript, Go, Java, C, C++, PHP, Rust, R, Fortran, D, and Bash
 - FastAPI-based REST API with interactive documentation
 - Sandboxed execution environments with comprehensive security controls
 - Redis-based session management with automatic cleanup
3 changes: 2 additions & 1 deletion Dockerfile
@@ -267,8 +267,9 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
 # REPL Server + entrypoint
 # ============================================
 COPY docker/repl_server.py /opt/repl_server.py
+COPY docker/ptc_server.py /opt/ptc_server.py
 COPY docker/entrypoint.sh /opt/entrypoint.sh
-RUN chmod +x /opt/repl_server.py /opt/entrypoint.sh
+RUN chmod +x /opt/repl_server.py /opt/ptc_server.py /opt/entrypoint.sh

 # ============================================
 # Sandbox directory structure
8 changes: 4 additions & 4 deletions README.md
@@ -30,7 +30,7 @@ Get up and running in minutes by building the execution environment.
 docker build -t code-interpreter:nsjail .
 ```

-This builds a single image containing all 12 language runtimes and nsjail for sandboxed execution.
+This builds a single image containing all 13 language runtimes and nsjail for sandboxed execution.

 4. **Start the API**

@@ -55,7 +55,7 @@ The dashboard requires the master API key for authentication.

 ## Features

-- **Multi-language Support**: Execute code in 12 languages - Python, JavaScript, TypeScript, Go, Java, C, C++, PHP, Rust, R, Fortran, and D
+- **Multi-language Support**: Execute code in 13 languages - Python, JavaScript, TypeScript, Go, Java, C, C++, PHP, Rust, R, Fortran, D, and Bash
 - **Sub-50ms Python Execution**: Pre-warmed REPL sandboxes achieve ~20-40ms latency for simple Python code
 - **Sandbox Pool**: Pre-warmed nsjail sandboxes provide ~3ms acquisition time (vs 500-2000ms cold start)
 - **High Concurrency**: Thread-safe execution supporting 10+ concurrent requests
@@ -88,7 +88,7 @@ For a deep dive into the system design, components, and request flows, see [ARCH

 The API provides endpoints for code execution, file management, and session state control.

-- `POST /exec`: Execute code in one of the 12 supported languages.
+- `POST /exec`: Execute code in one of the 13 supported languages.
 - `POST /upload`: Upload files for processing.
 - `GET /download`: Retrieve generated files.

@@ -98,7 +98,7 @@ For detailed information on all endpoints and specific language notes, see [ARCH

 ## Supported Languages

-We support 12 programming languages including Python, JavaScript, TypeScript, Go, Rust, and more. Each language has optimized execution paths and resource limits.
+We support 13 programming languages including Python, JavaScript, TypeScript, Go, Rust, Bash, and more. Each language has optimized execution paths and resource limits.

 See the [Supported Languages table](docs/ARCHITECTURE.md#supported-languages) for details on versions and included libraries.

227 changes: 227 additions & 0 deletions docker/ptc_server.py
@@ -0,0 +1,227 @@
#!/usr/bin/env python3
"""Programmatic Tool Calling (PTC) Server for nsjail sandbox execution.

This script runs INSIDE the nsjail sandbox and provides a Python execution
environment where code can call externally-defined tools. Tool calls are
serialized as JSON over stdin/stdout, allowing the host process to fulfill
them and send results back.

Protocol:
    1. Host sends initial request via stdin:
       {"code": "...", "tools": [{"name": "...", "description": "...", "parameters": {...}}]}

    2. Code executes. When a tool stub is called, PTC server writes to stdout:
       {"type": "tool_calls", "calls": [{"id": "...", "name": "...", "input": {...}}]}

    3. Host reads tool_calls, fulfills them, and writes results to stdin:
       {"type": "tool_results", "results": [{"call_id": "...", "result": ..., "is_error": false}]}

    4. Code continues. On completion, PTC server writes:
       {"type": "completed", "stdout": "...", "stderr": "..."}

    5. On error, PTC server writes:
       {"type": "error", "error": "..."}
"""

import asyncio
import json
import os
import sys
import traceback
import uuid
from io import StringIO

DELIMITER = "\n---PTC_END---\n"

# Keep references to the REAL stdin/stdout for protocol communication.
# User code's print() will be redirected to a StringIO capture buffer.
_real_stdin = sys.stdin
_real_stdout = sys.stdout
_real_stderr = sys.stderr


def _write_message(msg: dict) -> None:
    """Write a JSON message to the host via the real stdout."""
    data = json.dumps(msg) + DELIMITER
    _real_stdout.write(data)
    _real_stdout.flush()


def _read_message() -> dict:
    """Read a JSON message from the host via the real stdin."""
    buf = ""
    while True:
        line = _real_stdin.readline()
        if not line:
            raise EOFError("stdin closed")
        buf += line
        if DELIMITER in buf:
            json_part = buf.split(DELIMITER)[0]
            return json.loads(json_part)


# Pending tool calls collected during async execution
_pending_calls = []
_tool_results_map = {} # call_id -> result


def _make_tool_stub(tool_name: str) -> callable:
    """Create an async function stub for a tool."""

    async def tool_stub(**kwargs):
        call_id = uuid.uuid4().hex[:12]
        call_info = {
            "id": call_id,
            "name": tool_name,
            "input": kwargs,
        }
        _pending_calls.append(call_info)

        # Wait for result - the main loop will flush calls and read results
        while call_id not in _tool_results_map:
            await asyncio.sleep(0.01)

        result_info = _tool_results_map.pop(call_id)
        if result_info.get("is_error"):
            raise RuntimeError(
                result_info.get("error_message", "Tool call failed")
            )
        return result_info.get("result")

    tool_stub.__name__ = tool_name
    tool_stub.__qualname__ = tool_name
    return tool_stub


async def _execute_with_tools(
    code: str, tools: list, user_stdout: StringIO, user_stderr: StringIO
) -> dict:
    """Execute code with tool stubs, capturing user output."""
    global _pending_calls, _tool_results_map

    _pending_calls = []
    _tool_results_map = {}

    # Build namespace with tool stubs
    namespace = {"__builtins__": __builtins__, "__name__": "__main__"}

    try:
        import json as _json

        namespace["json"] = _json
    except ImportError:
        pass

    for tool in tools:
        namespace[tool["name"]] = _make_tool_stub(tool["name"])

    # Wrap user code in async function
    indented_code = "\n".join(" " + line for line in code.split("\n"))
    wrapped_code = f"async def __ptc_main__():\n{indented_code}\n"

    try:
        compiled = compile(wrapped_code, "<ptc_code>", "exec")
        exec(compiled, namespace)
    except SyntaxError as e:
        return {"type": "error", "error": f"SyntaxError: {e}"}

    main_func = namespace["__ptc_main__"]
    main_task = asyncio.ensure_future(main_func())

    try:
        while not main_task.done():
            # Let the task run briefly to accumulate batched calls
            await asyncio.sleep(0.05)

            if _pending_calls and not main_task.done():
                calls_to_send = list(_pending_calls)
                _pending_calls.clear()

                _write_message({
                    "type": "tool_calls",
                    "calls": calls_to_send,
                })

                # Wait for results from host
                response = _read_message()

                if response.get("type") != "tool_results":
                    return {
                        "type": "error",
                        "error": f"Expected tool_results, got "
                        f"{response.get('type')}",
                    }

                for result in response.get("results", []):
                    _tool_results_map[result["call_id"]] = result

        # Task completed
        main_task.result()
        return {"type": "completed"}

    except Exception as e:
        tb = traceback.format_exc()
        return {
            "type": "error",
            "error": str(e),
            "stderr_extra": tb,
        }


def main():
    """Main entry point for PTC server."""
    try:
        os.chdir("/mnt/data")
    except OSError:
        pass

    # Read initial request
    try:
        request = _read_message()
    except Exception as e:
        _write_message({
            "type": "error",
            "error": f"Failed to read initial request: {e}",
        })
        return

    code = request.get("code", "")
    tools = request.get("tools", [])

    if not code:
        _write_message({"type": "error", "error": "No code provided"})
        return

    # Redirect sys.stdout and sys.stderr so user print() calls
    # are captured, not mixed with our protocol messages.
    user_stdout = StringIO()
    user_stderr = StringIO()
    sys.stdout = user_stdout
    sys.stderr = user_stderr

    try:
        result = asyncio.run(
            _execute_with_tools(code, tools, user_stdout, user_stderr)
        )
    except Exception as e:
        result = {
            "type": "error",
            "error": str(e),
        }

    # Restore real stdout for final message
    sys.stdout = _real_stdout
    sys.stderr = _real_stderr

    # Attach captured user output
    result["stdout"] = user_stdout.getvalue()
    stderr_val = user_stderr.getvalue()
    if result.get("stderr_extra"):
        stderr_val += result.pop("stderr_extra")
    result["stderr"] = stderr_val

    _write_message(result)


if __name__ == "__main__":
    main()
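
Note on `docker/ptc_server.py`: the host side of this protocol is not part of the diff, so the following is only a rough sketch of how a host might frame messages and answer one `tool_calls` batch. The helper names (`encode_message`, `read_message`, `answer_tool_calls`) and the `add` tool are hypothetical. The `error_message` key is included only because the tool stub above reads it when `is_error` is set. The server output is simulated with an in-memory stream rather than a real sandbox process.

```python
import io
import json

DELIMITER = "\n---PTC_END---\n"  # same framing string as ptc_server.py


def encode_message(msg: dict) -> str:
    """Frame one protocol message as JSON followed by the delimiter."""
    return json.dumps(msg) + DELIMITER


def read_message(stream) -> dict:
    """Read lines until the delimiter appears, then decode the JSON before it."""
    buf = ""
    while True:
        line = stream.readline()
        if not line:
            raise EOFError("stream closed before a full message arrived")
        buf += line
        if DELIMITER in buf:
            return json.loads(buf.split(DELIMITER)[0])


def answer_tool_calls(message: dict, handlers: dict) -> dict:
    """Build a tool_results reply for one tool_calls message."""
    results = []
    for call in message["calls"]:
        try:
            handler = handlers.get(call["name"])
            if handler is None:
                raise LookupError(f"unknown tool: {call['name']}")
            value = handler(**call["input"])
            results.append(
                {"call_id": call["id"], "result": value, "is_error": False}
            )
        except Exception as exc:
            results.append({
                "call_id": call["id"],
                "result": None,
                "is_error": True,
                "error_message": str(exc),
            })
    return {"type": "tool_results", "results": results}


# Simulated server output: one tool_calls message, framed like _write_message does.
server_out = io.StringIO(encode_message({
    "type": "tool_calls",
    "calls": [{"id": "abc123", "name": "add", "input": {"a": 2, "b": 3}}],
}))
handlers = {"add": lambda a, b: a + b}  # hypothetical tool, for illustration only

request = read_message(server_out)
reply = answer_tool_calls(request, handlers)
print(reply["results"][0]["result"])  # prints 5
```

In a real host the reply would be written back to the sandbox process's stdin with `encode_message(reply)`, and the loop would repeat until a `completed` or `error` message arrives.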
22 changes: 16 additions & 6 deletions scripts/load_test/config.py
@@ -60,7 +60,21 @@
 }

 # Supported languages
-SUPPORTED_LANGUAGES = ["py", "js", "ts", "go", "java", "c", "cpp", "php", "rs", "r", "f90", "d"]
+SUPPORTED_LANGUAGES = [
+    "py",
+    "js",
+    "ts",
+    "go",
+    "java",
+    "c",
+    "cpp",
+    "php",
+    "rs",
+    "r",
+    "f90",
+    "d",
+    "bash",
+]


 @dataclass
@@ -112,11 +126,7 @@ def get_api_key(self) -> str:
 }


-def get_vm_type(
-    cpu_cores: int,
-    memory_gb: int,
-    provider: str = "azure"
-) -> str:
+def get_vm_type(cpu_cores: int, memory_gb: int, provider: str = "azure") -> str:
     """Get recommended VM type for given resources."""
     vm_maps = {
         "azure": AZURE_VM_TYPES,
12 changes: 10 additions & 2 deletions scripts/load_test/scenarios/multi_language.py
@@ -1,9 +1,8 @@
-"""Multi-language test scenarios for all 12 supported languages."""
+"""Multi-language test scenarios for all 13 supported languages."""

 from typing import List
 from .base import BaseScenario

-
 # Language-specific hello world and compute code
 LANGUAGE_CODE = {
     "py": {
@@ -150,6 +149,15 @@
 writeln("D compute result: ", result);
 }""",
     },
+    "bash": {
+        "baseline": 'echo "Hello from Bash"',
+        "compute": """sum=0
+for i in $(seq 0 9999); do
+    sum=$((sum + i * i))
+done
+echo "Bash compute result: $sum"
+""",
+    },
 }


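
Note on the Bash `compute` scenario in `scripts/load_test/scenarios/multi_language.py`: the loop sums `i * i` for `i` from 0 to 9999, so the value it should print can be cross-checked independently. The sketch below is not part of the PR; it recomputes the sum in Python and compares it with the closed form for a sum of squares, assuming Bash's arithmetic is 64-bit so the total does not overflow.

```python
# Sum of squares 0^2 + 1^2 + ... + (n-1)^2 has the closed form (n-1) * n * (2n-1) / 6.
n = 10_000  # `seq 0 9999` yields the same values as range(10_000)

loop_total = sum(i * i for i in range(n))
closed_form = (n - 1) * n * (2 * n - 1) // 6

print(loop_total)  # prints 333283335000
```

A load-test assertion could compare the sandbox's stdout against `Bash compute result: 333283335000`, if the scenario runner checks output at all (the diff does not show whether it does).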