feat: migrate to uv + add ContainerGrader for Kubernetes/Docker sandboxed grading by blarghmatey · Pull Request #14 · mitodl/xqueue-watcher

blarghmatey · 2026-03-11T15:49:20Z

Summary

This PR modernizes xqueue-watcher on two fronts:

Package management: migrate from pip-compile to uv
Security sandbox: replace AppArmor/codejail with container-based execution, enabling deployment on Kubernetes

Why this change?

The existing JailedGrader relies on codejail, which requires AppArmor kernel profiles and OS-level user-switching (setuid). This makes it incompatible with standard Kubernetes deployments. The existing base image (ubuntu:xenial) is also end-of-life.

See Automated code Graders With xqueue-watcher.md for prior art and context.

What changed

uv migration (`chore:` commit)

Replaced setup.py + all requirements/*.{in,txt} (14 files) with pyproject.toml + uv.lock
Added proper project metadata, entry point, optional [production] extra (newrelic), and dev dependency group
Rewrote Makefile with uv targets
Updated CI: astral-sh/setup-uv@v4, Python 3.11/3.12/3.13, ubuntu-24.04
Replaced OpenEdX upgrade workflow with uv lock --upgrade + create-pull-request

Compatibility fixes (`fix:` commit)

path v17 renamed abspath() → absolute() — fixed in grader.py and jailedgrader.py
Made codejail import optional (try/except) in jailedgrader.py and manager.py; raises RuntimeError pointing to ContainerGrader if not installed
Removed six (Python 2 shim) entirely
Updated Dockerfile: ubuntu:xenial → python:3.11-slim, removed apparmor packages
Fixed test_codejail_config: added fork_per_item=False for Python 3.14 forkserver compatibility

ContainerGrader (`feat:` commit)

New xqueue_watcher/containergrader.py — drop-in replacement for JailedGrader:

Same grade(grader_path, grader_config, submission) interface
kubernetes backend (production): creates a batch/v1 Job per submission; collects stdout from pod logs; deletes Job after completion
docker backend (local dev): runs container via Docker SDK; bind-mounts grader directory
Security: non-root, read-only root filesystem, resource limits, activeDeadlineSeconds, network disabled for grader containers

{
  "my-course": {
    "HANDLERS": [{
      "HANDLER": "xqueue_watcher.containergrader.ContainerGrader",
      "KWARGS": {
        "grader_root": "/graders/my-course/",
        "image": "registry.example.com/my-course:latest",
        "backend": "kubernetes",
        "cpu_limit": "500m",
        "memory_limit": "256Mi",
        "timeout": 20
      }
    }]
  }
}

Grader base image (`feat:` commit)

grader_support/Dockerfile.base: base image course teams extend
grader_support/entrypoint.py: reads SUBMISSION_CODE env var → /tmp/submission.py → invokes grader_support.run
grader_support/README.md: course team authoring guide

Kubernetes manifests (`feat:` commit)

New deploy/kubernetes/ directory (Kustomize-compatible):

serviceaccount.yaml, rbac.yaml — create/delete Jobs, read pod logs
deployment.yaml — 2 replicas, securityContext, resource limits, readinessProbe
networkpolicy.yaml — deny all ingress/egress on grader Job pods; allow watcher→xqueue
configmap.yaml, secret.yaml.template, kustomization.yaml
docker-compose.yml for local full-stack development

What is NOT changing

xqueue polling model (client.py, manager.py)
Grader test framework (grader_support/gradelib.py, run.py)
JSON result format
conf.d configuration structure
Course team grader Python code

Security model comparison

Property	AppArmor/codejail	ContainerGrader
Process isolation	AppArmor MAC profile	Linux namespaces + cgroups
User	setuid switch	Non-root UID 1000 in container
Resource limits	ulimits	k8s/Docker resource limits
Network	OS firewall rules	NetworkPolicy / `--network none`
Timeout	SIGKILL via codejail	`activeDeadlineSeconds` / `timeout`
Filesystem	chroot-like	Read-only root, /tmp volume
Privilege required	AppArmor + setuid	None (standard container runtime)

Testing

All 25 existing tests pass (uv run pytest). ContainerGrader tests require a live Docker/k8s environment and are integration tests; unit tests for the class itself can be added in a follow-up PR.

gemini-code-assist · 2026-03-11T15:50:13Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly modernizes the xqueue-watcher system by overhauling its package management and introducing a robust, container-based security sandbox. The primary goal is to enhance compatibility with modern cloud-native environments like Kubernetes, moving away from legacy AppArmor-based isolation. These changes aim to improve the developer experience for course teams by enabling local testing with Docker and reducing the operational burden on infrastructure engineers, while maintaining a secure and scalable grading system.

Highlights

Package Management Migration to uv: The project's package management has been modernized by migrating from pip-compile to uv. This involved replacing setup.py and all requirements/*.{in,txt} files with a pyproject.toml and uv.lock, updating the Makefile, and adjusting the CI pipeline to use astral-sh/setup-uv@v4.
Container-Based Grading with ContainerGrader: A new ContainerGrader has been introduced as a secure, container-based alternative to the JailedGrader. This new grader supports both Kubernetes (production) and Docker (local development) backends, enabling grading in isolated environments without relying on AppArmor or OS-level user-switching.
Kubernetes Deployment Support: Comprehensive Kubernetes manifests have been added, including ConfigMap, Deployment, NetworkPolicy, RBAC, Secret template, and ServiceAccount definitions. These resources facilitate deploying xqueue-watcher on Kubernetes, with specific configurations for grader job isolation and security.
Python Version and Dependency Updates: The base Docker image has been updated from ubuntu:xenial to python:3.11-slim, removing deprecated AppArmor packages. Compatibility fixes were applied for path library changes (abspath() to absolute()), codejail imports were made optional, and the six Python 2 shim was entirely removed.

Changelog

Compatibility fixes
- Updated path library method abspath() to absolute() in xqueue_watcher/grader.py and xqueue_watcher/jailedgrader.py.
- Made codejail imports optional in xqueue_watcher/jailedgrader.py and xqueue_watcher/manager.py, raising a RuntimeError if not installed.
- Removed the six Python 2 shim entirely from xqueue_watcher/jailedgrader.py.
- Modified tests/test_manager.py to include fork_per_item=False for JailedGrader to ensure compatibility with Python 3.14 forkserver.
ContainerGrader implementation
- Added xqueue_watcher/containergrader.py which provides a new ContainerGrader class supporting Kubernetes and Docker backends for isolated code execution.
- Updated conf.d/600.json to use xqueue_watcher.containergrader.ContainerGrader with Docker backend and resource limits.
Grader base image and support
- Added grader_support/Dockerfile.base to define a base Docker image for course-specific graders, including a non-root user and the grader_support framework.
- Introduced grader_support/entrypoint.py as the container entrypoint to handle submission code from environment variables.
- Provided grader_support/README.md with instructions for course teams on authoring and using custom grader images.
Kubernetes deployment manifests
- Added deploy/kubernetes/configmap.yaml for xqueue-watcher configuration and example queue settings.
- Included deploy/kubernetes/deployment.yaml for deploying xqueue-watcher with specified replicas, resource limits, and security contexts.
- Created deploy/kubernetes/kustomization.yaml for Kustomize-based Kubernetes deployments.
- Added deploy/kubernetes/networkpolicy.yaml to enforce network isolation for grader pods.
- Defined deploy/kubernetes/rbac.yaml for Kubernetes Role-Based Access Control to manage grader Jobs and logs.
- Provided deploy/kubernetes/secret.yaml.template as a template for Kubernetes secrets.
- Added deploy/kubernetes/serviceaccount.yaml for the xqueue-watcher service account.
- Introduced docker-compose.yml for local full-stack development, integrating xqueue, xqueue-watcher, and a sample grader.
uv migration
- Replaced setup.py and all requirements/*.{in,txt} files with pyproject.toml and uv.lock.
- Updated Makefile with uv targets for dependency management, testing, and Docker builds.
- Adjusted Dockerfile to use python:3.11-slim and removed AppArmor-related packages, aligning with uv and containerization.

Ignored Files

Ignored by pattern: .github/workflows/** (2)
- .github/workflows/ci.yml
- .github/workflows/upgrade-python-requirements.yml

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

xqueue_watcher/containergrader.py

gemini-code-assist

Code Review

This pull request modernizes the xqueue-watcher project by migrating to uv for package management and introducing a ContainerGrader for Kubernetes/Docker sandboxed grading, enhancing security and deployment flexibility. However, a high-severity path traversal vulnerability was identified in the grader path construction logic, which could allow arbitrary code execution or sensitive information disclosure. Strict validation on the grader path is recommended. Additionally, inconsistencies and areas for improvement were noted in the Dockerfile, documentation, and error handling that should be addressed.

Dockerfile

xqueue_watcher/grader.py

Automated code Graders With xqueue-watcher.md

deploy/kubernetes/deployment.yaml

docker-compose.yml

xqueue_watcher/containergrader.py

- Run migrate-to-uv to bootstrap pyproject.toml from requirements/base.txt and requirements/test.txt - Add full project metadata: name, version, description, requires-python>=3.11, license, hatchling build backend, entry point xqueue-watcher -> manager:main - Add newrelic as [project.optional-dependencies.production] - Add dev dependency group: coverage, mock, pytest-cov - Remove setup.py (replaced by pyproject.toml) - Remove all requirements/*.in and requirements/*.txt files (14 files) - Generate uv.lock with pinned dependency graph - Update Makefile: replace pip/pip-compile targets with uv sync / uv run pytest - Update .github/workflows/ci.yml: use astral-sh/setup-uv@v4, drop ubuntu-20.04 and Python 3.8, add Python 3.13, update to actions/checkout@v4 - Replace upgrade-python-requirements workflow with uv lock --upgrade + create-pull-request workflow Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- Remove six (Python 2 compat shim) from imports and SUPPORT_FILES in jailedgrader.py — Python 3 only going forward - Wrap codejail imports in try/except in jailedgrader.py and manager.py; raise RuntimeError with clear message directing users to ContainerGrader - Fix Path.abspath() -> Path.absolute() (breaking API change in path v17) in grader.py and jailedgrader.py - Update Dockerfile: ubuntu:xenial -> python:3.11-slim, remove apparmor and language-pack-en packages, fix layer ordering - Update test_codejail_config to use fork_per_item=False to avoid multiprocessing state-inheritance failure on Python 3.14 forkserver default - Update conf.d/600.json example to use ContainerGrader handler Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Adds xqueue_watcher/containergrader.py — a drop-in replacement for JailedGrader that executes student code inside an isolated container instead of using AppArmor/codejail. Security model (replaces AppArmor): - Container isolation (Linux namespaces + cgroups) - Non-root user (UID 1000), read-only root filesystem - CPU/memory resource limits enforced by container runtime - Network disabled for grader containers (no egress) - Hard wall-clock timeout via activeDeadlineSeconds (k8s) or timeout (Docker) Two pluggable backends selected via the 'backend' KWARGS option: kubernetes (default / production): - Creates a batch/v1 Job per submission using the kubernetes Python client - Auto-detects in-cluster vs kubeconfig credentials - Polls until Job completes, collects stdout from pod logs - Deletes the Job after result collection (ttlSecondsAfterFinished=300) - Job pod spec includes: securityContext, resource limits, activeDeadlineSeconds, and labels for observability docker (local dev / CI): - Runs a container via the docker Python SDK - Bind-mounts the grader directory read-only - Passes SUBMISSION_CODE as an environment variable - Network disabled, memory + CPU limits applied Student code is passed via SUBMISSION_CODE env var (avoids argv length limits and shell injection). The entrypoint writes it to /tmp before invoking grader_support.run, producing the same JSON output format that JailedGrader already expects — so no changes to grader test framework or course team grader code are required. Configuration example (conf.d/my-course.json): { "my-course": { "HANDLERS": [{ "HANDLER": "xqueue_watcher.containergrader.ContainerGrader", "KWARGS": { "grader_root": "/graders/my-course/", "image": "registry.example.com/my-course:latest", "backend": "kubernetes", "cpu_limit": "500m", "memory_limit": "256Mi", "timeout": 20 } }] } } Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

grader_support/Dockerfile.base: - python:3.11-slim base, non-root grader user (UID 1000) - Copies grader_support framework; installs path-py - ENTRYPOINT: python -m grader_support.entrypoint - /tmp volume for submission files (writable even with read-only root fs) - Course teams extend this image to add their deps and grader scripts grader_support/entrypoint.py: - Reads SUBMISSION_CODE env var, writes to /tmp/submission.py - Adds /tmp and cwd to sys.path, then delegates to grader_support.run - Prints JSON result to stdout (same schema JailedGrader already parses) grader_support/README.md: - Course team authoring guide: how to extend the base image, configure the handler, and understand the security properties Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

deploy/kubernetes/ (Kustomize-compatible): - serviceaccount.yaml — dedicated SA for xqueue-watcher pods - rbac.yaml — Role + RoleBinding: create/delete Jobs, read pod logs - configmap.yaml — watcher xqwatcher.json config (edit for your queues) - deployment.yaml — 2 replicas, topologySpreadConstraints, securityContext, resource limits, readinessProbe - networkpolicy.yaml — deny all ingress/egress on grader Job pods (label: role=grader-job); allow xqueue-watcher egress to xqueue - secret.yaml.template — placeholder: copy to secret.yaml, fill in credentials, do not commit secret.yaml (added to .gitignore) - kustomization.yaml — Kustomize entry point for the base directory docker-compose.yml (local dev): - xqueue-watcher container with docker socket access (for docker backend) - Mounts conf.d/ and grader directories - Includes a sample xqueue service reference for full local stack Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

ContainerGrader had two bugs affecting how grader files were located inside the container at runtime: 1. Docker backend bind-mounted the grader problem directory at /grader, overwriting the grader_support package that the base image copies there. Fixed by binding at /graders instead and passing the resulting absolute in-container path (/graders/<file>) to the entrypoint. 2. Kubernetes backend set working_dir to the grader problem directory (e.g. /graders/ps07/Robot/), preventing Python from finding the grader_support package which lives at /grader/grader_support/. Fixed by keeping working_dir=/grader (the base image WORKDIR) and passing the absolute grader path in args instead of just the basename. entrypoint.py previously passed the full absolute path verbatim to __import__(), which fails for paths containing slashes. It now detects absolute paths, inserts the parent directory into sys.path, and uses only the basename as the importable module name. Also updates grader_support/README.md to document the correct layout (/graders/ for course grader scripts, /grader/ for grader_support) and the gradelib compatibility note for course teams migrating from Python 2 graders. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

xqueue_watcher/containergrader.py

codejail is an optional dependency (not installed in CI). Guard the import with a try/except and apply @pytest.mark.skipif to the test class so collection succeeds and tests are skipped gracefully when codejail is absent. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

.github/workflows/upgrade-python-requirements.yml

xqueue_watcher/containergrader.py

- Dockerfile: replace deleted requirements/ pip install with uv sync (copies uv binary from ghcr.io/astral-sh/uv and uses uv sync --frozen) - grader.py: guard against path traversal in grader_config['grader']; validate that the resolved grader path stays within grader_root - containergrader.py: fix Docker SDK TypeError - containers.run() does not accept a timeout kwarg; switch to detach=True + container.wait() to enforce the timeout, then collect logs and remove the container - containergrader.py: remove brittle hardcoded line numbers (L364, L379, L397, L450) from user-facing error messages - docker-compose.yml: change conf.d and data volumes from :ro to :rw so local edits take effect without rebuild (matches comment intent) - upgrade-python-requirements.yml: add explicit permissions block (contents: write, pull-requests: write) as required by security policy - Automated code Graders With xqueue-watcher.md: remove empty heading, add 'Property' header to comparison table Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

This PR modernizes xqueue-watcher’s grading execution and packaging by introducing a container-based grader (Docker/Kubernetes) as the preferred sandboxing mechanism, while making the legacy AppArmor/codejail path optional. It also migrates the project from pip-compile/requirements + setup.py to a pyproject.toml + uv.lock workflow and adds deployment scaffolding for local Docker Compose and Kubernetes.

Changes:

Add ContainerGrader (Docker + Kubernetes backends) plus container entrypoint/base image under grader_support/.
Make codejail optional and improve error messaging when codejail is unavailable.
Replace requirements/setup.py tooling with Hatch/pyproject.toml, uv.lock, updated CI workflow(s), plus new local/K8s deployment manifests.

Reviewed changes

Copilot reviewed 39 out of 41 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
xqueue_watcher/containergrader.py	New container-based grading implementation (Docker + Kubernetes backends).
grader_support/entrypoint.py	Container entrypoint that writes submission code to `/tmp` and runs the framework runner.
grader_support/Dockerfile.base	Base grader image to run student code non-root with the entrypoint.
grader_support/README.md	Documentation for building/using the grader base image and handler configuration.
xqueue_watcher/jailedgrader.py	Treat codejail as optional; fail fast with a clearer error when absent; remove legacy six support-file wiring.
xqueue_watcher/manager.py	Make codejail import optional and raise an explicit error if configured without dependency installed.
xqueue_watcher/grader.py	Switch to `Path.absolute()` for grader path resolution.
tests/test_manager.py	Adjust codejail test to avoid relying on subprocess/fork inheritance behavior.
pyproject.toml	New project metadata and dependency definition for uv/hatch-based packaging.
uv.lock	New uv lockfile for dependency pinning.
Makefile	Reworked targets to use uv + docker-compose workflows.
Dockerfile	Updated runtime container build, user setup, and New Relic stage.
docker-compose.yml	New local dev stack including xqueue and watcher service.
deploy/kubernetes/*	New kustomize-ready manifests for service account, RBAC, deployment, configmap, and network policy.
conf.d/600.json	Update sample handler config to use `ContainerGrader` with docker backend.
.github/workflows/ci.yml	Switch CI to uv and expand Python test matrix.
.github/workflows/upgrade-python-requirements.yml	Replace requirements-bot workflow with `uv lock --upgrade` PR automation.
.gitignore	Ignore uv venv and local Kubernetes secret manifest.
setup.py	Removed legacy setuptools packaging entrypoint.
requirements/*	Removed pip-compile based requirements/constraints files (directory deleted).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Dockerfile

Makefile

xqueue_watcher/grader.py

deploy/kubernetes/deployment.yaml

docker-compose.yml

xqueue_watcher/containergrader.py

pyproject.toml

path-py is an external dependency that wraps pathlib with a fluent API. Since we now require Python >= 3.11, pathlib covers all the same functionality without an extra dependency. Changes: - Replace 'from path import Path' with 'from pathlib import Path' in all source and test files - .dirname() → .parent - .basename() → .name - .absolute() / .absolute() → .resolve() (symlink-safe) - .files('*.json') → .glob('*.json') (with sorted() for stable ordering) - Remove path-py (path-py / path) from pyproject.toml dependencies - Regenerate uv.lock (removes path==17.1.1 and path-py==12.5.0) - Simplify grader.py path-traversal check: now that grader_path is a native pathlib.Path, the inline 'import pathlib' is no longer needed - Fix test_grader.py mock: grader_path.endswith() → grader_path.name == - Fix test_manager.py: pass str() to argparse (Path is not subscriptable) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…lation decision Add edx-codejail (the upstream PyPI package, v4.1.0) as an optional 'codejail' extra, replacing the previously pinned git-URL reference to a specific commit. uv add --optional codejail edx-codejail codejail is intentionally excluded from the base Docker image because ContainerGrader uses container-level isolation (Linux namespaces, cgroups, capability dropping, network isolation, read-only filesystem) which provides equivalent sandboxing to AppArmor without requiring host-level AppArmor configuration that is unavailable inside Kubernetes pods. Install the 'codejail' extra only when using the legacy JailedGrader on a bare-metal or VM host with AppArmor configured. To use: uv sync --extra codejail Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 41 out of 43 changed files in this pull request and generated 10 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

xqueue_watcher/containergrader.py

grader_support/entrypoint.py

conf.d/600.json

xqueue_watcher/containergrader.py

deploy/kubernetes/deployment.yaml

xqueue_watcher/containergrader.py

- Makefile: fix tab indentation on all recipe lines (was space-indented) - grader.py: remove unused sys import - jailedgrader.py: replace deprecated load_module() with spec_from_file_location/exec_module - containergrader.py: - remove unused imports (logging, os, tempfile) and _JOB_LABEL constant - add emptyDir volume at /tmp in K8s Job spec (required when read_only_root_filesystem=True) - add clarifying comment that K8s grader scripts are baked into the course image - replace deprecated load_module() with importlib.util spec/exec_module pattern - capture stderr from Docker container on non-zero exit for better diagnostics - grader_support/entrypoint.py: correct misleading comment about /tmp writability - deploy/kubernetes/deployment.yaml: fix command to use xqueue-watcher entry point - deploy/kubernetes/configmap.yaml: add xqueue-watcher-queue-configs ConfigMap so manifests apply cleanly out of the box - docker-compose.yml: mount Docker socket for docker backend to work - conf.d/600.json: use absolute /graders/ path instead of relative ../data path - Dockerfile: use C.UTF-8 locale (available without installing locales package) - pyproject.toml: add edx-codejail to dev group so jailed grader tests run in CI Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

blarghmatey · 2026-03-11T17:31:52Z

/gemini review

gemini-code-assist · 2026-03-11T17:36:15Z

Warning

Gemini is experiencing higher than usual traffic and was unable to create the review. Please try again in a few hours by commenting /gemini review.

…der unit tests Architecture change: grader scripts are baked into the course-specific Docker image, so the watcher pod has no need to access grader files locally. The grader_support entrypoint now runs the complete grading pipeline inside the container (load grader, preprocess, run answer + submission, compare, return JSON grade), and ContainerGrader.grade() is simplified to just launch the container and parse its JSON output. Changes: - grader_support/entrypoint.py: complete rewrite; now takes GRADER_FILE SEED (not GRADER_FILE submission.py SEED); runs full grade pipeline in container; reads GRADER_LANGUAGE and HIDE_OUTPUT env vars from ContainerGrader - xqueue_watcher/containergrader.py: - Remove grader-module loading, gettext, answer.py reading, and all test- comparison logic from grade() — the container handles this now - grade() now just calls _run() and parses the returned JSON - _run() accepts grader_config and forwards lang/hide_output as env vars - _build_k8s_job(): args are now [grader_abs, seed] (not 3 args), adds GRADER_LANGUAGE and HIDE_OUTPUT env vars, still mounts emptyDir at /tmp - _run_docker(): same arg change; passes GRADER_LANGUAGE and HIDE_OUTPUT - ReadTimeout from container.wait() caught and re-raised as clear RuntimeError - Remove unused _truncate, _prepend_coding, importlib.util - tests/test_container_grader.py: 36 new unit tests covering: - _parse_cpu_millis - ContainerGrader init / backend validation - _build_k8s_job: args, env vars, resource limits, emptyDir/tmp, security - _run_docker: success, non-zero exit (with stderr), timeout, missing SDK - grade(): skip_grader, successful result, container failure, size warning Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

- docker-compose.yml: remove unused GRADER_BACKEND env var, fix duplicate volumes key by merging into one list, tag sample-grader with image: grader-base:local so conf.d/600.json reference resolves - Dockerfile: standardise CMD config path to /etc/xqueue-watcher to match docker-compose and Kubernetes manifests - metrics.py: remove OTEL_METRIC_EXPORT_INTERVAL from docstring since it is not wired up in _build_meter_provider() - containergrader.py: add pod template metadata labels so the NetworkPolicy podSelector (app.kubernetes.io/component=xqueue-grader) actually matches grading pods; set automount_service_account_token=False on the grading pod spec to reduce blast radius if the NetworkPolicy is misconfigured; add _parse_memory_bytes() helper and use it for the Docker backend mem_limit so Kubernetes-style strings like '256Mi' are converted to bytes rather than passed raw (which Docker does not accept) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 45 out of 47 changed files in this pull request and generated 9 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Dockerfile

xqueue_watcher/containergrader.py

docker-compose.yml

grader_support/Dockerfile.base

xqueue_watcher/grader.py

tests/test_env_settings.py

+    def test_defaults_when_no_env_vars_set(self):
+        with patch.dict("os.environ", {}, clear=False):
+            config = get_manager_config_from_env()
+        self.assertEqual(config, MANAGER_CONFIG_DEFAULTS)


tests/test_env_settings.py

+    def test_follow_client_redirects_default_is_false(self):
+        with patch.dict("os.environ", {}, clear=False):
+            config = get_manager_config_from_env()
+        self.assertFalse(config["FOLLOW_CLIENT_REDIRECTS"])


tests/test_metrics.py

+class TestBuildMeterProvider(unittest.TestCase):
+    def test_returns_meter_provider(self):
+        with patch.dict("os.environ", {}, clear=False):
+            provider = _build_meter_provider()
+        self.assertIsInstance(provider, MeterProvider)


uv installs the console script into the project virtual environment at .venv/bin/xqueue-watcher. Without adding this directory to PATH the CMD cannot be found at container startup. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

xqueue_watcher/containergrader.py

When no logging.json file is present, manager.py now calls configure_logging() from env_settings instead of basicConfig(). configure_logging() sets up a single StreamHandler on stdout with a consistent timestamp/level/module format, honours XQWATCHER_LOG_LEVEL (default INFO), and suppresses noisy requests/urllib3 debug output. This removes the need for a logging.json file in Kubernetes deployments. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Using PATH via ENV is fragile -- container runtimes and security policies can reset or ignore it. Install a symlink at /usr/local/bin/xqueue-watcher (always in the standard system PATH) so the entrypoint resolves regardless of how the container is launched. Also remove the stale NEW_RELIC_LICENSE_KEY env entry from the Kubernetes deployment manifest. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

sentry · 2026-03-18T21:15:48Z

grader_support/entrypoint.py

+            if e is not None:
+                act_out += f"\n*** ERROR: {e} ***"


Bug: The condition if e is not None: inside an except EndTest as e: block is always true, causing an empty error message to be shown to students.
_{Severity: MEDIUM}

Suggested Fix

Replace the check if e is not None: with a check on the exception's message content, such as if str(e): or if e.args:. This will correctly verify whether the EndTest exception was raised with a message before appending anything to the output.

Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid. Location: grader_support/entrypoint.py#L142-L143 Potential issue: In an `except EndTest as e:` block, the exception object `e` can never be `None`. However, the code checks `if e is not None:`, which is always true. This causes an error message string, `"\n*** ERROR: {e} ***"`, to be appended to the output even when the `EndTest` exception is raised without a message. As a result, when a test ends via `EndTest()`, students will see a confusing, empty error message like `"*** ERROR: ***"` in their output.

Add get_container_grader_defaults() to env_settings, reading five new XQWATCHER_GRADER_* env vars: XQWATCHER_GRADER_BACKEND (default: kubernetes) XQWATCHER_GRADER_NAMESPACE (default: default) XQWATCHER_GRADER_CPU_LIMIT (default: 500m) XQWATCHER_GRADER_MEMORY_LIMIT (default: 256Mi) XQWATCHER_GRADER_TIMEOUT (default: 20) ContainerGrader.__init__ now uses None sentinels for these params so that any value omitted from a conf.d KWARGS block falls back to the env-derived default rather than a hardcoded constant. Values supplied explicitly in conf.d always take precedence, preserving backwards compatibility. Also fixes duplicate function definitions that had crept into env_settings.py. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…pport - Add ImageDigestPoller class: background daemon thread that periodically resolves a tag-based image reference to its current digest via docker.APIClient.inspect_distribution(). Thread-safe; falls back to the original reference if resolution fails. - Add image_pull_policy param to ContainerGrader (auto-detect: IfNotPresent for digest refs, Always for tag-based refs; can be overridden explicitly). - Add poll_image_digest and digest_poll_interval params to activate the poller. When enabled, Kubernetes Jobs use the most recently resolved repo@sha256:… reference via _effective_image(), ensuring nodes always run the latest pushed image without relying on imagePullPolicy: Always for every pod. - Add .github/workflows/publish-grader-base-image.yml to build and push grader_support/Dockerfile.base to ghcr.io/mitodl/xqueue-watcher-grader-base on push to master (grader_support/** paths), weekly schedule, and workflow_dispatch. Multi-platform linux/amd64,linux/arm64. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Kubernetes requires imagePullPolicy to be exactly 'Always', 'IfNotPresent', or 'Never' (case-sensitive). When the value is supplied via KWARGS in the conf.d JSON (e.g. 'always' or 'ALWAYS'), the K8s API returns 422 Unprocessable Entity. Add a normalisation dict lookup that maps the lowercased input back to the canonical title-case form. Unknown values are passed through unchanged so Kubernetes can surface the validation error with a clear message. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…efixes LMS grader_payload 'grader' fields configured against the old git-clone deployment include a queue-name prefix, e.g.: mit-600x-Watcher-MITX-6.0001r/graders/python3graders/chips1/.../grade.py In the containerized approach, graders are baked directly into the image at grader_root, so the path resolves to: /graders/mit-600x-Watcher-MITX-6.0001r/graders/python3graders/... which doesn't exist. The actual path in the image is: /graders/python3graders/... Add strip_path_components (int, default 0) KWARG to ContainerGrader. When > 0, that many leading path components are stripped from the grader path (relative to grader_root) before it is passed as the container entrypoint argument. Set to 2 to remove both the queue-name component and the redundant repo subdirectory name. Example KWARGS: "strip_path_components": 2 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

xqueue_watcher/containergrader.py

Grader scripts may call _() at module level (e.g. in input_validators defined at import time). The previous code installed trans.install() after exec_module, causing NameError: name '_' is not defined. Move the entire locale/gettext setup block to before exec_module so _ is available in builtins when the grader script is first executed. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Python 3 raises TabError when exec'ing code with mixed tabs and spaces in the same indented block. Many course grader answer.py files were authored for Python 2 which tolerated this. Call expandtabs(4) on both the staff answer and student submission before preprocessing and writing to /tmp, so exec never sees raw tabs. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…nals run.py's import_captured() uses __import__() to load answer and submission modules. grader_dir was inserted into sys.path AFTER /tmp, making it position 0, so __import__('answer') found the original /graders/.../answer.py (with bare 'for c in s:') instead of the preprocessed /tmp/answer.py (with 'submission_code = repr(...)'). Fix: insert grader_dir first, then /tmp, so /tmp is position 0 and the preprocessed files always shadow the originals. Also: - Add _dbg() helper for debug tracing behind GRADER_DEBUG=1 env var; off by default so stderr output doesn't corrupt the JSON pod log that containergrader.py reads via read_namespaced_pod_log. - Import traceback (used by _dbg exception paths). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add an explicit ERROR-level log of the raw bytes (repr, up to 4096) when json.loads fails so we can see exactly what the pod log contains, including any leading/trailing garbage from stderr that Kubernetes combines into the pod log stream. Also add a DEBUG-level log of every container output for tracing. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Concourse grader-image pipelines use DockerHub as the trigger source. The workflow previously only pushed to GHCR, so Concourse never saw updates to the base image. Changes: - Add DockerHub login step (DOCKERHUB_USERNAME/DOCKERHUB_PASSWORD secrets) - Push to both mitodl/xqueue-watcher-grader-base (DockerHub) and ghcr.io/mitodl/xqueue-watcher-grader-base (GHCR) - Tag :latest on feature branches during active development so Concourse picks up fixes without waiting for master merge - Add feature branches to push trigger so grader_support fixes are published immediately Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The default stream parameter is 'All', which interleaves stderr into the returned string. Any stderr output from the container (Python warnings, import messages, etc.) corrupts the JSON that the entrypoint prints to stdout, causing JSONDecodeError in the watcher. Pass stream='Stdout' and container='grader' explicitly so only the container's stdout is returned. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

The PodLogsQuery feature gate (which enables the 'stream' field in PodLogOptions) is opt-in and disabled on the target cluster. Using stream= returns a 422 FieldValueForbidden error even on K8s 1.35. Instead, fetch the combined stdout+stderr log and scan backwards for the last non-empty line. The entrypoint always prints exactly one JSON object as its final output line, so this reliably extracts the result regardless of any stderr noise interleaved earlier in the log. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

read_namespaced_pod_log returns response_type='str'. The client's deserialize() method first calls json.loads() on the raw response body (succeeds since the entrypoint outputs valid JSON), then passes the resulting Python dict to __deserialize_primitive(dict, str) which calls str(dict) — producing Python repr with single-quoted keys and True/False booleans, which is not valid JSON. Fix: pass _preload_content=False to get the raw urllib3 response object and read .data directly as bytes, bypassing the client deserialisation entirely. The raw bytes are valid UTF-8 JSON as printed by the entrypoint. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

… scope Addresses GitHub Advanced Security finding: 'Workflow does not contain permissions'. Adding a workflow-level permissions: {} block ensures the GITHUB_TOKEN has no default permissions; each job must explicitly declare what it needs. The update-dependencies job retains its required contents: write and pull-requests: write grants. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

strip_path_components was added to work around what turned out to be a configuration error in the LMS grader_payload, not a structural problem in the grading path resolution. Remove the parameter, its __init__ assignment, the stripping logic in _build_k8s_job, and all docstring references to keep the code simple and correct. Also addressed in this commit: - grader.py: downgrade per-submission grading-time log from INFO to DEBUG to avoid high-volume noise in production log streams - Dockerfile: pin uv to 0.10.7 via a named build stage instead of floating ghcr.io/astral-sh/uv:latest; replace the xqueue-watcher symlink with ENV PATH so the full venv is on PATH - env_settings.py: add XQWATCHER_DOCKER_HOST_GRADER_ROOT env var (preparation for docker_host_grader_root ContainerGrader param) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

ContainerGrader (Docker backend): add docker_host_grader_root parameter so that when xqueue-watcher runs inside a container the bind-mount source path can be translated from the watcher-container path to the equivalent host-side path. Without this the Docker daemon (reached via the mounted socket) would look for the grader directory on the host where it does not exist. Defaults to XQWATCHER_DOCKER_HOST_GRADER_ROOT env var or None (watcher runs directly on the host, no translation needed). docker-compose.yml: add XQWATCHER_DOCKER_HOST_GRADER_ROOT placeholder and explanatory comment so operators know to set the absolute host path. grader_support/Dockerfile.base: remove the path-py pip install. The grader_support framework itself does not import path; course teams that need path-py can add it in their own downstream image. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Queue configs in conf.d can now use SERVER_REF to reference a named server defined in xqueue_servers.json, avoiding the need to embed XQueue URLs and credentials directly in grader configuration files. - settings.py: add get_xqueue_servers() to load and validate xqueue_servers.json from the config root - manager.py: load xqueue_servers.json in configure_from_directory(); resolve SERVER_REF in client_from_config(), raising ValueError for unknown names or conflicts with SERVER/AUTH - env_settings.py: document the Kubernetes Secret volume-mount pattern for xqueue_servers.json as the preferred credentials delivery mechanism - conf.d/600.json: update example to use SERVER_REF - tests: add ServerRefTests and TestGetXqueueServers covering resolution, error cases, and configure_from_directory integration - tests/fixtures/config/xqueue_servers.json: fixture server for tests - README.md: document SERVER_REF, xqueue_servers.json format, and Kubernetes Secret mounting pattern Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Only push to GHCR. Remove the DockerHub login step, DockerHub image reference from the metadata action, and the DOCKERHUB_USERNAME / DOCKERHUB_PASSWORD secret dependencies. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Add the edxorg-686x queue to the mitxonline production xqwatcher stack using the ContainerGrader handler, replacing the legacy JailedGrader configuration in confd_json. This is in preparation for deployment of the xqueue-watcher changes in mitodl/xqueue-watcher#14. The memory limit is set to 1Gi (vs 512Mi for 600x) to accommodate the torch dependency used by the mnist problem set graders. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

sentry bot reviewed Mar 11, 2026

View reviewed changes

xqueue_watcher/containergrader.py Outdated Show resolved Hide resolved

gemini-code-assist bot reviewed Mar 11, 2026

View reviewed changes

blarghmatey mentioned this pull request Mar 11, 2026

feat(xqwatcher): migrate from EC2 ASG to Kubernetes Deployment mitodl/ol-infrastructure#4287

Open

blarghmatey and others added 6 commits March 11, 2026 12:30

blarghmatey force-pushed the chore/migrate-to-uv-and-k8s-container-grader branch from d59ff3b to 18c7151 Compare March 11, 2026 16:31

sentry bot reviewed Mar 11, 2026

View reviewed changes

xqueue_watcher/containergrader.py Show resolved Hide resolved

blarghmatey requested a review from Copilot March 11, 2026 16:45

Copilot started reviewing on behalf of blarghmatey March 11, 2026 16:45 View session

github-advanced-security bot found potential problems Mar 11, 2026

View reviewed changes

.github/workflows/upgrade-python-requirements.yml Fixed Show fixed Hide fixed

sentry bot reviewed Mar 11, 2026

View reviewed changes

xqueue_watcher/containergrader.py Outdated Show resolved Hide resolved

Copilot AI reviewed Mar 11, 2026

View reviewed changes

blarghmatey requested a review from Copilot March 11, 2026 17:03

Copilot started reviewing on behalf of blarghmatey March 11, 2026 17:04 View session

Copilot AI reviewed Mar 11, 2026

View reviewed changes

sentry bot reviewed Mar 11, 2026

View reviewed changes

xqueue_watcher/containergrader.py Show resolved Hide resolved

blarghmatey requested a review from Copilot March 11, 2026 20:32

blarghmatey force-pushed the chore/migrate-to-uv-and-k8s-container-grader branch from 32d6ffc to 16e034b Compare March 18, 2026 16:41

blarghmatey requested a review from Copilot March 18, 2026 16:43

Copilot started reviewing on behalf of blarghmatey March 18, 2026 16:44 View session

Copilot AI reviewed Mar 18, 2026

View reviewed changes

sentry bot reviewed Mar 18, 2026

View reviewed changes

xqueue_watcher/containergrader.py Show resolved Hide resolved

blarghmatey and others added 2 commits March 18, 2026 16:30

sentry bot reviewed Mar 18, 2026

View reviewed changes

blarghmatey and others added 4 commits March 19, 2026 11:29

sentry bot reviewed Mar 19, 2026

View reviewed changes

xqueue_watcher/containergrader.py Outdated Show resolved Hide resolved

blarghmatey and others added 12 commits March 19, 2026 19:23

blarghmatey force-pushed the chore/migrate-to-uv-and-k8s-container-grader branch from d5f0efc to 4bd2753 Compare March 20, 2026 16:48

Conversation

blarghmatey commented Mar 11, 2026

Summary

Why this change?

What changed

uv migration (chore: commit)

Compatibility fixes (fix: commit)

ContainerGrader (feat: commit)

Grader base image (feat: commit)

Kubernetes manifests (feat: commit)

What is NOT changing

Security model comparison

Testing

Uh oh!

gemini-code-assist bot commented Mar 11, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

blarghmatey commented Mar 11, 2026

Uh oh!

gemini-code-assist bot commented Mar 11, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

uv migration (`chore:` commit)

Compatibility fixes (`fix:` commit)

ContainerGrader (`feat:` commit)

Grader base image (`feat:` commit)

Kubernetes manifests (`feat:` commit)