Open-source ML competition judging platform
Securely evaluate untrusted machine learning models in isolated Docker containers,
score them on accuracy, size, and latency, and rank teams on a live anonymous leaderboard.
Evalda was built for DataQuest 2026 — the annual data science competition run by the IEEE INSAT SB CS Chapter and ACM INSAT as part of the DataOverflow event.
| Metric | Value |
|---|---|
| Duration | 7 continuous hours |
| Participants | 164 across 41 teams |
| Total submissions | 1,452 (435 accepted, 1,017 rejected) |
| Peak throughput | 400 submissions in one hour |
| Total downtime | < 1 minute (two brief maintenance windows) |
Teams submit a `.zip` containing a Python solution, a trained model, and a `requirements.txt`. The system:

- Validates the zip in a security sandbox (path traversal, zip bombs, symlinks, extension allowlist)
- Extracts and sanitizes `requirements.txt` (blocks malicious pip options)
- Installs dependencies in an isolated container with outbound-only internet
- Runs the model in a fully sandboxed container (no network, no labels, resource-capped)
- Scores the predictions in a trusted process outside all containers
- Streams real-time progress to the participant via WebSocket
- Updates the anonymous leaderboard with blind-hour support
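The sandbox checks above can be sketched in pure Python. This is an illustrative approximation, not Evalda's actual `verify.py`; the allowlist and size threshold are assumptions.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Illustrative allowlist and zip-bomb threshold -- assumptions, not Evalda's real values.
ALLOWED_EXTENSIONS = {".py", ".txt", ".pkl", ".joblib", ".json", ".onnx"}
MAX_UNCOMPRESSED = 500 * 1024 * 1024

def validate_zip(data: bytes) -> list[str]:
    """Return a list of human-readable rejection reasons (empty list == accepted)."""
    problems = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        total = 0
        for info in zf.infolist():
            name = PurePosixPath(info.filename)
            # Path traversal: absolute paths or ".." components escape the extraction dir
            if name.is_absolute() or ".." in name.parts:
                problems.append(f"path traversal: {info.filename}")
            # Symlinks: the upper 16 bits of external_attr hold the Unix file mode
            if (info.external_attr >> 16) & 0o170000 == 0o120000:
                problems.append(f"symlink: {info.filename}")
            # Extension allowlist (directories are exempt)
            if not info.is_dir() and name.suffix.lower() not in ALLOWED_EXTENSIONS:
                problems.append(f"disallowed extension: {info.filename}")
            total += info.file_size
        # Zip bomb heuristic: reject archives whose declared uncompressed size is huge
        if total > MAX_UNCOMPRESSED:
            problems.append("zip bomb: uncompressed size too large")
    return problems
```

A traversal entry like `../evil.sh` would be flagged twice here, once for the `..` component and once for the disallowed `.sh` extension.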
Every container runs with all capabilities dropped, no privilege escalation, memory and CPU limits, and tmpfs with noexec. Ground-truth labels never enter any container. See ARCHITECTURE.md for the full rationale.
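Expressed as keyword arguments for docker-py's `containers.run()`, that hardening looks roughly like the sketch below. The specific limits (CPU count, pids cap, tmpfs size) are assumptions; Evalda reads its real values from environment variables.

```python
# Sketch of per-container hardening as docker-py run() kwargs.
# Limits are illustrative assumptions, not Evalda's configured values.
def hardened_opts(mem_limit: str = "1g", allow_network: bool = False) -> dict:
    return {
        "network_disabled": not allow_network,   # Run phase gets no network at all
        "cap_drop": ["ALL"],                     # drop every Linux capability
        "security_opt": ["no-new-privileges"],   # block setuid privilege escalation
        "mem_limit": mem_limit,                  # hard memory cap
        "nano_cpus": 1_000_000_000,              # 1 CPU worth of quota
        "pids_limit": 256,                       # contain fork bombs
        "read_only": True,                       # read-only root filesystem
        "tmpfs": {"/tmp": "rw,noexec,nosuid,size=64m"},  # writable but non-executable scratch
    }

# client.containers.run(image, command, **hardened_opts())  # requires a Docker client
```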
```mermaid
flowchart LR
%% Defining aesthetic styles for the docs
classDef proxy fill:#2C3E50,stroke:#fff,stroke-width:2px,color:#fff;
classDef api fill:#059669,stroke:#fff,stroke-width:2px,color:#fff;
classDef worker fill:#D97706,stroke:#fff,stroke-width:2px,color:#fff;
classDef docker fill:#2496ED,stroke:#fff,stroke-width:2px,color:#fff;
classDef db fill:#3ECF8E,stroke:#fff,stroke-width:2px,color:#111;
classDef cache fill:#DC382D,stroke:#fff,stroke-width:2px,color:#fff;
%% Core Pipeline
N[Nginx]:::proxy -->|HTTP / REST| F(FastAPI<br>4 Workers):::api
F -->|Enqueues Tasks| C{Celery<br>2 Workers}:::worker
C -->|Spins up / Executes| D[[Docker Containers]]:::docker
D -->|Persists Data| S[(Supabase)]:::db
%% Redis Sub-system
R[(Redis<br>Broker, Streams,<br>Rate Limits, Cache)]:::cache
%% Connections to Redis
F <-->|Rate limits, Cache,<br>Publishes to Stream| R
C <-->|Consumes from Stream,<br>Updates State| R
%% Creating a bounding box for the Backend Logic to group them visually
subgraph Core Backend
F
C
R
end
```
| Component | Technology | Role |
|---|---|---|
| Frontend | Next.js 16, shadcn/ui, TanStack Query | Submission UI, team dashboard, leaderboard, admin panel |
| Backend API | FastAPI, 4 uvicorn workers | Auth, rate limiting, submission intake, WebSocket streaming |
| Task Queue | Celery, 2 workers | Judging pipeline orchestration |
| Containers | Docker (socket-mounted, sibling containers) | Sandboxed code execution across 4 phases |
| Database | Supabase Postgres + RLS | Profiles, teams, submissions, whitelist |
| Auth | Supabase Auth (JWT) | Whitelist-gated registration, token verification |
| Storage | Supabase Storage | Submission zip files (private, 50MB, zip-only) |
| Cache / Broker | Redis 7.2 | Task broker, verdict streams, rate limits, leaderboard cache |
| Reverse Proxy | Nginx | SSL termination, request filtering |
| Deployment | Azure VM + Vercel + Supabase Cloud | Single-VM backend, frontend, managed DB |
| Document | Description |
|---|---|
| ARCHITECTURE.md | System design, trust boundaries, security model, technology justifications, and lessons learned |
| WORKFLOW.md | Step-by-step submission lifecycle from upload to leaderboard, with sequence diagram |
- Docker and Docker Compose
- Node.js 18+
- A Supabase project (free tier works)
```bash
cd backend
cp .env.example .env
# Fill in your Supabase credentials, Redis password, and admin account
docker compose up --build
```

This starts Nginx, Redis, the FastAPI backend, and the Celery worker. On first run, the system automatically:
- Downloads validation data (features + labels) from configured URLs
- Seeds teams and whitelist entries from `data/teams.json`
- Creates the admin account
- Builds the sandbox and judge Docker images
```bash
cd frontend
npm install
cp .env.example .env.local
# Set NEXT_PUBLIC_SUPABASE_URL, NEXT_PUBLIC_SUPABASE_ANON_KEY, BACKEND_URL
npm run dev
```

Apply the Supabase migrations in order:

```bash
supabase db push
```

Or apply them manually from `supabase/migrations/` in your Supabase dashboard. The migrations create:
- `profiles` and `teams` tables with RLS policies
- `team_whitelist` for registration gating
- `submissions` table and storage bucket
- RPCs for atomic score updates (locked to `service_role`)
- Triggers for role protection and auto-profile creation
Create a `data/teams.json` (see `data/teams.example.json`) with your teams:

```json
[
  {
    "name": "Team Alpha",
    "leader_email": "leader@example.com",
    "members": ["member1@example.com", "member2@example.com"]
  }
]
```

Only whitelisted emails can register. On signup, users are automatically linked to their team.
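A hypothetical helper shows how such a file maps onto the registration whitelist: every leader and member email becomes one entry linked to its team. The function name is illustrative, not the seeder's actual API.

```python
import json

def build_whitelist(teams_json: str) -> dict[str, str]:
    """Map each whitelisted email (lowercased) to its team name.

    Illustrative sketch of the seeding step, not Evalda's seeder itself.
    """
    whitelist = {}
    for team in json.loads(teams_json):
        for email in [team["leader_email"], *team["members"]]:
            whitelist[email.lower()] = team["name"]
    return whitelist
```

At registration time, a single dictionary lookup then decides whether an email is allowed to sign up and which team it joins.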
Key environment variables (see `backend/.env.example` for the full list):

| Variable | Description |
|---|---|
| `SUPABASE_URL` | Your Supabase project URL |
| `SUPABASE_KEY` | Supabase `service_role` key (not the anon key) |
| `SUPABASE_JWT_SECRET` | JWT secret for HS256 token verification |
| `REDIS_PASSWORD` | Redis authentication password |
| `ADMIN_EMAIL` / `ADMIN_PASSWORD` | Admin account credentials (seeded on startup) |
| `COMPETITION_START` / `COMPETITION_END` | ISO timestamps for the competition window |
| `BLIND_DURATION_HOURS` | Hours before the end when the leaderboard freezes (default: 1) |
| `MAX_SUBMISSIONS_PER_TEAM` | Accepted submission cap per team (default: 20) |
| `JUDGE_MEM_LIMIT` | Memory limit for builder/runner containers (default: 1g) |
| `JUDGE_TIMEOUT_SECONDS` | Container execution timeout (default: 120) |
| `FEATURES_URL` / `LABELS_URL` | URLs to download validation data on startup |
| `TEAMS_URL` | URL to download team data JSON on startup |
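The interaction between `COMPETITION_END` and `BLIND_DURATION_HOURS` can be sketched as a simple window check: the public leaderboard freezes once "now" enters the final blind window, while scores keep being recorded. The function name is illustrative, not Evalda's actual API.

```python
from datetime import datetime, timedelta, timezone

def leaderboard_frozen(now: datetime, competition_end: datetime,
                       blind_hours: float = 1.0) -> bool:
    """True while the public leaderboard should be frozen (the blind window)."""
    blind_start = competition_end - timedelta(hours=blind_hours)
    return blind_start <= now < competition_end

# With a 1-hour blind window ending at 18:00 UTC, 17:30 falls inside it:
end = datetime(2026, 2, 1, 18, 0, tzinfo=timezone.utc)
leaderboard_frozen(datetime(2026, 2, 1, 17, 30, tzinfo=timezone.utc), end)  # True
```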
```
Evalda-DQ/
├── backend/
│   ├── main.py                  # FastAPI app, lifespan, startup
│   ├── app/
│   │   ├── src/
│   │   │   ├── routers/         # Thin HTTP/WS endpoints
│   │   │   ├── services/        # Business logic layer
│   │   │   │   ├── submissions_service.py
│   │   │   │   ├── security_service.py
│   │   │   │   ├── judge_service.py        # 4-phase container pipeline
│   │   │   │   ├── docker_service.py       # Container lifecycle management
│   │   │   │   ├── scorer.py               # Trusted scoring (runs in worker)
│   │   │   │   ├── stream_service.py       # WebSocket verdict streaming
│   │   │   │   ├── leaderboard_service.py
│   │   │   │   ├── worker/                 # Celery task definitions
│   │   │   │   └── scripts/                # Template scripts for containers
│   │   │   ├── auth/            # JWT verification, rate limiting, WS guards
│   │   │   ├── models/          # Pydantic models
│   │   │   ├── db/              # Supabase + Redis client factories
│   │   │   └── settings/        # Centralized configuration
│   │   └── utils/               # Logger, seeder, janitor, data downloader
│   ├── sandbox/                 # Security sandbox (verify.py + Dockerfile)
│   ├── judge/                   # Runner + judge Dockerfile
│   ├── template/                # Participant solution template + docs
│   ├── compose.yml              # Dev environment
│   └── compose.prod.yml         # Production (SSL, Let's Encrypt)
├── frontend/                    # Next.js 16 app
│   ├── app/                     # App Router pages
│   ├── components/              # UI components (shadcn/ui)
│   └── lib/                     # Supabase clients, server actions, types
├── supabase/
│   ├── migrations/              # SQL migrations (schema, RLS, RPCs)
│   └── config.toml              # Supabase local dev config
├── docs/
│   ├── ARCHITECTURE.md          # System design deep dive
│   └── WORKFLOW.md              # Submission lifecycle walkthrough
└── README.md                    # This file
```
These are covered in depth in ARCHITECTURE.md. The highlights:
- **Scoring runs outside all containers.** The original design ran scoring alongside user code, which meant a participant who discovered the communication channel could overwrite predictions with perfect answers. The scorer now runs in the trusted worker process, and labels never enter any container.
- **Four-phase pipeline with progressive network access.** Verify (no network) → Extract (no network) → Build (outbound only, `--only-binary :all:`) → Run (no network). Each phase gets exactly the permissions it needs.
- **`requirements.txt` sanitization.** A regex blocks dangerous pip options (`--extra-index-url`, `--no-build-isolation`, etc.) before dependencies are installed. `--only-binary :all:` eliminates `setup.py` as an attack surface entirely.
- **Anonymous leaderboard with blind mode.** Teams only see their own name. During the final hour, rankings freeze publicly while scores are still recorded.
- **JWKS-verified rate limiting.** The rate limiter cryptographically verifies JWTs using Supabase's public keys, preventing spoofed user IDs from bypassing per-user limits.
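The `requirements.txt` sanitizer can be sketched as a line-by-line regex scan. This is a minimal illustration in the spirit described above; the blocked-option list here is an assumption, not Evalda's exact set.

```python
import re

# Illustrative blocklist of per-line pip options -- an assumption, not
# Evalda's actual pattern.
BLOCKED_OPTIONS = re.compile(
    r"--(?:extra-)?index-url|--no-build-isolation|--find-links|"
    r"--trusted-host|(?:^|\s)-e\b|--editable",
    re.IGNORECASE,
)

def sanitize_requirements(text: str) -> str:
    """Raise ValueError on any line that smuggles in a dangerous pip option."""
    for lineno, line in enumerate(text.splitlines(), 1):
        if BLOCKED_OPTIONS.search(line):
            raise ValueError(f"blocked pip option on line {lineno}: {line.strip()}")
    return text
```

Ordinary pinned requirements (`numpy==1.26.4`) pass through untouched; a line like `--extra-index-url http://evil.example` is rejected before `pip install` ever sees it.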
Evalda was designed for ML model evaluation, but the architecture generalizes. To adapt it:

- Change the scoring logic — modify `backend/app/src/services/scorer.py` and `backend/judge/runner.py`
- Change the submission format — modify `backend/sandbox/verify.py` (extension allowlist, required files)
- Change resource limits — adjust environment variables (`JUDGE_MEM_LIMIT`, `JUDGE_TIMEOUT_SECONDS`, etc.)
- Change the dataset — point `FEATURES_URL` and `LABELS_URL` to your data
- Change team structure — edit `data/teams.json` and the seeder
The security sandbox, container pipeline, streaming infrastructure, and leaderboard work independently of the scoring domain.
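As a purely illustrative example of the kind of function you might drop into `scorer.py` when adapting the platform, here is a composite metric blending accuracy with size and latency penalties. The weights and normalization are invented for this sketch and are not the DataQuest 2026 formula.

```python
# Hypothetical composite score: higher accuracy is better, smaller and
# faster models earn bonus terms. Weights/caps are assumptions.
def composite_score(accuracy: float, model_size_mb: float, latency_s: float,
                    max_size_mb: float = 100.0, max_latency_s: float = 60.0) -> float:
    size_term = max(0.0, 1.0 - model_size_mb / max_size_mb)        # 1.0 at 0 MB, 0.0 at cap
    latency_term = max(0.0, 1.0 - latency_s / max_latency_s)       # 1.0 instant, 0.0 at cap
    return round(0.7 * accuracy + 0.2 * size_term + 0.1 * latency_term, 4)
```

Because scoring runs in the trusted worker, a replacement like this needs no container-side changes beyond whatever artifacts the runner must emit (predictions, model size, timing).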
Evalda was created for DataQuest 2026, the annual data science competition closing out the DataOverflow event — a collaboration between the IEEE INSAT SB CS Chapter and ACM INSAT.
Security audit and penetration testing by Salah Chafai, whose findings directly shaped the hardening of the RLS policies, RPC permissions, and WebSocket infrastructure.
