A collaborative platform for annotating and evaluating LLM traces with MLflow integration, discovery phases, and inter-rater reliability analysis.
For detailed documentation, see the /doc folder:
- Facilitator Guide - A comprehensive guide for facilitators to deploy, configure, and run the workshop.
- Release Notes - Latest release information and quick start
- Build Guide - Client build instructions
- Authentication Fix - Authentication improvements
- Annotation Editing - Annotation editing features
- Database Migrations - SQLite schema migrations (Alembic)
- All Documentation - Complete documentation index
For production use, we recommend using the latest stable release:
Tip: View all releases on the Releases page.
Download project-with-build.zip, which includes pre-built frontend assets.
- Python 3.11+
- Node.js 22.16+
- Databricks workspace with:
  - MLflow experiments
  - Databricks Apps
- Strongly recommended: just
  - Installation
  - It's possible to work without it, but most of the useful scripts use just.
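The version requirements above can be checked before setup. The following is a minimal sketch, not part of the project: the helper names (`parse_version`, `meets_minimum`, `check_prerequisites`) are hypothetical, and it assumes `node --version` prints a string such as `v22.16.0`.

```python
import re
import shutil
import subprocess
import sys

# Minimum versions taken from the prerequisites list above.
MIN_PYTHON = (3, 11)
MIN_NODE = (22, 16)


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the first dotted version number from text like 'v22.16.0'."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version: tuple[int, ...], minimum: tuple[int, ...]) -> bool:
    """Compare only as many components as the minimum specifies."""
    return version[: len(minimum)] >= minimum


def check_prerequisites() -> dict[str, bool]:
    """Report which local prerequisites look satisfied (hypothetical helper)."""
    results = {"python": meets_minimum(tuple(sys.version_info[:3]), MIN_PYTHON)}

    node = shutil.which("node")
    if node is None:
        results["node"] = False
    else:
        output = subprocess.run(
            [node, "--version"], capture_output=True, text=True, check=False
        ).stdout
        try:
            results["node"] = meets_minimum(parse_version(output), MIN_NODE)
        except ValueError:
            results["node"] = False

    # just is only "strongly recommended", so report its presence without a version check.
    results["just"] = shutil.which("just") is not None
    return results
```

Calling `check_prerequisites()` returns a dictionary with the keys `python`, `node`, and `just`, so a missing tool can be spotted before running any of the setup steps. The Databricks workspace requirements are not covered by this sketch.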
- Navigate to the client directory:
  cd client
- Install Node dependencies:
  npm install
- Start the development server:
  npm run dev
  The UI will be available at http://localhost:3000
- Build for production:
  npm run build
- Create a virtual environment and install dependencies (using uv):
  uv venv
  source .venv/bin/activate # On Windows: .venv\Scripts\activate
  uv pip install -e .
- Run the FastAPI development server locally:
  uv run uvicorn server.app:app --reload --port 8000
  The API will be available at http://localhost:8000
  API documentation is available at http://localhost:8000/docs
- Create and activate a virtual environment (with the standard venv module instead of uv):
  python3 -m venv venv
  source venv/bin/activate # On Windows: venv\Scripts\activate
- Install Python dependencies:
  pip install -e .
  # Or, for an editable install with dev dependencies:
  pip install -e ".[dev]"
- Run the FastAPI development server:
  uvicorn server.app:app --reload --port 8000
  The API will be available at http://localhost:8000
  API documentation is available at http://localhost:8000/docs
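Once the server is running, a quick smoke check can confirm that the documentation page responds. This is a minimal sketch using only the standard library. The function name `is_reachable` is hypothetical, and the URL in the usage note is the one stated above.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET request to `url` completes with a 2xx status."""
    # An empty ProxyHandler bypasses any configured proxy, which is usually
    # what you want when probing a server on localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False
```

For example, `is_reachable("http://localhost:8000/docs")` should return `True` while the development server is up and `False` once it has been stopped.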
E2E tests are run with Playwright against a real local stack (FastAPI + Vite) using an isolated SQLite database.
# Run E2E tests headless (default)
just e2e
# Run E2E tests headed (useful for debugging)
just e2e headed
# Run E2E tests in Playwright UI mode
just e2e ui
# Debugging helpers
just e2e-servers # start API+UI against .e2e-workshop.db
just e2e-test # run tests (assumes servers are already running)
Ensure you have the Databricks CLI installed and configured:
databricks --version
databricks current-user me # Verify authentication
Create the app:
databricks apps create human-eval-workshop
Build the frontend:
cd client && npm install && npm run build && cd ..
This creates an optimized production build in client/build/
Sync the source code to your workspace:
DATABRICKS_USERNAME=$(databricks current-user me | jq -r .userName)
databricks sync . "/Workspace/Users/$DATABRICKS_USERNAME/human-eval-workshop"
Refer to the Databricks Apps deploy documentation for more info.
Deploy the app:
databricks apps deploy human-eval-workshop \
  --source-code-path /Workspace/Users/$DATABRICKS_USERNAME/human-eval-workshop
Once deployed, the Databricks CLI will provide a URL to access your application.
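The sync step above uses jq to read `userName` from the output of `databricks current-user me`. Where jq is not installed, the same path could be derived in Python. This is a sketch only: the function name `workspace_source_path` is hypothetical, and it assumes the CLI prints a JSON object with a `userName` field, as the jq filter above implies.

```python
import json


def workspace_source_path(current_user_json: str, app_name: str) -> str:
    """Build the workspace path used for sync and deploy.

    `current_user_json` is the raw text printed by `databricks current-user me`.
    """
    user_name = json.loads(current_user_json)["userName"]
    return f"/Workspace/Users/{user_name}/{app_name}"
```

The result is the same path that is passed to `databricks sync` and to `--source-code-path`, so the two commands stay pointed at the same location.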
Configure facilitator accounts and security settings:
facilitators:
  - email: "[email protected]"
    password: "xxxxxxxxxx"
    name: "Workshop Facilitator"
    description: "Primary workshop facilitator"
security:
  default_user_password: "changeme123"
  password_requirements:
    min_length: 8
    require_uppercase: true
    require_lowercase: true
    require_numbers: true
  session:
    token_expiry_hours: 24
    refresh_token_expiry_days: 7
See the LICENSE.MD file for details.
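To illustrate what the `password_requirements` settings in the configuration above might mean in practice, here is a minimal sketch of a checker. It is an assumption about how such rules could be applied, not the server's actual validation code, and the function name `password_violations` is hypothetical. The key names and values are copied from the configuration.

```python
import re

# Mirrors the password_requirements block from the configuration above.
PASSWORD_REQUIREMENTS = {
    "min_length": 8,
    "require_uppercase": True,
    "require_lowercase": True,
    "require_numbers": True,
}


def password_violations(password: str, rules: dict) -> list[str]:
    """Return a list of human-readable rule violations (empty if none)."""
    problems = []
    min_length = rules.get("min_length", 0)
    if len(password) < min_length:
        problems.append(f"shorter than {min_length} characters")
    if rules.get("require_uppercase") and not re.search(r"[A-Z]", password):
        problems.append("no uppercase letter")
    if rules.get("require_lowercase") and not re.search(r"[a-z]", password):
        problems.append("no lowercase letter")
    if rules.get("require_numbers") and not re.search(r"\d", password):
        problems.append("no digit")
    return problems
```

Under this reading of the rules, a password such as `Workshop2024` passes, while the sample `default_user_password` value `changeme123` would be flagged for lacking an uppercase letter. It may be worth confirming whether the server applies these requirements to the default password.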