**Which guide?** Modifying `mellea/`, `cli/`, or `test/` → this file. Writing code that imports Mellea → `docs/AGENTS_TEMPLATE.md`.
Code of Conduct: This project adheres to a Code of Conduct. All contributors, including AI assistants, are expected to follow these community standards when generating code, documentation, or interacting with the project.
Use `uv` for Python commands; never use system Python or pip directly.

- Run Python scripts: `uv run python script.py` (not `python script.py`)
- Run tools: `uv run pytest`, `uv run ruff` (not `pytest`, `ruff`)
- Install deps: `uv sync` (not `pip install`)
- The virtual environment is `.venv/`; `uv run` uses it automatically
```bash
pre-commit install                     # Required: install git hooks
uv sync --all-extras --all-groups      # Install all deps (required for tests)
uv sync --extra backends --all-groups  # Install just backend deps (lighter)
ollama serve                           # Start Ollama (required for most tests)
uv run pytest                          # Default: runs qualitative tests, skips slow tests
uv run pytest -m "not qualitative"     # Fast tests only (~2 min)
uv run pytest -m slow                  # Run only slow tests (>5 min)
uv run pytest --co -q                  # List collected tests without running them (--co = --collect-only)
uv run pytest --isolate-heavy          # Enable GPU process isolation (opt-in)
uv run ruff format .                   # Format code
uv run ruff check .                    # Lint code
uv run mypy .                          # Type check
```

Branches: `feat/topic`, `fix/issue-id`, `docs/topic`
| Path | Contents |
|---|---|
| `mellea/core/` | Core abstractions: Backend, Base, Formatter, Requirement, Sampling |
| `mellea/stdlib/` | Standard library: Sessions, Components, Context |
| `mellea/backends/` | Providers: HF, OpenAI, Ollama, Watsonx, LiteLLM |
| `mellea/formatters/` | Output formatters for different types |
| `mellea/templates/` | Jinja2 templates |
| `mellea/helpers/` | Utilities, logging, model ID tables |
| `cli/` | CLI commands (`m serve`, `m alora`, `m decompose`, `m eval`) |
| `test/` | All tests (run from repo root) |
| `docs/examples/` | Example code (run as tests via pytest) |
| `scratchpad/` | Experiments (git-ignored) |
All tests and examples use markers to indicate requirements. The test infrastructure automatically skips tests based on system capabilities.
**Backend Markers:**

- `@pytest.mark.ollama`: Requires Ollama running (local, lightweight)
- `@pytest.mark.huggingface`: Requires HuggingFace backend (local, heavy)
- `@pytest.mark.vllm`: Requires vLLM backend (local, GPU required)
- `@pytest.mark.openai`: Requires OpenAI API (requires API key)
- `@pytest.mark.watsonx`: Requires Watsonx API (requires API key)
- `@pytest.mark.litellm`: Requires LiteLLM backend

**Capability Markers:**

- `@pytest.mark.requires_gpu`: Requires GPU
- `@pytest.mark.requires_heavy_ram`: Requires 48GB+ RAM
- `@pytest.mark.requires_api_key`: Requires external API keys
- `@pytest.mark.qualitative`: LLM output quality tests (skipped in CI via `CICD=1`)
- `@pytest.mark.llm`: Makes LLM calls (needs at least Ollama)
- `@pytest.mark.slow`: Tests taking >5 minutes (skipped via `SKIP_SLOW=1`)

**Execution Strategy Markers:**

- `@pytest.mark.requires_gpu_isolation`: Requires OS-level process isolation to clear CUDA memory (use with `--isolate-heavy` or `CICD=1`)
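Mellea's own skip logic is not reproduced here. As a rough illustration of how marker-based skipping can work, the sketch below models the decision in plain Python. `Capabilities` and `skip_reasons` are hypothetical names, the host fields are assumptions, and only a subset of the markers above is covered. It checks backend and capability markers against the host, then applies the `CICD=1` and `SKIP_SLOW=1` switches.

```python
from dataclasses import dataclass


@dataclass
class Capabilities:
    """Hypothetical description of the host running the tests."""

    ollama_running: bool = False
    has_gpu: bool = False
    ram_gb: float = 0.0
    api_keys: frozenset[str] = frozenset()  # providers with a configured key, e.g. {"openai"}


def skip_reasons(markers: set[str], caps: Capabilities, env: dict[str, str]) -> list[str]:
    """Return why a test carrying `markers` would be skipped (empty list means run it)."""
    reasons: list[str] = []
    # Backend and capability markers: compare against the host.
    if ("ollama" in markers or "llm" in markers) and not caps.ollama_running:
        reasons.append("Ollama is not running")
    if ("vllm" in markers or "requires_gpu" in markers) and not caps.has_gpu:
        reasons.append("no GPU available")
    if "requires_heavy_ram" in markers and caps.ram_gb < 48:
        reasons.append("less than 48GB RAM")
    for provider in ("openai", "watsonx"):
        if provider in markers and provider not in caps.api_keys:
            reasons.append(f"missing {provider} API key")
    if "requires_api_key" in markers and not caps.api_keys:
        reasons.append("no external API keys configured")
    # Execution policy switches read from the environment.
    if "qualitative" in markers and env.get("CICD") == "1":
        reasons.append("qualitative tests are skipped in CI (CICD=1)")
    if "slow" in markers and env.get("SKIP_SLOW") == "1":
        reasons.append("slow tests are skipped (SKIP_SLOW=1)")
    return reasons


laptop = Capabilities(ollama_running=True, ram_gb=32)
print(skip_reasons({"ollama", "llm", "requires_heavy_ram"}, laptop, {}))  # ['less than 48GB RAM']
```

Returning every reason, rather than stopping at the first one, makes it easier to see why a test was skipped on a given machine.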
Examples in `docs/examples/` use comment-based markers, which keep the example code clean:

```python
# pytest: ollama, llm, requires_heavy_ram
"""Example description..."""

# Your clean example code here
```

Tests and examples skip automatically if the system lacks the required resources. Heavy examples (e.g., HuggingFace) are skipped during collection to prevent memory issues.
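The test infrastructure decides how a `# pytest:` comment becomes a set of markers. As an illustration only, a parser could scan the top of an example file and split the comma-separated list. The function name `parse_example_markers` is hypothetical, as is the rule that the comment must come before any code or module docstring; neither is taken from Mellea.

```python
import re

# Matches a header comment such as "# pytest: ollama, llm, requires_heavy_ram".
MARKER_COMMENT = re.compile(r"^#\s*pytest:\s*(?P<names>.+)$")


def parse_example_markers(source: str) -> list[str]:
    """Return marker names declared in an example's `# pytest:` comment (hypothetical helper)."""
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # tolerate blank lines before the header
        match = MARKER_COMMENT.match(stripped)
        if match:
            return [name.strip() for name in match.group("names").split(",") if name.strip()]
        if not stripped.startswith("#"):
            break  # assumption: the header must precede any code or docstring
    return []


example = '# pytest: ollama, llm, requires_heavy_ram\n"""Example description..."""\n'
print(parse_example_markers(example))  # ['ollama', 'llm', 'requires_heavy_ram']
```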
**Default behavior:**

- `uv run pytest` skips slow tests (>5 min) but runs qualitative tests
- Use `pytest -m "not qualitative"` for fast tests only (~2 min)
- Use `pytest -m slow` to run only the slow tests, or `pytest` (without config) to include slow tests in a full run
- Don't add `qualitative` to trivial tests; keep the fast loop fast.
- Mark tests that take >5 minutes as `slow` (e.g., dataset loading, extensive evaluations).
- Types required on all core functions
- Docstrings are prompts: be specific, because the LLM reads them
- Google-style docstrings: `Args:` on the class docstring only; `__init__` gets a single summary sentence. Add `Attributes:` only when a stored value differs in type/behaviour from its constructor input (type transforms, computed values, class constants). See CONTRIBUTING.md for a full example.
- Ruff for linting/formatting
- Use `...` in `@generative` function bodies
- Prefer primitives over classes
- **Friendly Dependency Errors:** Wrap optional backend imports in `try/except ImportError` with a helpful message (e.g., "Please pip install mellea[hf]"). See `mellea/stdlib/session.py` for examples.
- **Backend telemetry fields:** All backends must populate `mot.usage` (dict with `prompt_tokens`, `completion_tokens`, `total_tokens`), `mot.model` (str), and `mot.provider` (str) in their `post_processing()` method. Metrics are recorded automatically by `TokenMetricsPlugin`; don't add manual `record_token_usage_metrics()` calls.
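The following is a minimal sketch of the friendly-dependency-error pattern. It assumes a hypothetical helper, `import_optional`, built on `importlib`. The message format follows the example above, but check `mellea/stdlib/session.py` for the wording Mellea actually uses. Chaining with `from err` keeps the original import failure visible in the traceback.

```python
import importlib
from types import ModuleType


def import_optional(module: str, extra: str) -> ModuleType:
    """Import an optional backend dependency, or raise an ImportError with install guidance."""
    try:
        return importlib.import_module(module)
    except ImportError as err:
        # Re-raise with an actionable message; `from err` preserves the real cause.
        raise ImportError(
            f"'{module}' is needed for this backend. Please pip install mellea[{extra}]"
        ) from err


json_module = import_optional("json", extra="hf")  # stdlib module: imports normally
try:
    import_optional("surely_not_installed_pkg", extra="hf")
except ImportError as exc:
    print(exc)  # 'surely_not_installed_pkg' is needed for this backend. Please pip install mellea[hf]
```

Because the import happens when the helper is called, users who never select that backend never see the error.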
Angular format: feat:, fix:, docs:, test:, refactor:, release:
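The Angular-style prefixes can be checked mechanically. The sketch below assumes the common `type(scope): subject` shape, where the scope is optional. The helper name and the exact pattern are illustrative; this is not a hook the repository ships.

```python
import re

# Commit types listed in this guide; a parenthesised scope is optional (assumption).
ALLOWED_TYPES = ("feat", "fix", "docs", "test", "refactor", "release")
SUBJECT = re.compile("^(?:" + "|".join(ALLOWED_TYPES) + r")(?:\([\w./-]+\))?: \S.*$")


def is_valid_subject(subject: str) -> bool:
    """Return True if a commit's first line starts with an allowed Angular-style type."""
    return SUBJECT.match(subject) is not None


print(is_valid_subject("feat: add LiteLLM streaming"))             # True
print(is_valid_subject("fix(backends): handle empty usage dict"))  # True
print(is_valid_subject("Added a feature"))                         # False
```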
Pre-commit runs: ruff, mypy, uv-lock, codespell
**Don't cancel:** `pytest` (full) and `pre-commit --all-files` may take minutes. Canceling mid-run can corrupt state.
| Problem | Fix |
|---|---|
| `ComponentParseError` | Add examples to docstring |
| `uv.lock` out of sync | Run `uv sync` |
| Ollama connection refused | Run `ollama serve` |
| Telemetry import errors | Run `uv sync` to install OpenTelemetry deps |
- `uv run pytest test/ -m "not qualitative"` passes?
- `ruff format` and `ruff check` clean?
- New functions typed with concise docstrings?
- Unit tests added for new functionality?
- Avoided over-engineering?
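For the "typed with concise docstrings" check, here is a sketch of the Google-style convention from the style rules. `Args:` sits on the class docstring. `__init__` gets a single summary sentence. `Attributes:` lists only a computed value that no constructor input provides. The `TokenBudget` class and its fields are hypothetical and exist only to show the layout.

```python
class TokenBudget:
    """Tracks how many tokens a request may still spend.

    Args:
        limit: Maximum number of tokens allowed for the request.
        reserved: Tokens set aside up front (for example, for a system prompt).

    Attributes:
        remaining: Tokens still available; computed from ``limit`` and ``reserved``.
    """

    def __init__(self, limit: int, reserved: int = 0) -> None:
        """Create a budget, optionally reserving part of it up front."""
        self.limit = limit
        self.reserved = reserved
        # Computed value, not a constructor input, so it is documented under Attributes.
        self.remaining = limit - reserved

    def spend(self, tokens: int) -> int:
        """Deduct ``tokens`` from the budget.

        Args:
            tokens: Number of tokens consumed by a call.

        Returns:
            The number of tokens left after the deduction.

        Raises:
            ValueError: If ``tokens`` exceeds the remaining budget.
        """
        if tokens > self.remaining:
            raise ValueError(f"need {tokens} tokens, only {self.remaining} left")
        self.remaining -= tokens
        return self.remaining


budget = TokenBudget(limit=100, reserved=30)
print(budget.spend(20))  # 50
```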
- Place tests in `test/`, mirroring the source structure
- Name files `test_*.py` (required for pydocstyle)
- Use the `gh_run` fixture for CI-aware tests (see `test/conftest.py`)
- Mark tests that check LLM output quality with `@pytest.mark.qualitative`
- If a test fails, fix the code, not the test (unless the test was wrong)
If you are modifying or creating pages under `docs/docs/`, follow the writing conventions in `docs/docs/guide/CONTRIBUTING.md`.

Key rules that differ from typical Markdown habits:

- **No H1 in the body:** Mintlify renders the frontmatter `title` automatically; a body `# Heading` produces a duplicate title in the published site
- **No `.md` extensions in internal links:** use `../concepts/requirements-system`, not `../concepts/requirements-system.md`
- **Frontmatter required:** every page needs `title` and `description`; add `sidebarTitle` if the title is long
- **markdownlint gate:** run `npx markdownlint-cli2 "docs/docs/**/*.md"` and fix all warnings before committing a doc page
- **Verified code only:** every code example must be checked against the current `mellea` source; mark forward-looking content with `> **Coming soon:**`
- **No visible TODOs:** if content is missing, open a GitHub issue instead
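`markdownlint-cli2` remains the required gate. The Mintlify-specific rules are also simple enough to pre-check with a few lines of Python: frontmatter `title`/`description`, no body H1, no `.md` in internal links, and no visible TODOs. `check_doc_page` is a hypothetical helper, not a script in the repository. It assumes frontmatter is delimited by `---` lines, treats any link target without `://` as internal, and ignores lines inside code fences.

```python
import re

FENCE = "`" * 3  # a Markdown code-fence marker
LINK_TARGET = re.compile(r"\]\(([^)\s]+)\)")  # captures the target of [text](target)


def check_doc_page(text: str) -> list[str]:
    """Return rule violations for a docs/docs/ page (hypothetical helper)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return ["missing frontmatter"]
    closing = [i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"]
    if not closing:
        return ["unterminated frontmatter"]
    end = closing[0]
    keys = {line.split(":", 1)[0].strip() for line in lines[1:end] if ":" in line}
    problems = [f"frontmatter is missing '{key}'" for key in ("title", "description") if key not in keys]

    prose: list[str] = []  # body lines outside code fences
    in_fence = False
    for line in lines[end + 1 :]:
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        elif not in_fence:
            prose.append(line)

    if any(line.startswith("# ") for line in prose):
        problems.append("body H1 duplicates the Mintlify title")
    for target in LINK_TARGET.findall("\n".join(prose)):
        if "://" not in target and target.split("#", 1)[0].endswith(".md"):
            problems.append(f"internal link has .md extension: {target}")
    if any("TODO" in line for line in prose):
        problems.append("visible TODO; open a GitHub issue instead")
    return problems


page = "\n".join([
    "---",
    "title: Requirements",
    "---",
    "# Requirements",
    "See [the system](../concepts/requirements-system.md).",
])
print(check_doc_page(page))
```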
Found a bug, workaround, or pattern? Update the docs:
- Issue/workaround? → Add to Section 7 (Common Issues) in this file
- Usage pattern? → Add to `docs/AGENTS_TEMPLATE.md`
- New pitfall? → Add a warning near the relevant section