Thank you for your interest in contributing to Mellea! This guide will help you get started with developing and contributing to the project.
There are several ways to contribute to Mellea:
Contribute to the Mellea core, standard library, or fix bugs. This includes:
- Core features and bug fixes
- Standard library components (Requirements, Components, Sampling Strategies)
- Backend improvements and integrations
- Documentation and examples
- Tests and CI/CD improvements
Process: See the Pull Request Process section below for detailed steps.
Build tools and applications using Mellea. These can be hosted in your own repository. For discoverability, use a `mellea-` prefix.
Examples:
- github.com/my-company/mellea-legal-utils
- github.com/my-username/mellea-swe-agent
Contribute experimental or specialized components to mellea-contribs.
Note: For general-purpose Components, Requirements, or Sampling Strategies, please open an issue first to discuss whether they should go in the standard library (this repository) or mellea-contribs.
This project adheres to the Contributor Covenant Code of Conduct. By participating, you are expected to uphold this code. Please report unacceptable behavior to melleaadmin@ibm.com.
- Python 3.10 or higher (3.13+ requires a Rust compiler for outlines)
- uv (recommended) or conda/mamba
- Ollama with required models (for local testing)
1. Fork and clone the repository:

   ```bash
   git clone ssh://git@github.com/<your-username>/mellea.git
   cd mellea/
   ```

2. Set up a virtual environment:

   ```bash
   uv venv .venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   # Install all dependencies (recommended for development)
   uv sync --all-extras --all-groups

   # Or install just the backend dependencies
   uv sync --extra backends --all-groups
   ```

4. Install pre-commit hooks (required):

   ```bash
   pre-commit install
   ```

   Note: Some hooks require tools from the dev dependency groups to be on your PATH. Activate the virtual environment before committing to ensure they are available:

   ```bash
   source .venv/bin/activate
   ```
1. Fork and clone the repository:

   ```bash
   git clone ssh://git@github.com/<your-username>/mellea.git
   cd mellea/
   ```

2. Run the installation script:

   ```bash
   conda/install.sh
   ```
This script handles environment setup, dependency installation, and pre-commit hook installation.
```bash
# Start Ollama (required for most tests)
ollama serve

# Run fast tests (skip qualitative tests, ~2 min)
uv run pytest -m "not qualitative"
```

| Path | Contents |
|---|---|
| `mellea/core` | Core abstractions: Backend, Base, Formatter, Requirement, Sampling |
| `mellea/stdlib` | Standard library: Session, Context, Components, Requirements, Sampling, Intrinsics, Tools |
| `mellea/backends` | Backend providers: HF, OpenAI, Ollama, Watsonx, LiteLLM |
| `mellea/formatters` | Output formatters and parsers |
| `mellea/helpers` | Utilities, logging, model ID tables |
| `mellea/templates` | Jinja2 templates for prompts |
| `cli/` | CLI commands (`m serve`, `m alora`, `m decompose`, `m eval`) |
| `test/` | All tests (run from repo root) |
| `docs/` | Documentation, examples, tutorials |
Required on all core functions:
```python
def process_text(text: str, max_length: int = 100) -> str:
    """Process text with maximum length."""
    return text[:max_length]
```

Docstrings are prompts: the LLM reads them, so be specific.
```python
def extract_entities(text: str, entity_types: list[str]) -> dict[str, list[str]]:
    """Extract named entities from text.

    Args:
        text: The input text to analyze.
        entity_types: List of entity types to extract (e.g., ["PERSON", "ORG"]).

    Returns:
        Dictionary mapping entity types to lists of extracted entities.

    Example:
        >>> extract_entities("Alice works at IBM", ["PERSON", "ORG"])
        {"PERSON": ["Alice"], "ORG": ["IBM"]}
    """
    ...
```

Place `Args:` on the class docstring only. The `__init__` docstring should be a
single summary sentence with no Args: section. This keeps hover docs clean in IDEs
and ensures the docs pipeline (which skips __init__) publishes the full parameter
list.
```python
class MyComponent(Component[str]):
    """A component that does something useful.

    Args:
        name (str): Human-readable label for this component.
        max_tokens (int): Upper bound on generated tokens.
    """

    def __init__(self, name: str, max_tokens: int = 256) -> None:
        """Initialize MyComponent with a name and token budget."""
        self.name = name
        self.max_tokens = max_tokens
```

Add an `Attributes:` section on the class docstring only when a stored attribute
differs in type or behaviour from the constructor input — for example, when a str
argument is wrapped into a CBlock, or when a class-level constant is relevant to
callers. Pure-echo entries that repeat Args: verbatim should be omitted.
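For example, a class whose constructor wraps a plain string might document the transformed attribute like this. This is a minimal sketch: `CBlock` is defined inline as a stand-in so the snippet is self-contained, and `GoalHolder` is a hypothetical name, not a real Mellea class.

```python
from dataclasses import dataclass


@dataclass
class CBlock:
    """Stand-in for mellea's CBlock content wrapper (illustration only)."""

    text: str


class GoalHolder:
    """Holds a generation goal.

    Args:
        goal (str): The goal, as plain text.

    Attributes:
        goal (CBlock): The goal, wrapped in a CBlock at construction time.
    """

    def __init__(self, goal: str) -> None:
        """Initialize GoalHolder from a plain-text goal."""
        # The stored attribute differs in type from the constructor input,
        # so it earns an Attributes: entry on the class docstring.
        self.goal = CBlock(goal)
```

Here the `Attributes:` entry is justified because `goal` is stored as a `CBlock`, not the `str` the caller passed in.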
TypedDict classes are a special case. Their fields are the entire public
contract, so when an Attributes: section is present it must exactly match the
declared fields. The audit will flag:
- `typeddict_phantom`: `Attributes:` documents a field that is not declared in the `TypedDict`
- `typeddict_undocumented`: a declared field is absent from the `Attributes:` section
```python
class ConstraintResult(TypedDict):
    """Result of a constraint check.

    Attributes:
        passed: Whether the constraint was satisfied.
        reason: Human-readable explanation.
    """

    passed: bool
    reason: str
```

Run the coverage and quality audit to check your changes before committing:
```bash
# Build fresh API docs then audit quality (documented symbols only)
uv run python tooling/docs-autogen/generate-ast.py
uv run python tooling/docs-autogen/audit_coverage.py \
    --quality --no-methods --docs-dir docs/docs/api
```

Key checks the audit enforces:
| Check | Meaning |
|---|---|
| `no_class_args` | Class has typed `__init__` params but no `Args:` on the class docstring |
| `duplicate_init_args` | `Args:` appears in both the class and `__init__` docstrings (Option C violation) |
| `no_args` | Standalone function has params but no `Args:` section |
| `no_returns` | Function has a non-trivial return annotation but no `Returns:` section |
| `param_mismatch` | `Args:` documents names not present in the actual signature |
| `typeddict_phantom` | TypedDict `Attributes:` documents a field not declared in the class |
| `typeddict_undocumented` | TypedDict has a declared field absent from its `Attributes:` section |
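To see the TypedDict contract in practice, here is a minimal sketch of a function that produces the `ConstraintResult` shown earlier. The function name and logic are hypothetical, chosen only to illustrate that the declared fields and the `Attributes:` section describe the same public contract.

```python
from typing import TypedDict


class ConstraintResult(TypedDict):
    """Result of a constraint check.

    Attributes:
        passed: Whether the constraint was satisfied.
        reason: Human-readable explanation.
    """

    passed: bool
    reason: str


def check_max_length(text: str, limit: int) -> ConstraintResult:
    """Check that `text` does not exceed `limit` characters.

    Args:
        text: The text to check.
        limit: Maximum allowed length in characters.

    Returns:
        A ConstraintResult with `passed` and a human-readable `reason`.
    """
    if len(text) <= limit:
        return ConstraintResult(passed=True, reason="within limit")
    return ConstraintResult(
        passed=False, reason=f"{len(text)} chars exceeds limit {limit}"
    )
```

Because every declared field appears in `Attributes:` and vice versa, this class would pass both `typeddict_phantom` and `typeddict_undocumented`.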
IDE hover verification — open any of these existing classes in VS Code and hover
over the class name or a constructor call to confirm the hover card shows Args: once
with no duplication:
- `ReactInitiator` (`mellea/stdlib/components/react.py`): `Args:` + `Attributes:` (`goal: str → CBlock` transform)
- `BaseSamplingStrategy` (`mellea/stdlib/sampling/base.py`): `Args:` only, no `Attributes:` (pure-echo removed)
- `TokenToFloat` (`mellea/formatters/granite/intrinsics/output.py`): `Attributes:` for `YAML_NAME` class constant
- Ruff for linting and formatting
- Use `...` in `@generative` function bodies
- Prefer primitives over classes for simplicity
- Keep functions focused and single-purpose
- Avoid over-engineering
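The `...`-body convention for `@generative` functions can be sketched as follows. Note that `generative` here is a trivial stand-in defined inline so the snippet is self-contained; the real decorator comes from Mellea and turns the docstring into an LLM prompt at call time.

```python
from typing import Callable, TypeVar

F = TypeVar("F", bound=Callable)


def generative(fn: F) -> F:
    """Stand-in for mellea's @generative decorator (illustration only)."""
    # The real decorator wires the docstring into an LLM call at runtime;
    # this stand-in just tags the function so the pattern is visible.
    fn.__is_generative__ = True  # type: ignore[attr-defined]
    return fn


@generative
def summarize(m, text: str) -> str:
    """Summarize `text` in one sentence."""
    ...  # body intentionally elided: the docstring is the prompt
```

The body stays as `...` because the docstring carries all the behavior; writing real logic there would be dead code.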
```bash
# Format code
uv run ruff format .

# Lint code
uv run ruff check .

# Fix auto-fixable issues
uv run ruff check --fix .

# Type check
uv run mypy .
```

Follow Angular commit format:
```
<type>: <subject>

<body>

<footer>
```
Types: feat, fix, docs, test, refactor, release
Example:
```
feat: add support for streaming responses

Implements streaming for all backend types with proper
error handling and timeout management.

Closes #123
```
Important: Always sign off commits using `-s` or `--signoff`:

```bash
git commit -s -m "feat: your commit message"
```

Pre-commit hooks run automatically before each commit and check:
- Ruff - Linting and formatting
- mypy - Type checking
- uv-lock - Dependency lock file sync
- codespell - Spell checking
Bypass hooks (for intermediate commits):
```bash
git commit -n -m "wip: intermediate work"
```

Run hooks manually:

```bash
pre-commit run --all-files
```

`pre-commit run --all-files` may take several minutes. Don't cancel mid-run, as doing so can corrupt state.
- Create an issue describing your change (if one does not already exist)
- Fork the repository (if you haven't already)
- Create a branch in your fork using appropriate naming
- Make your changes following coding standards
- Add tests for new functionality
- Run the test suite to ensure everything passes
- Update documentation as needed
- Push to your fork and create a pull request to the main repository
- Follow the automated PR workflow instructions
```bash
# Install all dependencies (required for tests)
uv sync --all-extras --all-groups

# Start Ollama (required for most tests)
ollama serve

# Default: qualitative tests, skip slow tests
uv run pytest

# Fast tests only (no qualitative, ~2 min)
uv run pytest -m "not qualitative"

# Run only slow tests (>5 min)
uv run pytest -m slow

# Run ALL tests including slow (bypass config)
uv run pytest --co -q

# Run specific backend tests
uv run pytest -m "ollama"
uv run pytest -m "openai"

# Run tests without LLM calls (unit tests only)
uv run pytest -m "not llm"

# CI/CD mode (skips qualitative tests)
CICD=1 uv run pytest

# Lint and format
uv run ruff format .
uv run ruff check .
```

Required Ollama models:

- `granite4:micro-h`
- `granite3.2-vision`
- `granite4:micro`
- `qwen2.5vl:7b`
Note: Ollama models can be obtained by running `ollama pull <model>`.
Tests are categorized using pytest markers:
Backend Markers:
- `@pytest.mark.ollama` - Requires Ollama running (local, lightweight)
- `@pytest.mark.huggingface` - Requires HuggingFace backend (local, heavy)
- `@pytest.mark.vllm` - Requires vLLM backend (local, GPU required)
- `@pytest.mark.openai` - Requires OpenAI API (requires API key)
- `@pytest.mark.watsonx` - Requires Watsonx API (requires API key)
- `@pytest.mark.litellm` - Requires LiteLLM backend
Capability Markers:
- `@pytest.mark.requires_gpu` - Requires GPU
- `@pytest.mark.requires_heavy_ram` - Requires 48GB+ RAM
- `@pytest.mark.requires_api_key` - Requires external API keys
- `@pytest.mark.qualitative` - LLM output quality tests (skipped in CI via `CICD=1`)
- `@pytest.mark.llm` - Makes LLM calls (needs at least Ollama)
- `@pytest.mark.slow` - Tests taking >5 minutes (skipped via `SKIP_SLOW=1`)
Execution Strategy Markers:
- `@pytest.mark.requires_gpu_isolation` - Requires OS-level process isolation to clear CUDA memory (use with `--isolate-heavy` or `CICD=1`)
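As an illustration, a test that needs a running Ollama server and checks output quality would stack markers like this (the test name and body are hypothetical; only the marker usage is the point):

```python
import pytest


@pytest.mark.ollama
@pytest.mark.llm
@pytest.mark.qualitative
def test_summary_mentions_topic():
    """Qualitative check on LLM output; skipped in CI when CICD=1."""
    # Would call the model through an Ollama-backed session and
    # assert on properties of the generated text.
    ...
```

Stacking backend and capability markers lets contributors select exactly the subset they can run locally, e.g. `pytest -m "ollama and not qualitative"`.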
Default behavior:

- `uv run pytest` skips slow tests (>5 min) but runs qualitative tests
- Use `pytest -m "not qualitative"` for fast tests only (~2 min)
- Use `pytest -m slow` or `pytest --co -q` to include slow tests
- Don't apply `qualitative` to trivial tests - keep the fast loop fast.
- Mark long-running tests as `slow` (e.g., dataset loading, extensive evaluations).
For detailed information about test markers, resource requirements, and running specific test categories, see test/MARKERS_GUIDE.md.
CI runs the following checks on every pull request:
- Pre-commit hooks (`pre-commit run --all-files`) - Ruff, mypy, uv-lock, codespell
- Test suite (`CICD=1 uv run pytest`) - Skips qualitative tests for speed
To replicate CI locally:
```bash
# Run pre-commit checks (same as CI)
pre-commit run --all-files

# Run tests with CICD flag (same as CI, skips qualitative tests)
CICD=1 uv run pytest
```

Typical run times:

- Fast tests (`-m "not qualitative"`): ~2 minutes
- Default tests (qualitative, no slow): several minutes
- Slow tests (`-m slow`): >5 minutes
- Pre-commit hooks: 1-5 minutes

Don't cancel `pytest` or `pre-commit` mid-run, as doing so can corrupt state.
| Problem | Fix |
|---|---|
| `ComponentParseError` | LLM output didn't match expected type. Add examples to the docstring. |
| `uv.lock` out of sync | Run `uv sync` to update the lock file. |
| Ollama refused connection | Run `ollama serve` to start the Ollama server. |
| `ConnectionRefusedError` (port 11434) | Ollama not running. Start with `ollama serve`. |
| `TypeError: missing positional argument` | First argument to a `@generative` function must be the session `m`. |
| Output is wrong/None | Model too small or needs a better prompt. Try a larger model or add a reasoning field. |
| `error: can't find Rust compiler` | Python 3.13+ requires Rust for outlines. Install Rust or use Python 3.12. |
| Tests fail on Intel Mac | Use conda: `conda install 'torchvision>=0.22.0'`, then `uv pip install mellea`. |
| Pre-commit hooks fail | Run `pre-commit run --all-files` to see specific issues. Fix or use `git commit -n` to bypass. If a tool reports "command not found", activate the virtual environment before committing: `source .venv/bin/activate`. |
```python
# Enable debug logging
from mellea.core import FancyLogger

FancyLogger.get_logger().setLevel("DEBUG")

# See exact prompt sent to LLM
print(m.last_prompt())
```

- Check this guide and test/MARKERS_GUIDE.md
- Search existing issues
- Check out Github Discussions
- Open a new issue with the appropriate label
- Docs writing guide - Conventions, PR checklist, and review process for documentation contributions
- API Documentation - Published documentation site
- Test Markers Guide - Detailed pytest marker documentation
- AGENTS.md - Guidelines for AI assistants working on Mellea internals
- AGENTS_TEMPLATE.md - Template for projects using Mellea
- GitHub Issues - Report bugs or request features
- GitHub Discussions - Ask questions and share ideas
- mellea-contribs - Community contributions
Found a bug, workaround, or pattern while contributing?
- Issue/workaround? → Add to Common Issues section
- Usage pattern? → Add to docs/AGENTS_TEMPLATE.md
- New pitfall? → Add warning to relevant section
Help us improve this guide by opening a PR with your additions!
Thank you for contributing to Mellea! 🎉