Annotations #847

Draft
wants to merge 12 commits into main
6 changes: 2 additions & 4 deletions docs/index.md
@@ -17,12 +17,10 @@ Here's a [YouTube video demo](https://www.youtube.com/watch?v=QUXQNi6jQ30) and [
Background on this project:
- [llm, ttok and strip-tags—CLI tools for working with ChatGPT and other LLMs](https://simonwillison.net/2023/May/18/cli-tools-for-llms/)
- [The LLM CLI tool now supports self-hosted language models via plugins](https://simonwillison.net/2023/Jul/12/llm/)
- [Accessing Llama 2 from the command-line with the llm-replicate plugin](https://simonwillison.net/2023/Jul/18/accessing-llama-2/)
- [Run Llama 2 on your own Mac using LLM and Homebrew](https://simonwillison.net/2023/Aug/1/llama-2-mac/)
- [Catching up on the weird world of LLMs](https://simonwillison.net/2023/Aug/3/weird-world-of-llms/)
- [LLM now provides tools for working with embeddings](https://simonwillison.net/2023/Sep/4/llm-embeddings/)
- [Build an image search engine with llm-clip, chat with models with llm chat](https://simonwillison.net/2023/Sep/12/llm-clip-and-chat/)
- [Many options for running Mistral models in your terminal using LLM](https://simonwillison.net/2023/Dec/18/mistral/)
- [You can now run prompts against images, audio and video in your terminal using LLM](https://simonwillison.net/2024/Oct/29/llm-multi-modal/)
- [Structured data extraction from unstructured content using LLM schemas](https://simonwillison.net/2025/Feb/28/llm-schemas/)

For more check out [the llm tag](https://simonwillison.net/tags/llm/) on my blog.

11 changes: 11 additions & 0 deletions docs/openai-models.md
@@ -55,6 +55,8 @@ OpenAI Chat: o1-preview
OpenAI Chat: o1-mini
OpenAI Chat: o3-mini
OpenAI Completion: gpt-3.5-turbo-instruct (aliases: 3.5-instruct, chatgpt-instruct)
OpenAI Chat: gpt-4o-search-preview
OpenAI Chat: gpt-4o-mini-search-preview
```
<!-- [[[end]]] -->

@@ -64,6 +66,15 @@ See [the OpenAI models documentation](https://platform.openai.com/docs/models) f

[o1-pro](https://platform.openai.com/docs/models/o1-pro) is not available through the Chat Completions API used by LLM's default OpenAI plugin. You can install the new [llm-openai-plugin](https://github.com/simonw/llm-openai-plugin) plugin to access that model.

## Model features

The following features work with OpenAI models:

- {ref}`System prompts <usage-system-prompts>` can be used to provide instructions that have a higher weight than the prompt itself.
- {ref}`Attachments <usage-attachments>`. Many OpenAI models support image inputs - check which ones do using `llm models --options`. Any model that accepts images can also accept PDFs.
- {ref}`Schemas <usage-schemas>` can be used to influence the JSON structure of the model output.
- {ref}`Model options <usage-model-options>` can be used to set parameters like `temperature`. Use `llm models --options` for a full list of supported options.
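
As a rough sketch, here is how these features can be combined from the Python API. The model ID, file path and schema below are illustrative placeholders, and this assumes an OpenAI API key has already been configured:

```python
import llm

# Placeholder model ID - any OpenAI chat model that supports attachments and schemas will do
model = llm.get_model("gpt-4o-mini")

# A system prompt, an image attachment and a model option (temperature) together
response = model.prompt(
    "Describe this image in one sentence",
    system="You are a terse assistant",
    attachments=[llm.Attachment(path="photo.jpg")],  # placeholder path
    temperature=0.5,
)
print(response.text())

# A JSON schema influencing the structure of the output
structured = model.prompt(
    "Invent a dog",
    schema={
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "age": {"type": "integer"},
        },
        "required": ["name", "age"],
        "additionalProperties": False,
    },
)
print(structured.text())
```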

(openai-models-embedding)=

## OpenAI embedding models
54 changes: 52 additions & 2 deletions docs/plugins/advanced-model-plugins.md
@@ -9,6 +9,7 @@ Features to consider for your model plugin include:
- Including support for {ref}`Async models <advanced-model-plugins-async>` that can be used with Python's `asyncio` library.
- Support for {ref}`structured output <advanced-model-plugins-schemas>` using JSON schemas.
- Handling {ref}`attachments <advanced-model-plugins-attachments>` (images, audio and more) for multi-modal models.
- Supporting {ref}`annotations <advanced-model-plugins-annotations>` for models that return different types of text, or objects that should be attached to sections of the response.
- Tracking {ref}`token usage <advanced-model-plugins-usage>` for models that charge by the token.

(advanced-model-plugins-api-keys)=
@@ -58,7 +59,7 @@ class MyAsyncModel(llm.AsyncModel):

async def execute(
self, prompt, stream, response, conversation=None
) -> AsyncGenerator[str, None]:
) -> AsyncGenerator[Union[llm.Chunk, str], None]:
if stream:
completion = await client.chat.completions.create(
model=self.model_id,
@@ -82,7 +83,7 @@ class MyAsyncModel(llm.AsyncKeyModel):
...
async def execute(
self, prompt, stream, response, conversation=None, key=None
) -> AsyncGenerator[str, None]:
) -> AsyncGenerator[Union[llm.Chunk, str], None]:
```


@@ -243,3 +244,52 @@ This example logs 15 input tokens, 340 output tokens and notes that 37 tokens we
```python
response.set_usage(input=15, output=340, details={"cached": 37})
```

(advanced-model-plugins-annotations)=

## Models that return annotations

Some models may return additional structured data to accompany their text output. LLM calls these **annotations**. Common use-cases for these include:

- Reasoning models that return a portion of text representing "thinking" tokens prior to the main response.
- Models that return structured citation information attached to portions of the text.
- Search models that return references to the search results used to generate the response.

Model plugins can return these annotations directly from their `execute()` method. This method usually yields a series of strings - to attach an annotation to one of those strings, yield a `Chunk` object instead:

```python
import llm

...
# Inside the execute() method:
yield llm.Chunk(
text="This has an annotation",
annotation={
"title": "Document title",
"url": "https://example.com/document",
}
)
```
The `annotation=` value must be a dictionary but can take any shape. LLM automatically records the annotation along with the start and end index of the generated text it is attached to.

Some annotations need to be attached to a single point in the response rather than a span of text. In that case, set the `text=` parameter to `None`.
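
Putting this together, a streaming `execute()` method might look something like the following sketch. The `client` object and the shape of its events are hypothetical stand-ins for whatever API your plugin wraps:

```python
import llm


def execute(self, prompt, stream, response, conversation=None):
    # `client` and its event objects are hypothetical stand-ins for the API this plugin wraps
    for event in client.stream(prompt.prompt):
        if event.citation:
            # Attach an annotation to this span of generated text
            yield llm.Chunk(
                text=event.text,
                annotation={
                    "title": event.citation.title,
                    "url": event.citation.url,
                },
            )
        elif event.marker:
            # A point annotation: no text span, just a position in the output
            yield llm.Chunk(text=None, annotation={"marker": event.marker})
        else:
            # Plain strings can still be yielded as before
            yield event.text
```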

Some models do not return their annotations as part of the stream, instead producing them at the end of the response with start and end indexes indicating which parts of the text they apply to. This is often the case for non-streaming APIs.

For these cases, call `response.add_annotations()` at the end of the `execute()` method:

```python
response.add_annotations([
llm.Annotation(
start_index=0,
end_index=10,
data={
"title": "Document title",
"url": "https://example.com/document"
}
)
])
```
The method accepts a list of `llm.Annotation` objects, each with `start_index=` and `end_index=` integers and a `data=` dictionary describing the annotation.

For annotations attached to a point rather than a range, the `start_index=` and `end_index=` should be the same integer value.
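
For example, a point annotation marking a position 42 characters into the response might look like this sketch (the `data=` contents here are illustrative):

```python
response.add_annotations([
    llm.Annotation(
        start_index=42,
        end_index=42,
        data={"note": "reasoning summary ends here"},
    )
])
```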
5 changes: 5 additions & 0 deletions docs/templates.md
@@ -59,6 +59,11 @@ This can be combined with the `-m` option to specify a different model:
curl -s https://llm.datasette.io/en/latest/ | \
llm -t summarize -m gpt-3.5-turbo-16k
```
Templates can also be specified as full URLs to YAML files:
```bash
llm -t https://raw.githubusercontent.com/simonw/llm-templates/refs/heads/main/python-app.yaml \
'Python app to pick a random line from a file'
```

(prompt-templates-list)=

29 changes: 29 additions & 0 deletions docs/usage.md
@@ -45,6 +45,7 @@ Will run a prompt of:
```
For models that support them, {ref}`system prompts <usage-system-prompts>` are a better tool for this kind of prompting.

(usage-model-options)=
### Model options

Some models support options. You can pass these using `-o/--option name value` - for example, to set the temperature to 1.5 run this:
@@ -754,6 +755,34 @@ OpenAI Completion: gpt-3.5-turbo-instruct (aliases: 3.5-instruct, chatgpt-instru
Include the log probabilities of most likely N per token
Features:
- streaming
OpenAI Chat: gpt-4o-search-preview
Options:
temperature: float
max_tokens: int
top_p: float
frequency_penalty: float
presence_penalty: float
stop: str
logit_bias: dict, str
seed: int
search_context_size: str
Features:
- streaming
- async
OpenAI Chat: gpt-4o-mini-search-preview
Options:
temperature: float
max_tokens: int
top_p: float
frequency_penalty: float
presence_penalty: float
stop: str
logit_bias: dict, str
seed: int
search_context_size: str
Features:
- streaming
- async

```
<!-- [[[end]]] -->
4 changes: 4 additions & 0 deletions llm/__init__.py
@@ -4,11 +4,13 @@
NeedsKeyException,
)
from .models import (
Annotation,
AsyncConversation,
AsyncKeyModel,
AsyncModel,
AsyncResponse,
Attachment,
Chunk,
Conversation,
EmbeddingModel,
EmbeddingModelWithAliases,
@@ -31,10 +33,12 @@
import struct

__all__ = [
"Annotation",
"AsyncConversation",
"AsyncKeyModel",
"AsyncResponse",
"Attachment",
"Chunk",
"Collection",
"Conversation",
"get_async_model",
39 changes: 26 additions & 13 deletions llm/cli.py
@@ -10,6 +10,7 @@
AsyncConversation,
AsyncKeyModel,
AsyncResponse,
Chunk,
Collection,
Conversation,
Response,
@@ -561,6 +562,8 @@ async def inner():
)
if should_stream:
for chunk in response:
if isinstance(chunk, Chunk) and chunk.annotation:
print(chunk.annotation)
print(chunk, end="")
sys.stdout.flush()
print("")
@@ -2524,7 +2527,28 @@ def logs_db_path():
return user_dir() / "logs.db"


def _parse_yaml_template(name, content):
try:
loaded = yaml.safe_load(content)
except yaml.YAMLError as ex:
raise click.ClickException("Invalid YAML: {}".format(str(ex)))
if isinstance(loaded, str):
return Template(name=name, prompt=loaded)
loaded["name"] = name
try:
return Template(**loaded)
except pydantic.ValidationError as ex:
msg = "A validation error occurred:\n"
msg += render_errors(ex.errors())
raise click.ClickException(msg)


def load_template(name):
if name.startswith("https://") or name.startswith("http://"):
response = httpx.get(name)
response.raise_for_status()
return _parse_yaml_template(name, response.text)

if ":" in name:
prefix, rest = name.split(":", 1)
loaders = get_template_loaders()
@@ -2541,19 +2565,8 @@ def load_template(name):
path = template_dir() / f"{name}.yaml"
if not path.exists():
raise click.ClickException(f"Invalid template: {name}")
try:
loaded = yaml.safe_load(path.read_text())
except yaml.YAMLError as ex:
raise click.ClickException("Invalid YAML: {}".format(str(ex)))
if isinstance(loaded, str):
return Template(name=name, prompt=loaded)
loaded["name"] = name
try:
return Template(**loaded)
except pydantic.ValidationError as ex:
msg = "A validation error occurred:\n"
msg += render_errors(ex.errors())
raise click.ClickException(msg)
content = path.read_text()
return _parse_yaml_template(name, content)


def get_history(chat_id):