Optimization Pipeline

The optional optimization pipeline reduces spend before a call leaves your process. It trims long prompt history, caps output length, and, when the estimated cost of a request exceeds a per-request budget, either raises an error or falls back to a cheaper model.

from driftlock import DriftlockClient, OptimizationConfig

client = DriftlockClient(
    api_key="sk-...",
    optimization=OptimizationConfig(
        max_prompt_tokens=3000,          # trim history if the prompt exceeds this
        keep_last_n_messages=10,         # always keep the N most recent turns
        always_keep_system=True,         # never drop the system message
        default_max_output_tokens=512,   # cap output when the caller omits max_tokens
        max_cost_per_request_usd=0.05,   # abort/fallback if estimated cost > $0.05
        budget_exceeded_action="fallback",
        fallback_model="gpt-4o-mini",
    ),
)
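The trimming settings above can be read as a simple policy: protect the system message and the most recent turns, then drop the oldest remaining messages until the prompt fits. The sketch below is an illustration of that policy, not Driftlock's actual implementation. The `trim_history` and `count_tokens` helpers are hypothetical, and the whitespace word count is only a stand-in for a real tokenizer.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def count_tokens(message: Message) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(message["content"].split())


def trim_history(
    messages: List[Message],
    max_prompt_tokens: int,
    keep_last_n_messages: int = 10,
    always_keep_system: bool = True,
    token_counter: Callable[[Message], int] = count_tokens,
) -> List[Message]:
    """Drop the oldest unprotected messages until the prompt fits the budget."""
    total = sum(token_counter(m) for m in messages)

    # Protected indices: the N most recent turns...
    first_recent = max(0, len(messages) - keep_last_n_messages)
    protected = set(range(first_recent, len(messages)))
    # ...and, optionally, every system message.
    if always_keep_system:
        protected |= {i for i, m in enumerate(messages) if m["role"] == "system"}

    dropped = set()
    # Walk from oldest to newest, dropping unprotected messages until we fit.
    for i, m in enumerate(messages):
        if total <= max_prompt_tokens:
            break
        if i in protected:
            continue
        dropped.add(i)
        total -= token_counter(m)

    return [m for i, m in enumerate(messages) if i not in dropped]
```

Under this policy the result can still exceed `max_prompt_tokens` when the protected messages alone are larger than the budget. Whether the real pipeline behaves the same way in that case is not stated here.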

OptimizationConfig options

| Field | Effect |
| --- | --- |
| `max_prompt_tokens` | Trim older messages until the prompt fits within this token budget |
| `keep_last_n_messages` | Always preserve the N most recent turns when trimming |
| `always_keep_system` | Never drop the system message |
| `default_max_output_tokens` | Value applied as `max_tokens` when the caller omits it |
| `max_cost_per_request_usd` | Per-request ceiling on estimated cost, in USD |
| `budget_exceeded_action` | `"raise"` (raise `BudgetExceededError`) or `"fallback"` |
| `fallback_model` | Model to switch to when `budget_exceeded_action="fallback"` |
| `shadow_mode` | Compute savings without modifying the outgoing request |
| `sample_rate` / `sample_key` | Apply optimization to a deterministic fraction of traffic |
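To make the interaction between `max_cost_per_request_usd`, `budget_exceeded_action`, and `fallback_model` concrete, here is a minimal sketch of the decision as the table describes it. The `apply_budget` function and `BudgetDecision` type are hypothetical names introduced for illustration, and the local `BudgetExceededError` class is a stand-in for the library's exception. Raising when no fallback model is configured is an assumption, not documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


class BudgetExceededError(Exception):
    """Local stand-in for the library's exception of the same name."""


@dataclass
class BudgetDecision:
    model: str       # model the request should be sent to
    fell_back: bool  # True if the fallback model replaced the original


def apply_budget(
    model: str,
    estimated_cost_usd: float,
    max_cost_per_request_usd: Optional[float],
    budget_exceeded_action: str = "raise",
    fallback_model: Optional[str] = None,
) -> BudgetDecision:
    # No ceiling configured, or the estimate is within it: send unchanged.
    if max_cost_per_request_usd is None or estimated_cost_usd <= max_cost_per_request_usd:
        return BudgetDecision(model=model, fell_back=False)

    # Over budget with a fallback configured: switch to the cheaper model.
    if budget_exceeded_action == "fallback" and fallback_model:
        return BudgetDecision(model=fallback_model, fell_back=True)

    # Over budget with "raise" (or no fallback model, by assumption): abort.
    raise BudgetExceededError(
        f"estimated ${estimated_cost_usd:.4f} exceeds "
        f"ceiling ${max_cost_per_request_usd:.4f}"
    )
```

For example, `apply_budget("gpt-4o", 0.08, 0.05, "fallback", "gpt-4o-mini")` selects the fallback model, while the same call with `"raise"` raises `BudgetExceededError`.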

What gets logged

Every call logs an optimization block showing tokens and cost saved:

{
  "optimization": {
    "original_prompt_tokens": 3840,
    "optimized_prompt_tokens": 142,
    "tokens_saved": 3698,
    "cost_saved_usd": 0.0005547,
    "optimizations_applied": ["prompt_trim", "output_cap"],
    "quality_risk": true
  }
}
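The numeric fields in this block are related by simple arithmetic: `tokens_saved` is the difference between the original and optimized prompt sizes, and `cost_saved_usd` is that difference priced per token. The sketch below shows one way such a block could be assembled. The `optimization_log` helper is hypothetical, and the input price of $0.15 per million tokens is an assumption that happens to reproduce the figure in the sample above. The `quality_risk` flag is left out because this section does not define how it is derived.

```python
from typing import Any, Dict, List


def optimization_log(
    original_prompt_tokens: int,
    optimized_prompt_tokens: int,
    usd_per_million_input_tokens: float,
    optimizations_applied: List[str],
) -> Dict[str, Any]:
    """Build a log block with token and cost savings for one request."""
    tokens_saved = original_prompt_tokens - optimized_prompt_tokens
    # Price only the prompt tokens that were removed (assumed pricing model).
    cost_saved_usd = tokens_saved * usd_per_million_input_tokens / 1_000_000
    return {
        "optimization": {
            "original_prompt_tokens": original_prompt_tokens,
            "optimized_prompt_tokens": optimized_prompt_tokens,
            "tokens_saved": tokens_saved,
            "cost_saved_usd": round(cost_saved_usd, 7),
            "optimizations_applied": list(optimizations_applied),
        }
    }


block = optimization_log(3840, 142, 0.15, ["prompt_trim", "output_cap"])
```

With these inputs, `tokens_saved` is 3840 - 142 = 3698 and `cost_saved_usd` is 3698 × 0.15 / 1,000,000 = 0.0005547.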

Shadow mode & sampling

Shadow mode (`shadow_mode=True`) computes and logs the savings the pipeline would have produced without changing the request you actually send. This is useful for measuring impact before turning optimization on.
Sampling (`sample_rate`, `sample_key`) applies optimization to a deterministic fraction of traffic keyed on a label (e.g. `user_id`), so a given key is consistently in or out of the experiment.
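Deterministic sampling of this kind is commonly done by hashing the key into a fixed range and comparing the result against the rate. The sketch below shows that general technique. It is an assumption about how such sampling can work, not a description of Driftlock's internals, and the `in_sample` helper is a hypothetical name.

```python
import hashlib


def in_sample(sample_key: str, sample_rate: float) -> bool:
    """Return True if this key falls inside the sampled fraction of traffic."""
    digest = hashlib.sha256(sample_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer bucket in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    # The key is sampled when its bucket lies below rate * 2**64.
    return bucket < sample_rate * 2**64


# The same key always gets the same answer for a given rate.
decision = in_sample("user-42", 0.25)
assert decision == in_sample("user-42", 0.25)
```

Because the bucket depends only on the key, raising `sample_rate` adds new keys to the sample without removing any that were already in it.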

See driftlock/optimization.py for the pipeline internals.
