feat: support for OpenAI Batch API #487
Description
Priority Level
Medium (Nice to have)
Is your feature request related to a problem? Please describe.
Data Designer currently makes all LLM calls via the synchronous Chat Completions API (/v1/chat/completions). When generating large-scale synthetic datasets through OpenAI, this means paying full price for every request — even though the workload is inherently offline and latency-insensitive.
OpenAI's Batch API (/v1/batches) offers 50% cost reduction for asynchronous workloads with a 24-hour turnaround, which is a natural fit for synthetic data generation.
Describe the solution you'd like
Add an optional use_batch_api: bool flag (or a new provider type / execution mode) that routes OpenAI requests through the Batch API instead of the real-time endpoint. A possible implementation could:
- Collect all pending requests for a column into a JSONL file
- Submit the batch via POST /v1/batches
- Poll for completion (or use a callback mechanism)
- Parse the results back into the column pipeline
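The collection step could be sketched roughly as below. This is a minimal illustration, not a proposed implementation: build_batch_jsonl and the row-{i} custom_id scheme are hypothetical names, and the submit/poll/parse steps are only indicated in comments since they need a live client.

```python
import json


def build_batch_jsonl(prompts, model="gpt-4o-mini"):
    """Serialize one Batch API input line per pending cell.

    Each line follows OpenAI's documented batch input format:
    a unique custom_id plus the method/url/body of an ordinary
    Chat Completions request.
    """
    lines = []
    for i, prompt in enumerate(prompts):
        lines.append(json.dumps({
            "custom_id": f"row-{i}",  # hypothetical id scheme for mapping results back
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }))
    return "\n".join(lines)


# Submission and polling would then use the OpenAI Python SDK, e.g.:
#   file = client.files.create(file=..., purpose="batch")
#   batch = client.batches.create(input_file_id=file.id,
#                                 endpoint="/v1/chat/completions",
#                                 completion_window="24h")
#   ...poll client.batches.retrieve(batch.id) until status == "completed",
#   then download the output file and match results by custom_id.
```

The custom_id is the key piece: batch results may come back in any order, so the pipeline would need a stable mapping from each result line back to its column cell.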
This could live alongside the existing max_parallel_requests concurrency model — users would choose between low-latency real-time generation and cost-optimized batch generation.
Describe alternatives you've considered
Why This Matters
- Cost: 50% savings on OpenAI API costs at scale
- Rate limits: Batch API has a separate, more generous rate limit quota
- Use case fit: SDG workloads are offline by nature — there's no need for real-time responses
Additional context
- OpenAI Batch API docs: https://platform.openai.com/docs/guides/batch
- Current architecture processes datasets in batches (buffer_size) with parallel cells (max_parallel_requests), so integrating a batch submission step per column per buffer could align well with the existing execution model
- Anthropic also offers a similar Message Batches API, so this pattern could generalize to other providers
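To make the per-buffer alignment concrete, grouping a column's pending prompts by buffer_size could look like the sketch below; plan_batches is a hypothetical helper, and each resulting group would correspond to one batch submission.

```python
def plan_batches(cell_prompts, buffer_size):
    """Split a column's pending prompts into buffer-sized groups.

    Under the assumption above, each group maps to a single
    Batch API submission for that column and buffer.
    """
    return [
        cell_prompts[i:i + buffer_size]
        for i in range(0, len(cell_prompts), buffer_size)
    ]
```

For example, 5 pending cells with buffer_size=2 would yield three submissions of sizes 2, 2, and 1.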