Skip to content

feat: support for OpenAI Batch API #487

@oneonlee

Description

@oneonlee

Priority Level

Medium (Nice to have)

Is your feature request related to a problem? Please describe.

Data Designer currently makes all LLM calls via the synchronous Chat Completions API (/v1/chat/completions). When generating large-scale synthetic datasets through OpenAI, this means paying full price for every request — even though the workload is inherently offline and latency-insensitive.

OpenAI's Batch API (/v1/batches) offers 50% cost reduction for asynchronous workloads with a 24-hour turnaround, which is a natural fit for synthetic data generation.

Describe the solution you'd like

Add an optional use_batch_api: bool flag (or a new provider type / execution mode) that routes OpenAI requests through the Batch API instead of the real-time endpoint. A possible implementation could:

  1. Collect all pending requests for a column into a JSONL file
  2. Submit the batch via POST /v1/batches
  3. Poll for completion (or use a callback mechanism)
  4. Parse the results back into the column pipeline

This could live alongside the existing max_parallel_requests concurrency model — users would choose between low-latency real-time generation and cost-optimized batch generation.

Describe alternatives you've considered

Why This Matters

  • Cost: 50% savings on OpenAI API costs at scale
  • Rate limits: Batch API has a separate, more generous rate limit quota
  • Use case fit: SDG workloads are offline by nature — there's no need for real-time responses

Additional context

  • OpenAI Batch API docs: https://platform.openai.com/docs/guides/batch
  • Current architecture processes datasets in batches (buffer_size) with parallel cells (max_parallel_requests), so integrating a batch submission step per column per buffer could align well with the existing execution model
  • Anthropic also offers a similar Message Batches API, so this pattern could generalize to other providers

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions