
Conversation


@TomeHirata (Collaborator) commented Nov 4, 2025

Support file input by introducing `dspy.File`. This is the last content type supported by OpenAI chat completions that DSPy does not yet support natively.

Closes #8974 #8916

import dspy

lm = dspy.LM(
    model="openai/gpt-5-nano",
    temperature=1.0,
)

dspy.configure(lm=lm)

class QA(dspy.Signature):
    file: dspy.File = dspy.InputField()
    summary: str = dspy.OutputField()

program = dspy.Predict(QA)
# Build a dspy.File from a local PDF and pass it as the input field.
program(file=dspy.File.from_path("./2507.19457v1.pdf"))

Inspect history with `dspy.inspect_history()`:

[2025-11-13T15:33:25.005187]

System message:

Your input fields are:
1. `file` (File):
Your output fields are:
1. `summary` (str):
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## file ## ]]
{file}

[[ ## summary ## ]]
{summary}

[[ ## completed ## ]]
In adhering to this structure, your objective is: 
        Given the fields `file`, produce the fields `summary`.


User message:

[[ ## file ## ]]
<file: name:2507.19457v1.pdf, id:, data_length:3825976>

Respond with the corresponding output fields, starting with the field `[[ ## summary ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## summary ## ]]
GEPA (Genetic-Pareto) is a reflective-prompt optimization framework for compound AI systems that combines natural-language reflection with multi-objective evolutionary search. It aims to maximize downstream task performance while being highly sample-efficient by leveraging the interpretable, language-based traces produced during system rollouts. GEPA mutates prompts inside system modules based on natural-language feedback and maintains a Pareto front of high-performing prompts to encourage diverse strategies and robust generalization. Its core components are: (1) Reflective Prompt Mutation, which uses LLMs to analyze system traces and propose targeted prompt updates; (2) System-Aware Merge, a crossover mechanism that preserves complementary module-level insights; and (3) Pareto-based candidate selection, which preserves diverse, top-performing mutations across tasks and avoids premature convergence. The optimizer operates on a compound AI system Φ with modules πi and weights Θi, and uses training data Dtrain split into feedback (Dfeedback) and Pareto evaluation data (Dpareto). Rollouts are budget-limited, and learning signals come from a feedback function μf that can surface textual traces from evaluation metrics. GEPA was evaluated on four tasks—HotpotQA, IFBench, HoVer, and PUPA—across open (Qwen3 8B) and commercial (GPT-4.1 Mini) LLMs, comparing against baselines (Baseline, MIPROv2, GRPO).

Key findings include:
- GEPA achieves average gains of about +10% and up to +19% on held-out test sets, while using up to 35x fewer rollouts than GRPO.
- GEPA consistently outperforms MIPROv2 across tasks and models (often by 10–14%), with GEPA+Merge offering additional improvements in some settings.
- GEPA produces shorter prompts than MIPROv2 (prompts up to ~9x shorter) while delivering higher final performance, illustrating improved efficiency.
- Pareto-based selection enables broader exploration and avoids local optima, outperforming SelectBestCandidate ablations.
- GEPA’s reflective prompts show strong generalization and sample efficiency, and the approach extends to inference-time search for code optimization in domains like NPUEval and KernelBench, where substantial vector-utilization gains are demonstrated.

The paper also discusses limitations (e.g., budget allocation between mutation and crossover, potential gains from weight-space updates) and outlines future directions (adaptive strategies, dynamic Pareto-sets, enhanced feedback strategies).

In short, GEPA demonstrates that language-driven reflection, coupled with Pareto-aware exploration, yields robust, sample-efficient prompt optimization for complex, multi-module AI systems and shows promise as an inference-time search strategy for challenging tasks.

[[ ## completed ## ]]

…a URI representation

- Added MIME type detection in `from_path` and `from_bytes` methods.
- Updated `__repr__` to display file data as a data URI (see the sketch below).
- Modified tests to validate new functionality and ensure correct MIME type handling.

Signed-off-by: TomuHirata <[email protected]>
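
For illustration, here is a minimal sketch of what the MIME type detection and data-URI encoding described in this commit could look like using only the standard library; the helper names are hypothetical, not DSPy's actual internals:

import base64
import mimetypes

# Hypothetical helpers illustrating the commit notes, not DSPy's actual code.
def detect_mime_type(path: str) -> str:
    # mimetypes guesses from the file extension, e.g. ".pdf" -> "application/pdf".
    mime_type, _ = mimetypes.guess_type(path)
    return mime_type or "application/octet-stream"

def to_data_uri(raw: bytes, mime_type: str) -> str:
    # Encode raw bytes as a data URI, the representation __repr__ reportedly displays.
    return f"data:{mime_type};base64,{base64.b64encode(raw).decode('utf-8')}"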

@synaptiz left a comment


Suggestion to add a `mime_type` argument to the `from_file_id()` function.

return cls(file_data=file_data, filename=filename)

@classmethod
def from_file_id(cls, file_id: str, filename: str | None = None) -> "File":

I think it would be good to add a `mime_type` argument to the `from_file_id()` function. This function will be used when the client already has the URI of an uploaded file. If the client knows the file's `mime_type`, they should be able to pass it to this function too. Since the URI may not always contain the file extension, DSPy would otherwise have no way to determine the `mime_type` without reading the file contents.
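
Concretely, the suggestion amounts to something like the following sketch; the constructor keywords that File accepts here are assumptions, not the actual implementation:

class File:
    def __init__(self, file_id: str | None = None, filename: str | None = None, mime_type: str | None = None):
        self.file_id = file_id
        self.filename = filename
        self.mime_type = mime_type

    @classmethod
    def from_file_id(cls, file_id: str, filename: str | None = None, mime_type: str | None = None) -> "File":
        # Trust the caller-supplied MIME type: the file URI alone may not
        # reveal the extension, and the contents may not be readable by DSPy.
        return cls(file_id=file_id, filename=filename, mime_type=mime_type)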

@TomeHirata (Collaborator, Author) replied:

How would `mime_type` be used? According to the OpenAI specification, `mime_type` is not a supported field of the file content part: https://platform.openai.com/docs/api-reference/chat/create#chat_create-messages-user_message-content-array_of_content_parts-file_content_part-file.

@synaptiz replied:

Hi @TomeHirata,

Thanks for sharing the specs link.

Based on my testing, it appears that LiteLLM expects `format` to be passed along with `file_id`. The `format` attribute is optional; if it isn't passed, LiteLLM tries to infer it by reading the file content from the URL, which fails if the file isn't accessible. You can ignore my comment since this seems to be an issue on LiteLLM's side.

Best,

Rakesh
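
For context, the message shape described above looks roughly like the following; this is a sketch, the `format` field is the one the comment refers to, the file URL is hypothetical, and the exact shape may vary by provider:

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Summarize this document."},
        {
            "type": "file",
            "file": {
                "file_id": "https://example.com/2507.19457v1.pdf",  # hypothetical uploaded-file URI
                # Optional: if omitted, LiteLLM tries to fetch the file to infer
                # its type, which fails when the URL is not accessible.
                "format": "application/pdf",
            },
        },
    ],
}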

@chenmoneygithub (Collaborator) left a comment

Looks good! Can we include a code example in the PR description along with the DSPy history to showcase the feature?

Signed-off-by: TomuHirata <[email protected]>
- Updated the File class to include a reference to the OpenAI API specification for file content.
- Enhanced the `_convert_chat_request_to_responses_request` function to support file inputs, including file_data and file_id (conversion sketched below).
- Added comprehensive tests to validate the conversion of various file input formats in the responses API.

Signed-off-by: TomuHirata <[email protected]>
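
For illustration, the chat-to-responses conversion for a file part might look roughly like this sketch; the actual helper lives in DSPy's codebase and may differ:

def _convert_file_part(chat_part: dict) -> dict:
    # Chat Completions carry files as {"type": "file", "file": {...}};
    # the Responses API expects an "input_file" part instead.
    file = chat_part["file"]
    converted: dict = {"type": "input_file"}
    if file.get("file_id"):
        # Reference a previously uploaded file by its ID.
        converted["file_id"] = file["file_id"]
    else:
        # Embedded content: filename plus a base64 data URI.
        converted["filename"] = file.get("filename")
        converted["file_data"] = file.get("file_data")
    return converted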
@chenmoneygithub (Collaborator) commented

@TomeHirata Mind adding the dspy.inspect_history() output in the PR description? It would be useful to see what the raw LM response looks like in this case.

Good to go afterwards!

…tion

- Enhanced the `pretty_print_history` function to support displaying file input types, including filename, file_id, and data length (formatting sketched below).
- Improved output formatting for better readability of file-related information.

Signed-off-by: TomuHirata <[email protected]>
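
A sketch of the kind of formatting this commit describes; the `<file: ...>` line in the inspected history above suggests something like:

def format_file_part(part: dict) -> str:
    # Render a file content part compactly instead of dumping base64 content.
    file = part.get("file", {})
    return "<file: name:{}, id:{}, data_length:{}>".format(
        file.get("filename", ""),
        file.get("file_id", ""),
        len(file.get("file_data") or ""),
    )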
@TomeHirata merged commit c542bb6 into stanfordnlp:main on Nov 13, 2025
12 checks passed


Development

Successfully merging this pull request may close these issues.

[Feature] Using DSPy with Google Gemini Models unable to read uploaded file's content
