Allow DSPy to use the native reasoning from models #8822
base: main
Conversation
Force-pushed from 35a60a7 to 8de0a65
if not litellm.supports_reasoning(lm.model):
    return False

if "reasoning_effort" in lm_kwargs:
We don't need to check `lm_kwargs`; instead we should probably overwrite `reasoning_effort` if native reasoning is on.
`reasoning_effort` has multiple valid values: low, medium, and high. I am setting the default value to low, but if users specify one of the other values we want to keep it unchanged.
The default `reasoning_effort` on OpenAI reasoning models is `medium`, so I'd be careful setting it differently here.
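For illustration, a minimal sketch of the precedence being discussed (the function and argument names are hypothetical, not the actual DSPy code): only apply a default when the user hasn't set `reasoning_effort` anywhere.

```python
# Illustrative sketch, not the actual implementation: pick reasoning_effort with
# per-call kwargs first, then LM-level kwargs, and a default only as a last resort.
def resolve_reasoning_effort(lm_kwargs: dict, lm_default_kwargs: dict, default: str = "medium") -> str:
    if "reasoning_effort" in lm_kwargs:
        return lm_kwargs["reasoning_effort"]          # per-call value wins
    if "reasoning_effort" in lm_default_kwargs:
        return lm_default_kwargs["reasoning_effort"]  # LM-level value next
    return default                                    # otherwise fall back to a default
```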
def __init__(
    self,
    signature: str | type[Signature],
    rationale_field: FieldInfo | None = None,
should we show deprecation warnings if these arguments are used?
sg, let me add that
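For reference, a minimal sketch of what such a deprecation warning could look like (illustrative only; the class name and message are placeholders, not the actual change):

```python
import warnings
from typing import Any

class ChainOfThought:  # simplified stand-in for the real class
    def __init__(self, signature: Any, rationale_field: Any = None, **kwargs: Any):
        # Illustrative: warn when the legacy argument is passed.
        if rationale_field is not None:
            warnings.warn(
                "`rationale_field` is deprecated and may be removed in a future release.",
                DeprecationWarning,
                stacklevel=2,
            )
```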
def _process_completion(self, response, merged_kwargs):
    """Process the response of OpenAI chat completion API and extract outputs.
Is it auto-formatting? Shall we add a lint rule so that linting does not change lines unrelated to a PR's scope?
That's my plugin for trailing whitespace. I didn't expect this PR to be this big in the first place...
tool_calls = []
reasoning_contents = []

for output_item in response.output:
Sorry, I'm confused. What's the meaning of multiple items in the `output` field? I thought it'd be the same as `choices` in chat completion.
This is definitely a weird point of the Responses API: all these items in `output` belong to one answer. Basically the Responses API doesn't have the `n` arg that controls the number of answers to be generated.
Talked offline. It seems all output items represent one choice.
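A sketch of that structure (assuming a Responses API response object; field names follow the OpenAI Responses API object shape and this is not the exact DSPy code): every item in `response.output` belongs to the same single answer, with no `n`-style list of choices.

```python
# Reasoning blocks, message text, and tool calls are separate output items,
# but together they make up one completion.
texts, reasoning_parts, tool_calls = [], [], []
for output_item in response.output:
    if output_item.type == "reasoning":
        reasoning_parts.append(output_item)                 # reasoning block
    elif output_item.type == "message":
        texts.extend(c.text for c in output_item.content)   # the answer text
    elif output_item.type == "function_call":
        tool_calls.append(output_item)                      # tool calls from the same answer
```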
Force-pushed from e806ef5 to 41a0313
Force-pushed from 41a0313 to 56973f0
return formatted

@classmethod
def adapt_to_native_lm_feature(cls, lm, lm_kwargs) -> bool:
For the naming, how about `is_native_lm_feature_available`?
if "reasoning_effort" in lm_kwargs:
    # `lm_kwargs` overrides `lm.kwargs`
    reasoning_effort = lm_kwargs["reasoning_effort"]
nit: we can simplify with `reasoning_effort = lm_kwargs.get("reasoning_effort") or lm.kwargs.get("reasoning_effort")`.
Ohh good catch! The structure is right but the code is actually buggy: if users explicitly turn off native reasoning, we should respect that setting, so I am not using `get` to capture the value. Updated the code.
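A sketch of the membership-check approach being described, so an explicit user value (even a falsy one that turns native reasoning off) is respected rather than overridden by `or`-style fallbacks (variable names assumed from the surrounding diff):

```python
# Per-call kwargs take precedence over the LM-level kwargs; an explicitly set
# falsy/None value is kept instead of being replaced by a fallback.
if "reasoning_effort" in lm_kwargs:
    reasoning_effort = lm_kwargs["reasoning_effort"]
elif "reasoning_effort" in lm.kwargs:
    reasoning_effort = lm.kwargs["reasoning_effort"]
else:
    reasoning_effort = None  # nothing set anywhere
```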
from dspy.adapters.types.reasoning import Reasoning

extended_signature = signature.prepend(name="reasoning", field=OutputField(), type_=Reasoning)
Should we continue to return a string in the reasoning field for backward compatibility?
`dspy.Reasoning` is a str-like type. Basically:
- Printing a `dspy.Reasoning` prints out the string part, i.e., the content.
- Users can directly compare a `dspy.Reasoning` to a string.

We need this specific type to modify the signature inside the Adapter. I don't feel this is a perfectly clean solution, but it's the most robust way in my mind. Would like to hear your thoughts!
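A minimal illustration of that str-like behavior (a plain `str` subclass used here only for demonstration, not the actual `dspy.Reasoning` implementation):

```python
class Reasoning(str):
    """Behaves like a plain string for printing and comparison,
    but remains a distinct type the adapter can detect in the signature."""

r = Reasoning("Let's reason step by step...")
print(r)                                    # prints the string content
assert r == "Let's reason step by step..."  # compares equal to the plain string
assert isinstance(r, Reasoning)             # yet remains distinguishable by type
```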
dspy/clients/lm.py (outdated)
not self._warned_zero_temp_rollout
and rollout_id is not None
and (temperature is None or temperature == 0)

# Normalize reasoning_effort to get reasoning summaries (for OpenAI reasoning models which don't expose
Shall we move this logic to `_convert_chat_request_to_responses_request`?
nice catch, done
    for content_item in output_item.content:
        reasoning_contents.append(content_item.text)
elif getattr(output_item, "summary", None) and len(output_item.summary) > 0:
    for summary_item in output_item.summary:
When is this summary field used?
This is another weird spot of the Responses API: according to the API documentation, the reasoning content can be contained in either the `content` or the `summary` of the reasoning block (https://platform.openai.com/docs/api-reference/responses/object). While we treat `content` as the golden source, the API itself suggests the reasoning may instead be contained in `summary`.
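A condensed sketch of that content-first, summary-fallback extraction (attribute names follow the Responses API reasoning object; this is a simplification of the diff above, not the exact DSPy code):

```python
reasoning_contents = []
for output_item in response.output:
    if output_item.type != "reasoning":
        continue
    if getattr(output_item, "content", None):
        # Prefer the `content` field when the API populates it.
        reasoning_contents.extend(item.text for item in output_item.content)
    elif getattr(output_item, "summary", None):
        # Fall back to `summary` when only a summary is returned.
        reasoning_contents.extend(item.text for item in output_item.summary)
```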
Force-pushed from e91f7f7 to c65b774
This is a followup work of #8764. This problem is harder than it looks because:

- The model's native reasoning should go into the `reasoning` field.
- If users don't have a `reasoning` field in their output fields, `dspy.Predict` should not return `reasoning`. Basically we need to keep the signature faithful.

That basically means we need a way to identify that users are specifying "reasoning" in their signature, and that it fits into our definition of a predefined type. Users are not forbidden from using a reasoning field of string type, but it won't use the native reasoning.
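A hedged usage sketch of that distinction, based on the description above (signature and field names are illustrative; details of the final API may differ):

```python
import dspy

class QANative(dspy.Signature):
    """Answer the question."""
    question: str = dspy.InputField()
    reasoning: dspy.Reasoning = dspy.OutputField()  # predefined type: native reasoning is used
    answer: str = dspy.OutputField()

class QAPlain(dspy.Signature):
    """Answer the question."""
    question: str = dspy.InputField()
    reasoning: str = dspy.OutputField()             # plain str: prompted reasoning, not native
    answer: str = dspy.OutputField()

predict_native = dspy.Predict(QANative)
predict_plain = dspy.Predict(QAPlain)
```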