
Show reasoner model's thinking in a <<< thinking section #151

Merged

Conversation

@juodumas juodumas commented Mar 7, 2025

Parse reasoning_content in response chunks and put it in a separate <<< thinking section. Do not send the contents of this section back to the model.

Tested with the DeepSeek API and their deepseek-reasoner model.

Also tested locally with sglang --model-path Qwen/QwQ-32B-AWQ --reasoning-parser deepseek-r1:

>>> user

hi

<<< thinking

Okay, the user said "hi". I should respond in a friendly way. Maybe say "Hello! How can I assist you today?" That's a common greeting and opens the conversation for them to ask something. I need to keep it simple and welcoming. Let me check if there's anything else I should consider. No, that should be good. Alright, I'll go with that.


<<< assistant



Hello! How can I assist you today?

>>> user
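
For illustration, the "do not send thinking back" behaviour could be sketched like this (a minimal sketch, not the PR's actual code; strip_thinking_sections is a hypothetical helper applied when the chat buffer is parsed back into messages):

def strip_thinking_sections(lines):
    # drop everything inside a '<<< thinking' section so the reasoning
    # text is never included in the messages sent to the model
    kept, in_thinking = [], False
    for line in lines:
        if line.strip() == '<<< thinking':
            in_thinking = True
            continue
        if line.startswith('>>>') or line.startswith('<<<'):
            in_thinking = False
        if not in_thinking:
            kept.append(line)
    return kept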

py/chat.py Outdated

text_chunks = make_chat_text_chunks(messages, options)
render_text_chunks(text_chunks)
for chunk in text_chunks:
madox2 (Owner):
Here it would be nice to re-use the render_text_chunks helper function; it covers more functionality like AIRedo. For example, filter the thinking chunks first, then, if not empty, render them with render_text_chunks, and then handle the content.
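
One way the suggested flow could be sketched (illustrative only: render_text_chunks' real signature may differ, and with a streaming generator the chunks need buffering, e.g. via itertools.tee):

import itertools

# tee buffers the stream internally so it can be scanned twice
pass1, pass2 = itertools.tee(text_chunks)
thinking = [chunk for chunk in pass1 if chunk.get('thinking')]
if thinking:
    render_text_chunks(iter(thinking))  # re-uses AIRedo handling etc.
render_text_chunks(chunk for chunk in pass2 if chunk.get('content'))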

juodumas (Contributor, Author):

For me it's confusing how render_text_chunks handles insertion to support in-place edits vs. chat.
If you want this in the render_text_chunks function, could you please first check if you agree with change #148?

py/utils.py Outdated
@@ -156,10 +156,12 @@ def parse_chat_messages(chat_content):
     for line in lines:
         match line:
             case '>>> system':
-                messages.append({'role': 'system', 'content': [{ 'type': 'text', 'text': '' }]})
+                messages.append({'role': 'system', 'content': ''})
madox2 (Owner):

Why is this change needed? It also breaks the unit tests.
Btw, it would be nice to have a unit test checking that the thinking section is omitted (tests/chat_test.py).
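
Such a test could look roughly like this (a sketch only; the expected message shape depends on parse_chat_messages' output format at the time):

def test_chat_omits_thinking_section():
    chat_content = (
        '>>> user\n'
        'hi\n'
        '<<< thinking\n'
        'Some reasoning text.\n'
        '<<< assistant\n'
        'Hello!\n'
    )
    messages = parse_chat_messages(chat_content)
    assert [m['role'] for m in messages] == ['user', 'assistant']
    assert 'Some reasoning text' not in str(messages)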

juodumas (Contributor, Author):

It's a regression introduced by a recent change. Perhaps it should have been a separate PR.

A content array is not supported by some providers, namely:

  1. groq:
{"error":{"message":"'messages.0' : for 'role:system' the following must be satisfied[('messages.0.content' : value must be a string)]","type":"invalid_request_error"}}
  2. deepseek:
Failed to deserialize the JSON body into the target type: messages[0]: data did not match any variant of untagged enum ChatCompletionRequestContent at line 1 column 109

I haven't looked at the tests yet because I wanted to know if you will accept this PR.

madox2 (Owner):
Okay, sounds good

madox2 (Owner):
If I may ask you to keep the previous structured format here and move this logic to make_chat_text_chunks, reformatting the text messages before the request. That would make things easier when I'm merging custom providers support.
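
For illustration, the reformatting in make_chat_text_chunks could be sketched as follows (flatten_content is a hypothetical helper, and only text parts are assumed):

def flatten_content(message):
    # collapse the structured content array into a plain string for
    # providers (groq, deepseek) that require string content
    content = message['content']
    if isinstance(content, list):
        content = ''.join(p['text'] for p in content if p.get('type') == 'text')
    return {**message, 'content': content}

messages = [flatten_content(m) for m in messages]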

py/utils.py Outdated
if reasoning_content := delta.get('reasoning_content'):
    return {"thinking": reasoning_content}
if content := delta.get('content'):
    return {"content": content}
madox2 (Owner):

If there is no special reason, handling the chunk the same way as in map_chunk_no_stream would probably make it simpler, e.g.:

delta = _choices(resp)[0].get('message', {})
reasoning_content = delta.get('reasoning_content', '')
content = delta.get('content', '')
return {"content": content, "thinking": reasoning_content}

juodumas (Contributor, Author):

The reason is that a chunk can have {"content": None}, so delta.get('content', '') would return None, not an empty string. But I can simplify it, taking inspiration from #150.
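
A None-safe variant of the suggested shape might be (a sketch only; this uses the streaming 'delta' field rather than the no-stream 'message'):

delta = _choices(resp)[0].get('delta', {})
return {
    "content": delta.get('content') or '',  # guards against {"content": None}
    "thinking": delta.get('reasoning_content') or '',
}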

madox2 (Owner) commented Mar 7, 2025

Hi, thank you for the contribution! Even though reasoning_content is not part of the OpenAI API, I generally like the idea of the thinking sections. I would not be surprised if other providers offered a similar API. When we are done with the custom provider extensions, this feature might also be useful for better model integrations.

I currently can't test R1, but made a quick code review.

juodumas (Contributor, Author) commented Mar 10, 2025

Thanks for the review.
Let me know if the code is acceptable after my changes and if you would like to see unit test updates.

madox2 (Owner) commented Mar 10, 2025

It works great! I will write some unit tests myself while merging it into provider-extensions.

Thanks a lot @juodumas and @runningmaverick

madox2 merged commit f03db32 into madox2:main on Mar 10, 2025
1 check passed