@aryopg aryopg commented Apr 24, 2025

Summary

Reincorporates the edits made in #75 to allow prefilling for DeepSeek models.

The latest update did not handle this. Running a request with a prefill raised an error asking to set is_prefix=True, and once that parameter was set, the code asked to use the beta URL.

Changes Introduced

  • safetytooling/apis/inference/openai/chat.py
    • If the model is either deepseek-reasoner or deepseek-chat, use the prompt.deepseek_format function to prepare the prompt.
    • If the last message is by the assistant, use DeepSeek's beta URL. (Note that the override is very hacky at the moment. I'm open to any suggestions on how to handle this better.)
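
A minimal sketch of the two changes listed above, for illustration only. It assumes DeepSeek's prefix-completion convention (the trailing assistant message is flagged with `prefix: True` and the request goes to the beta base URL). The helper names `to_deepseek_messages` and `pick_base_url` are hypothetical and are not the PR's `prompt.deepseek_format`.

```python
from typing import Dict, List, Optional

DEEPSEEK_MODELS = {"deepseek-chat", "deepseek-reasoner"}
DEEPSEEK_BASE_URL = "https://api.deepseek.com"
DEEPSEEK_BETA_BASE_URL = "https://api.deepseek.com/beta"


def is_last_message_assistant(messages: List[Dict]) -> bool:
    """True when the conversation ends with an assistant turn (a prefill)."""
    return bool(messages) and messages[-1]["role"] == "assistant"


def to_deepseek_messages(messages: List[Dict]) -> List[Dict]:
    """Copy the messages and flag a trailing assistant turn as a prefix.

    Assumption: the `prefix` flag follows DeepSeek's beta prefix-completion format.
    """
    out = [dict(m) for m in messages]  # shallow copies; the input stays untouched
    if is_last_message_assistant(out):
        out[-1]["prefix"] = True
    return out


def pick_base_url(model_id: str, messages: List[Dict]) -> Optional[str]:
    """Return the DeepSeek base URL for this request, or None for other models."""
    if model_id not in DEEPSEEK_MODELS:
        return None  # not a DeepSeek model: leave the client's default alone
    if is_last_message_assistant(messages):
        return DEEPSEEK_BETA_BASE_URL
    return DEEPSEEK_BASE_URL
```

Keeping the URL choice in a pure function, separate from the client, makes the decision easy to test without touching any shared state.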

@aryopg aryopg marked this pull request as ready for review April 24, 2025 23:12
@aryopg aryopg requested review from jplhughes and kxcloud April 25, 2025 08:26
@aryopg aryopg self-assigned this Apr 25, 2025
api_func = self.aclient.chat.completions.create
if model_id in {"deepseek-chat", "deepseek-reasoner"}:
    if prompt.is_last_message_assistant():
        self.aclient.base_url = "https://api.deepseek.com/beta"
Contributor

Should you have an else here to swap back to the non-beta URL, or swap back after the call succeeds?

Contributor Author

Good catch! Added in 71fe0d4


original_base_url = self.aclient.base_url
try:
    if model_id in {"deepseek-chat", "deepseek-reasoner"}:
Contributor

Do we have a DEEPSEEK_MODELS dict somewhere?

Contributor

(or list)

Contributor Author

We have one in safety-tooling/safetytooling/apis/inference/api.py, but importing it here would result in a circular import. Maybe we could create a constants file somewhere?

Comment on lines +139 to +141
finally:
    # Always revert the base_url after the call
    self.aclient.base_url = original_base_url
Contributor

one thought - could this have strange async race conditions :/

Contributor

hmm, maybe the base URL should be passed directly to the api_func on a per-call basis rather than set as an attribute of the shared client, since the class itself could be handling many requests with different models (and even different providers if it was set up differently).
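
The race discussed here can be reproduced with a toy model. This is a sketch under assumptions: `FakeClient` is a hypothetical stand-in for the shared async client (it is not the openai library), and it reads `base_url` when the request is built, before the simulated network wait.

```python
import asyncio

DEFAULT_URL = "https://api.deepseek.com"
BETA_URL = "https://api.deepseek.com/beta"


class FakeClient:
    """Hypothetical shared client; a request uses whatever base_url is set when it starts."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url

    async def create(self) -> str:
        url = self.base_url          # URL is fixed when the request is built
        await asyncio.sleep(0.05)    # simulated network latency
        return url                   # report which URL this request used


async def call(client: FakeClient, prefill: bool, delay: float = 0.0) -> str:
    """Same pattern as the diff: swap base_url, call, restore in finally."""
    await asyncio.sleep(delay)
    original_base_url = client.base_url
    try:
        if prefill:
            client.base_url = BETA_URL
        return await client.create()
    finally:
        client.base_url = original_base_url


async def main():
    client = FakeClient(DEFAULT_URL)
    # The non-prefill request starts while the prefill request is still in flight.
    results = await asyncio.gather(
        call(client, prefill=True),
        call(client, prefill=False, delay=0.01),
    )
    return results, client.base_url
```

In this toy run, `asyncio.run(main())` shows the non-prefill request going to the beta URL. It also shows the client left pointing at the beta URL afterwards, because the second task saved the already-swapped value as its "original" and restored it last.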

Contributor Author

ah that's true, should we have an asyncio lock maybe?

Contributor Author

> base url should be passed directly to the api_func on a call-wise basis

Unfortunately, the api_func doesn't accept a base URL, and I guess locking would harm concurrency.
Another (naive) approach is to instantiate a new client (openai.AsyncClient) for each call and take the api_func from it.

Would you be against locking?
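
One variation on the "instantiate again" idea, sketched under assumptions: build one client per base URL once, then pick a client per request, so nothing shared is mutated. `FakeClient` and `DeepSeekClients` are hypothetical names for illustration. With the real library, the factory would presumably construct two `openai.AsyncOpenAI(base_url=...)` instances.

```python
import asyncio

DEFAULT_URL = "https://api.deepseek.com"
BETA_URL = "https://api.deepseek.com/beta"


class FakeClient:
    """Hypothetical stand-in for an async client bound to one base URL."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url

    async def create(self) -> str:
        url = self.base_url
        await asyncio.sleep(0.01)  # simulated network latency
        return url


class DeepSeekClients:
    """Holds one immutable client per endpoint and selects one per request."""

    def __init__(self, factory=FakeClient) -> None:
        self._default = factory(DEFAULT_URL)
        self._beta = factory(BETA_URL)

    def for_request(self, prefill: bool) -> FakeClient:
        # Selection is a pure lookup; no attribute on a shared client changes.
        return self._beta if prefill else self._default


async def main():
    clients = DeepSeekClients()
    # Concurrent prefill and non-prefill requests each use their own endpoint.
    return await asyncio.gather(
        clients.for_request(prefill=True).create(),
        clients.for_request(prefill=False).create(),
    )
```

Because neither client's `base_url` ever changes after construction, concurrent requests cannot observe each other's endpoint, and no lock is needed.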

Contributor

I think locking would hurt throughput, since the lock would be held until the async call completes, so requests would run one at a time. Perhaps you can override the URL via "extra_headers"?
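
The throughput concern can be illustrated with a small experiment. This is a sketch with a hypothetical `InFlightCounter` standing in for the API call: it records the peak number of requests in flight with and without an `asyncio.Lock` held across the call.

```python
import asyncio
from typing import Optional


class InFlightCounter:
    """Hypothetical fake endpoint that tracks how many requests overlap."""

    def __init__(self) -> None:
        self.current = 0
        self.peak = 0

    async def request(self) -> None:
        self.current += 1
        self.peak = max(self.peak, self.current)
        await asyncio.sleep(0.01)  # simulated network latency
        self.current -= 1


async def run(n: int, lock: Optional[asyncio.Lock] = None) -> int:
    """Fire n requests concurrently; return the peak number in flight."""
    counter = InFlightCounter()

    async def one() -> None:
        if lock is None:
            await counter.request()
        else:
            async with lock:  # held for the whole duration of the call
                await counter.request()

    await asyncio.gather(*(one() for _ in range(n)))
    return counter.peak


async def main():
    unlocked_peak = await run(5)
    locked_peak = await run(5, asyncio.Lock())
    return unlocked_peak, locked_peak
```

Without the lock, all five requests overlap. With the lock held across the await, only one request is ever in flight, which is the serialization described in the comment above.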

Contributor

I'm still a little worried about this. Can we just always use "https://api.deepseek.com/beta"? Then we can set it in api.py and remove all the internal logic for swapping between URLs.

Contributor

what ended up happening here?

@aryopg aryopg marked this pull request as draft April 25, 2025 14:39
4 participants