
✨ Added continuation support for Amazon Bedrock #675

Open · wants to merge 4 commits into base: main

Conversation

@JonahSussman (Contributor) commented Feb 20, 2025

Closes #136
Closes #350

I refactored model_provider.py, putting each ModelProvider into its own class. This lets us override model-specific functionality, for example a custom LLM invoke for Bedrock (the main reason for the refactor).
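To give a rough idea of the shape of the refactor, here is a minimal sketch of the per-provider layout (class names, method names, and constructor signatures here are illustrative, not necessarily the exact ones in model_provider.py):

from langchain_core.language_models.chat_models import BaseChatModel
from langchain_core.messages import BaseMessage


class ModelProvider:
    """Shared behavior; subclasses override the model-specific parts."""

    def __init__(self, llm: BaseChatModel):
        self.llm = llm

    def invoke_llm(self, messages: list[BaseMessage]) -> BaseMessage:
        # Default behavior: a single invoke is enough.
        return self.llm.invoke(messages)


class ModelProviderChatBedrock(ModelProvider):
    def invoke_llm(self, messages: list[BaseMessage]) -> BaseMessage:
        # Bedrock-specific override: keep requesting continuations until the
        # response no longer stops because of max_tokens (sketched further below).
        ...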

Bedrock will now continue to generate tokens even if max_tokens is reached. Look at the following Python script and attached logs:

from langchain.globals import set_debug, set_verbose

from kai.kai_config import KaiConfigModels, SupportedModelProviders
from kai.llm_interfacing.model_provider import ModelProvider

set_verbose(True)
set_debug(True)

m = ModelProvider.from_config(
    config=KaiConfigModels(
        provider=SupportedModelProviders.CHAT_BEDROCK,
        args={"model_id": "us.anthropic.claude-3-5-sonnet-20241022-v2:0"},
    )
)

print(m.invoke("Generate a long poem. 10,000 words. It must be very long.").content)

tester.log

Ugly LangChain color codes aside, you can see that we now make two requests, resulting in one continuous poem at the end.
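For reference, the continuation logic amounts to something like the following simplified sketch (the stop_reason key and the message handling are assumptions based on the behavior described above, not a copy of the actual implementation):

from langchain_core.messages import AIMessage, BaseMessage


def invoke_with_continuation(llm, messages: list[BaseMessage]) -> AIMessage:
    # Keep asking the model to continue until it stops for a reason other
    # than hitting max_tokens, then stitch the pieces back together.
    chunks: list[str] = []
    history = list(messages)

    while True:
        response = llm.invoke(history)
        chunks.append(response.content)

        # Bedrock reports why generation stopped; "max_tokens" means the
        # output was truncated and can be continued.
        if response.response_metadata.get("stop_reason") != "max_tokens":
            break

        # Feed the partial answer back as the assistant turn so the model
        # picks up where it left off.
        history.append(AIMessage(content=response.content))

    return AIMessage(content="".join(chunks))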

I tested locally with Amazon Bedrock and OpenAI, but I would like some assistance testing other providers to make sure I didn't mess anything else up.

Separately, we should probably think about how to integrate streaming into the project. It should be fairly straightforward now with the Chatter class: we can create a message with a separate messageToken and update it as we get more chunks. WDYT?
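Something roughly like this could work (create_message and update_message are hypothetical placeholder names, not the actual Chatter interface):

def stream_to_chat(llm, prompt: str, chatter) -> str:
    # Hypothetical sketch: open a chat message, then keep appending chunks
    # to it as they stream in from the LLM.
    message_token = chatter.create_message("")  # placeholder call
    buffer = ""
    for chunk in llm.stream(prompt):
        buffer += chunk.content
        chatter.update_message(message_token, buffer)  # placeholder call
    return buffer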

@JonahSussman changed the title from "✨ Preliminary continuation support for Amazon Bedrock" to "✨ Added continuation support for Amazon Bedrock" on Feb 20, 2025
@fabianvf (Contributor) left a comment:

This seems reasonable. Does using a model that requires continuation break the caching?

        case SupportedModelProviders.CHAT_DEEP_SEEK:
            return ModelProviderChatDeepSeek(config, demo_mode, cache)
        case _:
            assert_never(config.provider)
@fabianvf (Contributor):

This should be a typed exception rather than an assertion, unless it actually can't be reached. I'm assuming we can trigger this with bad configuration though, right?

@JonahSussman (Author):

This is a new thing I recently found out about type checking in Python. If a type checker thinks that something can't happen, it will be assigned the type Never. This allows you to do exhaustiveness checking and get a static error if you don't handle every case.

We shouldn't ever have an issue with config because the Pydantic model makes sure it's one of the enum values.
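As a small standalone illustration of the pattern (using a throwaway enum, not the project's SupportedModelProviders):

from enum import Enum
from typing import assert_never  # typing_extensions on Python < 3.11


class Color(Enum):
    RED = "red"
    BLUE = "blue"


def describe(color: Color) -> str:
    match color:
        case Color.RED:
            return "warm"
        case Color.BLUE:
            return "cool"
        case _:
            # If a new Color member is added but not handled above, the type
            # checker narrows `color` to that member instead of Never and
            # reports an error here, since assert_never only accepts Never.
            assert_never(color)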

@JonahSussman (Author):

@fabianvf It shouldn't, since we cache BaseMessages and invoke_llm returns one of those.

@jwmatthews (Member) commented Feb 20, 2025

Looking great with Claude via Bedrock.

Tested with

  AmazonBedrock: &active
    provider: "ChatBedrock"
    args:
      model_id: "us.anthropic.claude-3-5-sonnet-20241022-v2:0"

I confirmed that with an older binary against Claude, I was seeing the partial output stop with:
"Stateful EJBs can be converted to a CDI bean by replacing the @Stateful annotation with a bean-defining annotation that encompasses the appropriate scope (e.g., @ApplicationScoped)." for ShoppingCartService.java
[Screenshot: 2025-02-20 at 1:14:42 PM]

With this PR I see the expected full contents of the fix:

[Screenshot: 2025-02-20 at 1:39:41 PM]

The PR looks to be performing as we expected.

$ grep "Continuing..." *
kai-rpc-server.log:INFO - 2025-02-20 13:37:06,527 - kai.kai.llm_interfacing.model_provider - Thread-1 - [model_provider.py:363 - invoke_llm()] - Message did not fit in max tokens. Continuing...
kai-rpc-server.log:INFO - 2025-02-20 13:37:52,221 - kai.kai.llm_interfacing.model_provider - Thread-1 - [model_provider.py:363 - invoke_llm()] - Message did not fit in max tokens. Continuing...
kai-rpc-server.log:INFO - 2025-02-20 13:44:17,735 - kai.kai.llm_interfacing.model_provider - Thread-1 - [model_provider.py:363 - invoke_llm()] - Message did not fit in max tokens. Continuing...


@jwmatthews (Member):

I did hit an error: "An error occurred (ValidationException) when calling the InvokeModel operation: messages: final assistant content cannot end with trailing whitespace"

Logs: https://gist.githubusercontent.com/jwmatthews/43400716b53f0df30c558b2081f4d939/raw/560ea0805957260ef1db6e530487aa5a82bc3443/kai-rpc-server.log

Saw this error when processing: "Stateless EJBs can be converted to a CDI bean by replacing the @stateless annotation with a scope eg @ApplicationScoped" at Medium Effort
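If the continuation path re-sends the partial output as the final assistant message, one possible mitigation (just a sketch, not necessarily how the fix should land in model_provider.py) would be to strip trailing whitespace before building that message, since Anthropic models on Bedrock reject a final assistant turn that ends in whitespace:

from langchain_core.messages import AIMessage


def continuation_prefill(partial_text: str) -> AIMessage:
    # Trim trailing whitespace so Bedrock's validation accepts the request;
    # note that any stripped whitespace would need to be accounted for when
    # joining the continued output back together.
    return AIMessage(content=partial_text.rstrip())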
