
support chat history #438

Merged: 20 commits into Quansight:main on Jul 11, 2024
Conversation

blakerosenthal (Contributor)

This is a very rough first pass at #421. I decided to investigate the naive approach of replacing

class Assistant:
    ...
    def answer(self, prompt: str, sources: list[Source]):
        ...

with

class Assistant:
    ...
    def answer(self, messages: list[Message]):
        ...

for just the HTTP-like assistants as a proof of concept. Since a Message instance can hold a list of Sources, the _messages list in Rag.Chat serves as a cache of the raw messages that live in the database (and that the user sees), while each assistant is responsible for formatting the message list according to the needs of its particular LLM.
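Roughly, the shape this pushes the HTTP-like assistants toward is something like the following sketch (illustrative only; it assumes the Message/MessageRole/Source classes from ragna.core as used on this branch, and the helper name is made up):

from ragna.core import Message, MessageRole


def render_for_api(messages: list[Message]) -> list[dict]:
    # Hypothetical helper: turn the raw Message cache from Chat._messages into the
    # chat-completion payload an OpenAI-style HTTP API expects.
    rendered = []
    for message in messages:
        content = message.content
        if message.role is MessageRole.USER and message.sources:
            # Fold the retrieved sources into the user turn; each assistant is free
            # to format (or truncate) this differently.
            sources_text = "\n".join(source.content for source in message.sources)
            content = f"{content}\n\nUse only these sources:\n{sources_text}"
        rendered.append({"role": message.role.value, "content": content})
    return rendered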

role=MessageRole.SYSTEM,
)

def _format_message_sources(self, messages: list[Message]) -> str:
Contributor (author)

Example of what this could look like for OpenAI-like assistants. There could also be an attribute on the Assistant that tells this function how to truncate the message list based on the size of the target LLM's context window.
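For reference, a minimal sketch of what such a helper could look like (illustrative only; _max_history is a hypothetical truncation attribute, not part of this PR):

def _format_message_sources(self, messages: list[Message]) -> str:
    # Sketch only: gather the sources attached to the (possibly truncated) history
    # into a single instruction string for the LLM.
    sources_prompt = "Include the following sources in your answer:"
    history = messages[-self._max_history :] if getattr(self, "_max_history", None) else messages
    contents = [
        source.content
        for message in history
        for source in (message.sources or [])
    ]
    if not contents:
        return ""
    return "\n".join([sources_prompt, *contents])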

@@ -195,6 +195,9 @@ async def prepare(self) -> Message:
await self._run(self.source_storage.store, self.documents)
self._prepared = True

system_prompt = self.assistant._make_system_content()
Contributor (author)

It feels easiest to include the basic system prompt here, but maybe it should be regenerated and prepended to the list of messages for every call to the LLM and not stored here. This would make the most sense if the Assistant might truncate the message list and chop off the system prompt.

Member

I don't understand why the chat would need the system prompt of the assistant. Instead of creating it here and passing it to the Assistant as part of the messages, why not let the Assistant create it in the first place?

Contributor (author)

I think I agree; I mostly talked myself out of putting it here as well. I wanted _messages to be a complete historical list of messages that could be sent to the LLM, but system prompts can stay hard-coded on the Assistant (or, better yet, in a config somewhere?).


Just be sure there's no attempt at a 'universal' system prompt anywhere - on the assistant we'll have a general one for .answer(), but we'll want different system prompts for things like all the pre-process steps or anything that's outside the standard chain. Simply put: it's reasonable to have a standard system prompt on .answer(), but we don't want it fixed on the general LLM calls in .generate() (WIP).
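To make the "let the Assistant own it" option concrete, a rough sketch (class name and prompt text are illustrative, not the actual implementation):

from ragna.core import Message


class SketchAssistant:
    # Illustrative only: the system prompt lives on the assistant and is re-prepended
    # on every call, so truncating Chat._messages can never drop it.
    _SYSTEM_PROMPT = "Answer the user's questions strictly from the provided sources."

    def _compose_payload(self, messages: list[Message]) -> list[dict]:
        rendered = [{"role": "system", "content": self._SYSTEM_PROMPT}]
        rendered.extend(
            {"role": message.role.value, "content": message.content}
            for message in messages
        )
        return rendered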

sources = await self._run(self.source_storage.retrieve, self.documents, prompt)

question = Message(content=prompt, role=MessageRole.USER, sources=sources)
Member

Until now, we only ever attached the sources to the answer and not the question. I'm open to a "philosophical debate" on where they should go, but I don't want them on both the question and the answer. Why not just keep passing the sources to Assistant.answer?

Contributor (author)

This was something I meant to ask about in my original list of questions last night, but I ran out of steam. I'm really curious about this from a design perspective. Let's imagine a design where chat._messages is a list of Message objects, as currently implemented. Now also say the assistant is responsible for formatting each raw message for the LLM API call. I can imagine at least a couple of scenarios for how we would format historical messages:

  1. The assistant reformats the message list every time it receives a new .answer call. So if we want to split up a prompt/sources pair into two separate JSON objects (i.e. {'content': '<user question>', 'role': 'user'} + {'content': 'please use only the following sources: ...', 'role': 'system'}; see the sketch after this list), we either need the sources to be attached to the question or we need to reach forward in the messages list to grab the sources from the answer. Something feels a little strange about the latter, but I agree that sources shouldn't be on both the question and the answer, just one or the other.
  2. Alternatively, rather than reformat the LLM message JSON every time, we only format new prompt/source pairs and maintain, somewhere, a list of previously formatted message dictionaries or rendered JSON. Then we'd just need to append the new messages and send them along. But that becomes another data structure to maintain.
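As mentioned in scenario 1, a sketch of that split (hypothetical helper; assumes sources are attached to the question, as on this branch):

from ragna.core import Message


def split_user_turn(message: Message) -> list[dict]:
    # Scenario 1, sketched: one raw user Message (content + sources) becomes two
    # API messages - the question itself plus a system-role instruction carrying the sources.
    rendered = [{"role": "user", "content": message.content}]
    if message.sources:
        sources_text = "\n".join(source.content for source in message.sources)
        rendered.append(
            {
                "role": "system",
                "content": f"Please use only the following sources:\n{sources_text}",
            }
        )
    return rendered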


Recall, we eventually want ['chat history' messages] + prompt -> preprocess -> retrieval. The messages coming in the history before the current prompt are (were) all plain strings as far as the LLM is concerned, but we may want to be able to act on them with the distinction between content and sources intact - e.g. if messages from the user are tracked at a 'content' + sources level, it becomes easier to process the sources down in later stages; you may not want to carry content + N chunks forward from every user prompt.

Keeping the sources attached to prompts would be particularly useful in preprocess tracking, where a given user-role message may be (content + sources) and you want to streamline the context there (parse out / summarize / select sources). Just keep in mind that it makes sense for 'previous' messages to have their sources attached, while the 'current' prompt will still be sent through processing and retrieval.

If I'm thinking about a pre-process stage class, receiving messages = [{system, ...}, {user, ... content, sources}, {assistant, ... answer}, ...] makes more sense than the sources being with the answer - the 'content' in all previous user messages would contain the sources in that string as well (minus current prompt, which would not yet have sources).
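Spelled out with dummy values, the shape being described is roughly (purely illustrative, not from this PR):

# The message list a pre-process stage might receive under this scheme:
history = [
    {"role": "system", "content": "<system prompt>"},
    {"role": "user", "content": "first question", "sources": ["<chunk 1>", "<chunk 2>"]},
    {"role": "assistant", "content": "first answer"},
    {"role": "user", "content": "current prompt"},  # no sources yet - retrieval hasn't run
]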

Contributor (author)

the 'content' in all previous user messages would contain the sources in that string as well (minus current prompt, which would not yet have sources).

So you're thinking of the stage of the pipeline where we have the most recent user prompt but have not yet retrieved any sources from source storage, is that right? Would you want .content to contain string-formatted sources from previous questions, or would it be more flexible to still have them as Source objects?

)

def _format_message_sources(self, messages: list[Message]) -> str:
sources_prompt = "Include the following sources in your answer:"


Depending on what order we try to merge PRs - this piece will eventually be overwritten by the notion of the preprocess stage. It's probably fine for this version imo.

@blakerosenthal (Contributor, author)

@dillonroach I haven't updated any assistants other than the ones that inherit from OpenaiLikeHttpApiAssistant and the demo. Do you anticipate [m]any merge issues with your work if I went ahead and did that / do you have any other thoughts on how to pass multiple messages to the other assistants? I was thinking it would be a good idea to have a message list length limit but you are probably the one to know what that should be.

@dillonroach

@dillonroach I haven't updated any assistants other than the ones that inherit from OpenaiLikeHttpApiAssistant and the demo. Do you anticipate [m]any merge issues with your work if I went ahead and did that / do you have any other thoughts on how to pass multiple messages to the other assistants?

I would intentionally stay hands-off from your PR on the assistants; I'll be overhauling that anyway (a reference implementation in one or two places doesn't hurt anything, but I'll likely just use it as a reference and tear it up, so don't be too worried about details).

I was thinking it would be a good idea to have a message list length limit but you are probably the one to know what that should be.

That's on the assistants to handle, and there's nothing motivating it beyond the context window of the LLM: each assistant should have an idea of the maximum sequence length it can handle, and if the incoming history is too long it basically needs to drop the first n messages, e.g. (assuming message 1 is the system prompt; pseudo-code) messages[1] + messages[1+n:]. Ideally you'd do a token count over the messages and anticipate the maximum length the upcoming generation will take, so that all messages plus the generation fit within the LLM context window.
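A hedged sketch of that truncation logic (the token counter, context size, and generation reserve are placeholders here, not values from this PR; the system prompt is messages[0] in 0-indexed terms):

from collections.abc import Callable


def truncate_history(
    messages: list[dict],
    count_tokens: Callable[[str], int],
    max_context: int,
    reserve_for_generation: int,
) -> list[dict]:
    # Always keep the system prompt, then keep as many of the most recent messages
    # as fit in the remaining token budget, leaving room for the upcoming generation.
    budget = max_context - reserve_for_generation - count_tokens(messages[0]["content"])
    kept: list[dict] = []
    for message in reversed(messages[1:]):
        cost = count_tokens(message["content"])
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return [messages[0], *reversed(kept)]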

@blakerosenthal (Contributor, author)

I would intentionally stay hands-off from your PR on the assistants

@dillonroach Do you want to just merge this branch into yours and make a mega-PR, or would you prefer I coordinate with Philip to get mine merged into main first?

blakerosenthal marked this pull request as ready for review on July 10, 2024, 21:35
pmeier changed the title from "WIP: chat history" to "support chat history" on Jul 11, 2024
@pmeier (Member) left a comment

Thanks Blake! I pushed a minor cleanup commit (d7fbccb) to avoid some back and forth.



@skip_on_windows
Member

What is the issue here? We shouldn't just skip our E2E test unless we are sure that this is a testing issue on Windows.

Member

The failure looks real and serious:

Differing items:
  {'role': 'user'} != {'role': 'assistant'}
  {'id': 'a5a9ab44-2571-4dd1-889c-1ea8c61e6072'} != {'id': '0de4bf10-7e1c-4e54-8b8d-2aa760868335'}
  {'content': '?'} != {'content': "I'm a demo assistant and can be used to try Ragna's workflow.\nI will only mirror back my inputs. \n\nSo far I have received 1 messages.\n\nYour last prompt was:\n\n> ?\n\nThese are the sources I was given:\n\n- test.txt: ! "}

Member

@blakerosenthal I'm going to merge this PR to unblock @dillonroach. Could you please get on debugging this issue next?

Contributor (author)

Agreed, there's something funky going on with transaction processing on Windows, but it's intermittent (sometimes the GH tests pass and sometimes they don't). I'll try to get to the bottom of this.

Contributor (author)

Opened #447

pmeier merged commit 55f7fc5 into Quansight:main on Jul 11, 2024
20 of 21 checks passed
blakerosenthal added a commit that referenced this pull request Jul 17, 2024
Co-authored-by: Philip Meier <[email protected]>