Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

List corpuses #495

Merged
merged 7 commits into from
Aug 26, 2024
Merged

List corpuses #495

merged 7 commits into from
Aug 26, 2024

Conversation

blakerosenthal
Copy link
Contributor

Addresses #492

@blakerosenthal
Copy link
Contributor Author

@pmeier Is this about right? I like the idea of raising NotImplemented on source_storage.list_corpuses() in case the user decides to ignore corpuses altogether.

Also right now the /corpuses endpoint returns the list of corpus names directly from the source_storages. I have two questions about this:

  1. Should the api call distinguish which source storage the corpus name belongs to? E.g. ["LanceDB/<corpus_name>", "ChromaDB/<corpus_name>"]? Maybe actually returning a JSON of {"LanceDB": [<corpus_name>], "ChromaDB": [<corpus_name>]} would be better?
  2. For the built in source storages, the default corpus name is just the embedding ID. Maybe list_corpuses() should translate this to a prettier string like "default" for the sake of the UI.

Finally, should I add the actual UI changes as part of this issue, or will that be a separate issue?

@blakerosenthal blakerosenthal changed the title Ragna 492 List corpuses Aug 20, 2024
@pmeier
Copy link
Member

pmeier commented Aug 21, 2024

I like the idea of raising NotImplemented on source_storage.list_corpuses() in case the user decides to ignore corpuses altogether.

On second thought, we should probably raise a descriptive RagnaException instead. We need to tell users what they have to do if they encounter this error and then exit gracefully.

  1. Should the api call distinguish which source storage the corpus name belongs to? E.g. ["LanceDB/<corpus_name>", "ChromaDB/<corpus_name>"]? Maybe actually returning a JSON of {"LanceDB": [<corpus_name>], "ChromaDB": [<corpus_name>]} would be better?

You are right, that needs to happen. I would go for the JSON approach where the key is the .display_name().

2. For the built in source storages, the default corpus name is just the embedding ID. Maybe list_corpuses() should translate this to a prettier string like "default" for the sake of the UI.

Yes, that needs to happen on the source storage itself. I'm also inclined to remove the feature of using the embedding ID as name for now. There is no use and it makes it more complicated. So feel free to just remove the embedding ID and just use "default".

@pmeier
Copy link
Member

pmeier commented Aug 21, 2024

Finally, should I add the actual UI changes as part of this issue, or will that be a separate issue?

Up to you. When using two PRs we can merge this PR soon and don't have to wait for #484 to be merged.

Copy link
Member

@pmeier pmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good. I'll review properly when the open questions you had are addressed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@blakerosenthal I fixed the corpus support for the demo source storage in #498. Meaning, you can roll back all changes there besides the list_corpuses implementation.

@blakerosenthal
Copy link
Contributor Author

Up to you. When using two PRs we can merge this PR soon and don't have to wait for #484 to be merged.

Okay, let's keep them separate then

Copy link
Member

@pmeier pmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Blake, almost LGTM to me!

tests/deploy/api/test_e2e.py Outdated Show resolved Hide resolved
ragna/deploy/_api/core.py Outdated Show resolved Hide resolved
Copy link
Member

@pmeier pmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Blake. FYI: I pushed a small cleanup commit.

@pmeier pmeier merged commit cd2d10d into corpus-dev Aug 26, 2024
20 of 21 checks passed
@pmeier pmeier deleted the ragna-492 branch August 26, 2024 07:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Todo
Development

Successfully merging this pull request may close these issues.

2 participants