Derek/feat notion backend #2148
base: main
Conversation
Signed-off-by: Derek Anderson <[email protected]>
Greptile Summary
This PR introduces a new Notion backend for the Ragas experimental module that enables teams to store and collaborate on evaluation datasets using Notion databases. The implementation follows the existing backend architecture pattern, providing seamless integration with the current Dataset API while adding team collaboration capabilities.
The change adds several key components:
Backend Implementation: The NotionBackend class implements the BaseBackend interface, providing CRUD operations for datasets and experiments stored in Notion databases. It handles data conversion between Python objects and Notion's property format, including JSON serialization for complex data structures and automatic content truncation to meet Notion's 2000-character limit.
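As a rough illustration of the conversion step described above, the sketch below serializes a Python value into a Notion rich_text property payload with the 2000-character truncation. The helper name and exact payload shape are assumptions for illustration, not the PR's code.

```python
import json

NOTION_TEXT_LIMIT = 2000  # Notion's per-block rich_text content limit


def to_rich_text_property(value):
    """Serialize a Python value into a Notion rich_text property payload.

    Complex values (dicts/lists) are JSON-serialized; content over the
    2000-character limit is truncated with a "..." marker. Hypothetical
    helper, sketching the conversion the PR's backend performs.
    """
    if isinstance(value, (dict, list)):
        content = json.dumps(value)
    else:
        content = str(value)
    if len(content) > NOTION_TEXT_LIMIT:
        content = content[:NOTION_TEXT_LIMIT - 3] + "..."
    return {"rich_text": [{"text": {"content": content}}]}
```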
Plugin Integration: The backend is registered as an optional plugin through entry points in pyproject.toml, allowing automatic discovery when the notion-client dependency is installed. The implementation uses graceful degradation: the backend is only available when dependencies are present, maintaining backward compatibility.
User Experience: The PR includes comprehensive documentation and examples showing a "local-first" approach where users can start with local backends (LocalJSONLBackend
) and migrate to Notion when ready for team collaboration. This reduces onboarding friction while providing a clear upgrade path.
Data Management: The backend implements an "archive-and-recreate" strategy for saves, where existing data is archived before new data is created. This ensures data consistency but differs from typical update patterns. The system automatically handles database schema validation and provides rich error messaging for common configuration issues.
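The archive-and-recreate flow can be summarized in a few lines. The callables below stand in for Notion API operations and the function name is an illustrative assumption; note how a failure during creation would occur after the old data is already archived, which is the data-loss risk the review flags.

```python
def archive_and_recreate_save(archive_page, create_page, existing_ids, new_rows):
    """Sketch of the archive-and-recreate save strategy.

    `archive_page` and `create_page` stand in for Notion API calls.
    Old pages are archived before new ones are created, so an exception
    in create_page leaves the dataset with the old data already gone.
    """
    for page_id in existing_ids:
        archive_page(page_id)  # mark the existing page archived in Notion
    for row in new_rows:
        create_page(row)       # create a fresh page for each new row
```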
The integration maintains the same API across all backends, so users can switch from LocalJSONLBackend() to NotionBackend() without changing their data manipulation code. This aligns with the experimental module's goal of providing extensible, pluggable backends for different collaboration needs.
Confidence score: 3/5
• This PR has some implementation risks that could cause data loss or confusion in production use
• The archive-and-recreate save strategy could lose data if creation fails after archiving, and silent content truncation may cause unexpected data loss
• Files needing attention: experimental/ragas_experimental/backends/notion.py (data loss risks in save method), experimental/docs/notion_backend.md (code accuracy issues)
9 files reviewed, 6 comments
…d documentation Signed-off-by: Derek Anderson <[email protected]>
…and handling availability Signed-off-by: Derek Anderson <[email protected]>
…on and data structure Signed-off-by: Derek Anderson <[email protected]>
…tion API calls Signed-off-by: Derek Anderson <[email protected]>
…nt availability Signed-off-by: Derek Anderson <[email protected]>
…ration tests Signed-off-by: Derek Anderson <[email protected]>
…aceEmbeddings Signed-off-by: Derek Anderson <[email protected]>
content = str(value)
# Notion rich text has a limit, truncate if needed
if len(content) > 2000:
    content = content[:1997] + "..."
We should probably validate the content value and check what's getting truncated; naive truncation could leave behind broken JSON.
Alternatively, we could store an additional flag or metadata like truncated: true. Logs are easy to miss.
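A minimal sketch of the suggestion: truncate and return an explicit flag that the backend could store as extra metadata next to the Notion property. The helper name is hypothetical.

```python
def truncate_with_flag(content, limit=2000):
    """Truncate content to `limit` characters and report whether it happened.

    Sketch of the reviewer's suggestion: the returned flag could be stored
    as metadata (e.g. truncated: true) instead of relying on logs.
    """
    if len(content) <= limit:
        return content, False
    return content[:limit - 3] + "...", True
```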
}

try:
    response = self.client.databases.query(
databases.query returns at most 100 results per call, I believe.
We can probably add a loop using has_more.
Also, consider adding a field/param like total_results.
        else:
            result[prop_name] = text
    elif prop_type == "date":
        date_data = prop_data.get("date")
Better to parse the date value defensively in case of inconsistent formats.
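One way to parse the date defensively, assuming the `start` field of Notion's date property (ISO 8601, with or without a time component); the helper name is illustrative.

```python
from datetime import datetime


def parse_notion_date(date_data):
    """Parse a Notion date property's `start` value defensively.

    Returns None for missing or unparseable dates instead of raising.
    Field names follow the public Notion API; this is a sketch, not the
    PR's implementation.
    """
    if not date_data or not date_data.get("start"):
        return None
    raw = date_data["start"]
    try:
        # fromisoformat before Python 3.11 rejects the "Z" suffix
        return datetime.fromisoformat(raw.replace("Z", "+00:00"))
    except ValueError:
        return None
```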
existing_pages = self._query_database(data_type, name)
for page in existing_pages:
    try:
        self.client.pages.update(page_id=page["id"], archived=True)
Better to implement rate limiting here.
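A minimal client-side throttle would address this. Notion's public guidance is roughly 3 requests per second; the class below is a sketch of the suggestion, not the PR's implementation.

```python
import time


class Throttle:
    """Minimal client-side throttle for Notion's ~3 requests/second limit.

    Call wait() before each API request; it sleeps just long enough to
    keep requests at least min_interval seconds apart. Sketch of the
    reviewer's suggestion.
    """

    def __init__(self, min_interval=0.34):
        self.min_interval = min_interval
        self._last = 0.0

    def wait(self):
        elapsed = time.monotonic() - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()


# Usage in the archive loop above might look like:
# throttle = Throttle()
# for page in existing_pages:
#     throttle.wait()
#     self.client.pages.update(page_id=page["id"], archived=True)
```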
notion_properties = self._convert_to_notion_properties(properties)

try:
    self.client.pages.create(
Try rate limiting here as well.
try:
    result[prop_name] = json.loads(text)
except json.JSONDecodeError:
    result[prop_name] = text
Can we add a log here explaining why the JSON decoding failed?
Unless this fallback is intentional and expected.
This pull request introduces a new Notion backend to the Ragas experimental module, enabling users to store evaluation datasets in Notion databases for team collaboration.
It includes documentation, examples, and necessary code changes to integrate and support this backend.
There are a few more limitations on what the code can do when starting a dataset; the table, for example, needs to be defined in Notion beforehand.
This currently appends new rows, but maybe it should clear the whole dataset first?