Closes Issue #2170: Address issue 2170 by specifying 'datasets' core dependency version to >=4.0.0 #2201
…re dependency to use the latest datasets version >=4.0.0. This resolves an issue where pip accidentally resolves to a lower version because of a dependency in the resolution chain, which leads to ragas breaking at the import step
Thanks @mnedelko for the PR 🙌🏼 Overall looks fine. Might need to fix the tests as well. Try running:
I was unable to run `make test-e2e` due to the below:
Trying to run it in Docker instead.
Created a separate issue for this here: #2208
The tests fail because datasets>=4.0.0 removed support for dataset loading scripts, which affects the load_dataset calls in the following files:
All three files are trying to load the "explodinggradients/amnesty_qa" dataset, which uses a custom Python script that's no longer supported in datasets>=4.0.0. Additionally, there are many documentation files (notebooks and markdown) that also reference these datasets:
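To take stock of which dataset IDs the tests and docs pass to load_dataset, something like the following could be run over each file's text. This is only a rough sketch: `find_dataset_ids` and the regex are hypothetical helpers, not part of ragas or datasets, and the pattern only catches IDs written as a string literal in the first argument.

```python
import re

# Matches load_dataset("owner/name", ...) where the first argument is a
# quoted string literal (single or double quotes).
LOAD_DATASET_CALL = re.compile(r"""load_dataset\(\s*(['"])([^'"]+)\1""")


def find_dataset_ids(source: str) -> list[str]:
    """Return the literal dataset IDs passed to load_dataset.

    IDs are returned in order of first appearance, without duplicates.
    """
    seen: list[str] = []
    for match in LOAD_DATASET_CALL.finditer(source):
        dataset_id = match.group(2)
        if dataset_id not in seen:
            seen.append(dataset_id)
    return seen


if __name__ == "__main__":
    example = 'ds = load_dataset("explodinggradients/amnesty_qa")'
    print(find_dataset_ids(example))
```

Calls that build the ID dynamically (for example from a variable) are not detected, so the result is a starting point for a manual check rather than a complete inventory.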
These datasets likely also use custom scripts and would fail with datasets>=4.0.0.

The recommended solution is as follows:

1. Update the Datasets on Hugging Face Hub

The dataset maintainers should migrate the datasets to the new format without Python scripts:
PS: There used to be a second solution, which was to set trust_remote_code to true, but this option has also been removed from datasets for security reasons.
…2222)

## Issue Link / Problem Description
- Fixes #2170
- Derived from PR #2201

## Changes Made
- Fixed e2e test suite compatibility with `datasets>=4.0.0`
- Resolved missing dependency issues (`unstructured` package)
- Handled missing keys in tests.
- Formatting and type checks cleared

## Testing
### How to Test
- [x] Automated tests added/updated
- [x] Manual testing steps:
  1. `make run-ci`
  2. `make test`
  3. `make test-e2e`

---------

Co-authored-by: Mike Nedelko <[email protected]>
Covered in PR #2222
The fix sets the 'datasets' core dependency to version >=4.0.0.
This addresses an issue where pip resolves 'datasets' to a lower version because of a constraint elsewhere in the dependency-resolution chain, which causes ragas to break at the import step.
Please test ragas comprehensively with this fix in place before merging.
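When testing, one quick way to confirm which version pip actually resolved is to read the installed package metadata. The sketch below is only an illustration: `parse_release`, `meets_minimum`, and `installed_datasets_ok` are hypothetical helpers, and the comparison looks only at the leading numeric release segment, so it ignores pre-release and dev suffixes. A full PEP 440 comparison would be better served by the `packaging` library.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_release(version_string: str) -> tuple[int, ...]:
    """Extract the leading numeric release segment, e.g. '2.14.4' -> (2, 14, 4)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"unrecognised version string: {version_string!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str = "4.0.0") -> bool:
    """Return True if the installed release is at least the minimum release."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '4' and '4.0.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_datasets_ok() -> bool:
    """Check the 'datasets' distribution installed in the current environment."""
    try:
        return meets_minimum(version("datasets"))
    except PackageNotFoundError:
        return False


if __name__ == "__main__":
    print("datasets>=4.0.0 installed:", installed_datasets_ok())
```

Running this in the same environment where `import ragas` fails should show whether a lower 'datasets' version was picked up.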