mnedelko
Contributor

The fix specifies that the 'datasets' core dependency must be version >=4.0.0.

This fixes an issue where pip resolves to a lower datasets version because of a dependency-resolution chain, which causes ragas to break at the import step.

Please test ragas comprehensively with this fix in place before merging.
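
As a quick sanity check after installing, something like the following could confirm which datasets version pip actually resolved. This is only a sketch: the helper names (parse_release, meets_minimum, installed_datasets_version) are made up for illustration, and the parsing only handles plain numeric releases such as 4.0.0, not pre-release tags.

```python
from __future__ import annotations

from importlib import metadata


def parse_release(version: str) -> tuple[int, ...]:
    """Turn '4.0.0' into (4, 0, 0); stop at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "4.0.0") -> bool:
    """Return True if the installed version is at least the minimum."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '4.0' compares equal to '4.0.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_datasets_version() -> str | None:
    """Return the installed datasets version, or None if it is missing."""
    try:
        return metadata.version("datasets")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_datasets_version()
    if found is None:
        print("datasets is not installed")
    else:
        print(f"datasets {found}: meets >=4.0.0 -> {meets_minimum(found)}")
```

Running this in the environment where ragas fails to import should show whether the resolver picked a version below 4.0.0.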

…re dependency to use the latest datasets version >=4.0.0. This resolves an issue where pip accidentally resolves to a lower version due to a dependency-resolution chain that causes an unfavorable outcome, leading to ragas breaking at the import step
@dosubot dosubot bot added the size:XS label (this PR changes 0-9 lines, ignoring generated files) Aug 20, 2025
@anistark
Contributor

Thanks @mnedelko for the PR 🙌🏼

Overall looks fine. Might need to fix the tests as well.

Try running: make test-e2e

  • E2E tests fail when trying to load
    load_dataset("explodinggradients/amnesty_qa", "english_v3")
  • This affects files:
    • tests/e2e/test_amnesty_in_ci.py
    • tests/e2e/test_fullflow.py
    • tests/benchmarks/benchmark_eval.py

@mnedelko mnedelko mentioned this pull request Aug 20, 2025
@mnedelko
Contributor Author

mnedelko commented Aug 20, 2025

I was unable to run make test-e2e because of the error below, so I am trying to run it in Docker instead.

error: Distribution `torch==2.8.0 @ registry+https://pypi.org/simple` can't be installed because it doesn't have a source distribution or wheel for the current platform

hint: You're on macOS (`macosx_13_0_x86_64`), but `torch` (v2.8.0) only has wheels for the following platforms: `manylinux_2_28_aarch64`, `manylinux_2_28_x86_64`, `macosx_11_0_arm64`, `win_amd64`; consider adding your platform to `tool.uv.required-environments` to ensure uv resolves to a version with compatible wheels

I created a separate issue for this: #2208
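
For anyone wanting to check up front whether their machine is affected, here is a rough sketch based only on the wheel platforms listed in the uv hint above. The function name has_torch_280_wheel and the mapping from wheel tags to (OS, CPU) pairs are my own simplification; it ignores details such as the glibc version that manylinux_2_28 requires.

```python
import platform

# (OS, CPU) pairs derived from the wheel tags in the uv hint for torch 2.8.0:
#   manylinux_2_28_aarch64, manylinux_2_28_x86_64, macosx_11_0_arm64, win_amd64
SUPPORTED_PLATFORMS = {
    ("linux", "aarch64"),
    ("linux", "x86_64"),
    ("darwin", "arm64"),
    ("windows", "amd64"),
}


def has_torch_280_wheel(system: str, machine: str) -> bool:
    """Rough check of an (OS, CPU) pair against the wheel list above."""
    return (system.lower(), machine.lower()) in SUPPORTED_PLATFORMS


if __name__ == "__main__":
    system, machine = platform.system(), platform.machine()
    if has_torch_280_wheel(system, machine):
        print(f"{system}/{machine}: a torch 2.8.0 wheel should be available")
    else:
        print(f"{system}/{machine}: no torch 2.8.0 wheel listed, consider Docker")
```

On an Intel Mac, platform.system() and platform.machine() report Darwin and x86_64, which is not in the list and matches the error above.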

@mnedelko
Contributor Author

mnedelko commented Aug 20, 2025

The tests fail because datasets>=4.0.0 removed support for dataset loading scripts, which affects the load_dataset calls in the following files:

  1. tests/e2e/test_amnesty_in_ci.py - Line 18:
    amnesty_qa = load_dataset("explodinggradients/amnesty_qa", "english_v3")["eval"]
  2. tests/e2e/test_fullflow.py - Line 14:
    ds = load_dataset("explodinggradients/amnesty_qa", "english_v3")["eval"]
  3. tests/benchmarks/benchmark_eval.py - Line 19:
    ds = load_dataset("explodinggradients/amnesty_qa", "english_v2")

All three files are trying to load the "explodinggradients/amnesty_qa" dataset, which uses a custom Python script that's no longer supported in datasets>=4.0.0.

Additionally, many documentation files (notebooks and markdown) load related datasets:

  • Various documentation notebooks also use load_dataset("explodinggradients/fiqa", ...)
  • Documentation also references load_dataset("explodinggradients/earning_report_summary", ...)

These datasets likely also use custom scripts and would fail with datasets>=4.0.0.
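
To confirm rather than guess which datasets are affected, one could list each repo's files on the Hub and look for a root-level loading script. The sketch below assumes the usual convention that the script is named after the dataset (as with amnesty_qa.py); the helper names are hypothetical, and uses_loading_script needs network access plus the huggingface_hub package.

```python
from __future__ import annotations


def find_loading_script(repo_id: str, repo_files: list[str]) -> str | None:
    """Return the root-level '<dataset name>.py' file if the repo has one."""
    expected = repo_id.split("/")[-1] + ".py"
    for path in repo_files:
        if path == expected:
            return path
    return None


def uses_loading_script(repo_id: str) -> bool:
    """Fetch the repo's file list from the Hub and check it for a script."""
    # Imported lazily so the pure helper above works without the package.
    from huggingface_hub import list_repo_files

    files = list_repo_files(repo_id, repo_type="dataset")
    return find_loading_script(repo_id, files) is not None


if __name__ == "__main__":
    for repo in (
        "explodinggradients/amnesty_qa",
        "explodinggradients/fiqa",
        "explodinggradients/earning_report_summary",
    ):
        print(repo, "-> loading script found:", uses_loading_script(repo))
```

Any repo reported as having a loading script would need the migration before it can be loaded with datasets>=4.0.0.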

The recommended solution is as follows:

1. Update the Datasets on Hugging Face Hub

The dataset maintainers should migrate the datasets to the new format without Python scripts:

  • Convert explodinggradients/amnesty_qa to use Parquet/CSV/JSON files instead of amnesty_qa.py
  • Convert explodinggradients/fiqa similarly
  • This ensures compatibility with datasets>=4.0.0 and future versions

PS: There used to be a second option, passing trust_remote_code=True to load_dataset, but it has also been removed from datasets for security reasons.
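
For the migration itself, a minimal sketch of exporting a script-based dataset to Parquet might look like the following. It has to run in an environment with a datasets release older than 4.0.0 that still accepts trust_remote_code. The helper names and the output layout (out_dir/config/split.parquet) are assumptions for illustration, not the maintainers' actual process.

```python
from __future__ import annotations

from pathlib import Path


def parquet_path(out_dir: str, config: str, split: str) -> Path:
    """Build '<out_dir>/<config>/<split>.parquet' (layout is an assumption)."""
    return Path(out_dir) / config / f"{split}.parquet"


def export_to_parquet(repo_id: str, config: str, out_dir: str) -> list[Path]:
    """Load a script-based dataset and write each split as a Parquet file.

    Must run with datasets<4.0.0, where loading scripts are still supported.
    """
    # Imported lazily so parquet_path can be used without datasets installed.
    from datasets import load_dataset

    dataset_dict = load_dataset(repo_id, config, trust_remote_code=True)
    written = []
    for split, dataset in dataset_dict.items():
        target = parquet_path(out_dir, config, split)
        target.parent.mkdir(parents=True, exist_ok=True)
        dataset.to_parquet(str(target))
        written.append(target)
    return written


if __name__ == "__main__":
    paths = export_to_parquet(
        "explodinggradients/amnesty_qa", "english_v3", "amnesty_qa_parquet"
    )
    for path in paths:
        print("wrote", path)
```

The resulting files could then be uploaded to the dataset repo in place of the script, after which load_dataset no longer needs to execute any remote code.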

jjmachan pushed a commit that referenced this pull request Aug 26, 2025
…2222)

## Issue Link / Problem Description
<!-- Link to related issue or describe the problem this PR solves -->
- Fixes #2170 
- Derived from PR #2201 

## Changes Made
<!-- Describe what you changed and why -->
- Fixed e2e test suite compatibility with `datasets>=4.0.0`
- Resolved missing dependency issues (`unstructured` package)
- Handled missing keys in tests.
- formatting and type checks cleared

## Testing
<!-- Describe how this should be tested -->
### How to Test
- [x] Automated tests added/updated
- [x] Manual testing steps:
  1. `make run-ci`
  2. `make test`
  3. `make test-e2e`

---------

Co-authored-by: Mike Nedelko <[email protected]>
@anistark
Contributor

Covered in PR #2222, which includes the commits of this PR.

@anistark anistark closed this Aug 26, 2025
ahgraber pushed a commit to ahgraber/ragas that referenced this pull request Aug 26, 2025
…xplodinggradients#2222)
