forked from huggingface/datasets
feat(bids): Add BIDS dataset loader for neuroimaging data #1
Open
The-Obstacle-Is-The-Way wants to merge 52 commits into CloseChoice:main from The-Obstacle-Is-The-Way:feat/bids-loader
parquet scan options and docs
* more parquet stream arg docs * minor * minor
less api calls when resolving data_files
set dev version
This commit will be squashed.
* fix polars cast_column issue * remove debug statements * cast large_strings to string for image handling
allow streaming hdf5 files
retry open hf file
* keep hffs cache in workers when streaming * bonus: reorder hffs args to improve caching
* Update document_dataset.mdx * Update document_dataset.mdx OCR
* Add custom suffix support to from_generator * Renamed a new arg to fingerprint * Changed name to config_id in builder * Change version * Added a test * Version update * Update version * Update tests/test_arrow_dataset.py * Rename config_id to fingerprint in generator.py * Apply suggestions from code review * Update src/datasets/io/generator.py * Apply suggestions from code review --------- Co-authored-by: Quentin Lhoest <[email protected]>
* Add nifti support * update docs * update nifti after testing locally and from remote hub * update setup.py to add nibabel and update docs * add nifti_dataset * fix nifti dataset documentation * add nibabel to test dependency * Add section for creating a medical imaging dataset --------- Co-authored-by: Quentin Lhoest <[email protected]>
* WIP: shuffle working, interleave_ds not yet * remove debug statements * add test * update test * use recursive overwriting of generator seeds * update test description * remove debugging strings * return instances of baseexiterable instead of modifying inplace * add test to make sure multiple iterations over data are deterministic
* fix ci compressionfs * again * style
* update signature for _batch_setitems * arguments passthrough
…ggingface#7833) Addressing issue 7832
…huggingface#7831) * Fix argument passing in stratified shuffle split NumPy 2.0 changed the behavior of the `copy=False` parameter to be stricter. When `train_test_split` converted Arrow arrays to NumPy format for stratification, it triggered this error for non-contiguous arrays. Using `np.asarray()` allows copying when necessary, which is the recommended migration path per NumPy 2.0 documentation. * make style --------- Co-authored-by: Quentin Lhoest <[email protected]>
* add 3.14 * update ci * go home tf * torchcodec * numba * fix ci * no lz4 in python 3.14 * fix tests * again * again * again
* WIP: add audio, tests failing * WIP: add mono argument, tests failing * change from mono to num_channels in documentation, audio tests passing * update docs and move test for audio * update audio * update docstring for audio * Apply suggestions from code review --------- Co-authored-by: Quentin Lhoest <[email protected]>
fsspec 2025.10.0
release: 4.4.0
better streaming retries
…gingface#7848) remove mode parameter in docstring of pdf and video feature
* WIP: allow uploading of nifti * remove debug statements and fix test * remove debug statements * remove debug statements
Change arxiv to hg papers
* fix some broken links * some more --------- Co-authored-by: Quentin Lhoest <[email protected]>
* WIP: nifti vis working, now improve * seems to work fine, tests not there yet * remove uncommented lines
* try latest papaya * try niivue * update repr_html for nifti to work better with niivue * remove papaya files * remove papaya from setup.py * use ipyniivue * update nifti feature to use ipyniivue * add 3d crosshair for orientation * remove docstring
- Remove deprecated `trust_remote_code=True` from tests (not needed for packaged modules) - Fix ruff linting errors (import sorting, trailing newlines) - Apply ruff formatter for consistent code style - Convert set() generators to set comprehensions (C401)
- Update setup.py to include nibabel in BIDS extra - Update docs to clarify nibabel is included - Add nibabel availability check in _info() - Move os import to module level - Update test skipif to check both pybids and nibabel
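For context, a dependency check along the lines of the last commit could look like the sketch below. This is not the code from this PR: `is_available` and `missing_bids_dependencies` are hypothetical names, and the only assumption about the libraries is that PyBIDS is imported as `bids` and nibabel as `nibabel`.

```python
import importlib.util


def is_available(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def missing_bids_dependencies(required=("bids", "nibabel")):
    """List the required top-level modules that are not installed.

    Hypothetical helper: PyBIDS is assumed to be importable as `bids`.
    """
    return [name for name in required if not is_available(name)]


missing = missing_bids_dependencies()
if missing:
    # A loader could raise ImportError here; a test could skip instead.
    print("BIDS loading unavailable, missing:", ", ".join(missing))
else:
    print("pybids and nibabel are both installed")
```

The same boolean could feed a test skip condition, so that tests run only when both packages are present.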
Author:
Apologies for the accidental close - was cleaning up branches on my fork and it auto-closed linked PRs. Note: If you sync with the latest upstream main, the code footprint should be smaller since your NiiVue visualization work (huggingface#7878) is now merged. Happy to rebase if helpful!
Summary
Hey Tobias! 👋
Following up on your neuroimaging initiative in Discord - I implemented a BIDS dataset loader that builds on your Nifti feature work.
This enables `load_dataset('bids', data_dir='/path/to/bids')` for neuroimaging researchers.
Changes
- `src/datasets/packaged_modules/bids/` - BIDS loader using PyBIDS + your Nifti feature
Usage
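A minimal sketch of how the `load_dataset('bids', ...)` call above might be exercised, assuming this branch is installed along with pybids and nibabel. `make_minimal_bids` and `load_bids` are hypothetical helper names, not part of the PR, and the placeholder image files are empty, so they would need to be replaced with real NIfTI volumes before loading.

```python
import json
import tempfile
from pathlib import Path


def make_minimal_bids(root: Path, subjects=("01", "02")) -> Path:
    """Create a skeletal BIDS tree with empty placeholder T1w files."""
    root.mkdir(parents=True, exist_ok=True)
    # BIDS requires dataset_description.json with Name and BIDSVersion.
    (root / "dataset_description.json").write_text(
        json.dumps({"Name": "toy", "BIDSVersion": "1.8.0"})
    )
    for sub in subjects:
        anat = root / f"sub-{sub}" / "anat"
        anat.mkdir(parents=True, exist_ok=True)
        # Placeholder only: a real dataset holds valid NIfTI volumes here.
        (anat / f"sub-{sub}_T1w.nii.gz").touch()
    return root


def load_bids(root: Path):
    """Call the loader as described in this PR (assumes the 'bids' builder is installed)."""
    # Imported lazily so the layout helper works without `datasets`.
    from datasets import load_dataset

    return load_dataset("bids", data_dir=str(root))


bids_root = make_minimal_bids(Path(tempfile.mkdtemp()) / "toy_bids")
# Show the layout that a BIDS loader would be pointed at.
for path in sorted(p for p in bids_root.rglob("*") if p.is_file()):
    print(path.relative_to(bids_root).as_posix())
```

`load_bids` is deliberately not called here, since the `bids` builder exists only on this branch and the placeholder files are not readable images.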
Testing
- `make quality` passes
Also opened upstream PR: huggingface#7886
Let me know if you'd like any changes! 🧠