
Conversation

@Diya910

@Diya910 Diya910 commented Apr 13, 2025

Before submitting a pull request (PR), please read the contributing guide.

Please fill out as much of this template as you can, but if you have any problems or questions, just leave a comment and we will help out :)

Description

What is this PR

  • Bug fix
  • [ yes] Addition of a new feature
  • Other

Why is this PR needed?
This PR introduces a new @DATETO@ wildcard that lets users search for folders by a date range embedded in their names. This is especially useful when users want to transfer data recorded within a specific date range, without needing to create folders for every date in that range.

What does this PR do?
Implements @DATETO@ pattern recognition inside search_for_wildcards.

Uses get_values_from_bids_formatted_name to extract date-YYYYMMDD from folder names.

Filters the folders based on whether the date falls within the provided range.
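
The three steps above can be sketched roughly as follows. This is an illustration, not the code in this PR: it assumes folder names carry a `date-YYYYMMDD` key-value pair, uses a plain regex in place of get_values_from_bids_formatted_name, treats both ends of the range as inclusive, and the helper name `filter_by_date_range` is hypothetical.

```python
import re
from datetime import datetime

DATE_KEY = re.compile(r"date-(\d{8})")  # assumed date-YYYYMMDD layout


def filter_by_date_range(names, start, end):
    """Return the names whose date-YYYYMMDD value lies in [start, end]."""
    start_date = datetime.strptime(start, "%Y%m%d")
    end_date = datetime.strptime(end, "%Y%m%d")
    kept = []
    for name in names:
        match = DATE_KEY.search(name)
        if match is None:
            continue  # no date in this name, so it cannot be in range
        folder_date = datetime.strptime(match.group(1), "%Y%m%d")
        if start_date <= folder_date <= end_date:
            kept.append(name)
    return kept


folders = [
    "ses-001_date-20240301",
    "ses-002_date-20240315",
    "ses-003_date-20240401",
]
in_range = filter_by_date_range(folders, "20240301", "20240331")
```

Parsing the values into `datetime` objects rather than comparing strings also rejects impossible dates (for example a month of 13) with a `ValueError`.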

References

#508

How has this PR been tested?

Created automated tests (test_date_search_range) using a simulated folder structure with date-YYYYMMDD format.

Verified that only folders within the specified date range are returned.

Confirmed that existing wildcard functionality remains unaffected.

Is this a breaking change?

No, this feature is additive and does not alter existing behavior.

Does this PR require an update to the documentation?

Yes. The documentation should be updated to mention the new @DATETO@ wildcard and its usage.

If any features have changed or been added, please explain how the
documentation has been updated.

Checklist:

  • [ yes] The code has been tested locally
  • [ yes] Tests have been added to cover all new functionality
  • The documentation has been updated to reflect any changes
  • The code has been formatted with pre-commit

There are two minor mypy errors I couldn't fully resolve:
A type conflict involving the dummy Configs class used in tests — guidance from maintainers would help finalize this.
A type mismatch originating from an existing code path — this appears unrelated to the new functionality added.

@Diya910
Author

Diya910 commented Apr 17, 2025

@adamltyson @JoeZiminski
Is there any update on the pull request? Your feedback would be really helpful.

@sumana-2705
Contributor

Hello @Diya910,
The changes are looking great. The review process might be a little delayed since the team is currently a bit busy. In the meantime, it might be a good idea to take a look at the documentation as well, in the transfer_data.md file :)

@JoeZiminski
Member

Hi @Diya910, so sorry for the delay in responding! Thanks a lot for this PR and the extensive tests. I'm still not back full time but will definitely have time to review this within the next two weeks. Thanks for your patience.

Member

@JoeZiminski JoeZiminski left a comment

Hey @Diya910 thanks a lot for this, it's a really nice implementation and is exactly what we need to do in this case. I have left a few comments on refactoring, because the introduced functionality can be aligned with some existing code to reduce duplication across the codebase. This requires some massaging of existing datashuttle code to make it a little more general so it can be called here. The suggestions also extend the implementation to handle the TIMETO and DATETIMETO cases. For now I have not reviewed the tests as they might need changing after the refactor, but in general they look good and the attention to detail on testing is much appreciated.

Let me know if anything is not clear and if you have any questions or alternative ways to tackle this. Refactorings like those suggested can be a little fiddly. The linting / type checking will be useful when performing such refactorings. Of course, I'm happy to help wherever it would be useful. Thanks again for this contribution!

Just a reminder to myself, we will also need to add documentation for this new functionality.

if canonical_tags.tags("*") in name or "@DATETO@" in name:
search_str = name.replace(canonical_tags.tags("*"), "*")
# If a date-range tag is present, extract dates and update the search string.
if "@DATETO@" in name:
Member

We have a canonical tags.tags() function that contains all the tags (just in case we change them or some other problem that requires their editing arises). So @DATETO@, @TIMETO@ and @DATETIMETO@ could be added to that function and here @DATETO@ replaced with tags.tags("DATETO")

Author

Added DATETO, TIMETO, and DATETIMETO to canonical_tags.py and now use them through tags().
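
For readers following along, the idea of keeping every tag string behind one lookup can be sketched like this. It is a simplified stand-in, not the contents of canonical_tags.py: only the four tags named in this thread are included, and the dictionary layout is an assumption.

```python
def tags(tag_name: str) -> str:
    """Return the literal tag string for a tag name (hypothetical layout)."""
    all_tags = {
        "*": "@*@",
        "DATETO": "@DATETO@",
        "TIMETO": "@TIMETO@",
        "DATETIMETO": "@DATETIMETO@",
    }
    return all_tags[tag_name]


name = "ses-001_date-20240301@DATETO@20240331"

# Callers ask for the tag by name instead of hard-coding "@DATETO@".
has_date_range = tags("DATETO") in name
```

If a tag string ever needs to change, only the dictionary is edited and every caller picks up the new value.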

@JoeZiminski
Member

JoeZiminski commented Jun 6, 2025

Hey @Diya910 do you think you would be interested in continuing to work on this PR? This is a great addition and it would be nice to release it in a version soon. I'm happy to finalise the PR as most of the work now is just refactoring into the existing codebase.

@Diya910
Author

Diya910 commented Jun 6, 2025

> Hey @Diya910 do you think you would be interested in continuing to work on this PR? This is a great addition and it would be nice to release it in a version soon. I'm happy to finalise the PR as most of the work now is just refactoring into the existing codebase.

Yes, I am interested. I was busy with my exams and other things. Please allow me a day or two and I'll make the changes you suggested.

@JoeZiminski
Member

Hey @Diya910 great! No rush BTW I was just checking in, please prioritise exams / other stuff / taking some time to recuperate after exams. I was thinking it might be nice to merge over the next few weeks (rather than next few days), thanks!

@Diya910
Author

Diya910 commented Jun 6, 2025

Thanks, I'll try to work on it as soon as possible.

…ion of code by making functions in validation.py and using them in the search_with_tags feature in the folders file
@Diya910
Author

Diya910 commented Jun 15, 2025

Hey @JoeZiminski, I believe I have made all the changes you suggested and have also centralized the code. I have also extended the test file with additional test functions, and everything is working fine on my side. If any other changes are required, please let me know and I will make them as soon as possible.

@JoeZiminski JoeZiminski added this to the v2.8.0 milestone Jun 17, 2025
@JoeZiminski
Member

Hi @Diya910 thanks a lot for this! Will review tomorrow

Member

@JoeZiminski JoeZiminski left a comment

Hey @Diya910 thanks for this, this is really great stuff. The code is very clean, and this is going to make a great feature. I have left a few comments on the code; they just suggest some minor refactorings to reduce code duplication where possible. For critical code, it makes sense to define the key parts in only one place, in case they are changed later and the editor forgets to check every place they are defined.

The tests are great for ensuring the feature works well. I have suggested a refactoring here to use our existing testing machinery, which I think should reduce some boilerplate; let me know if you have any questions about this. The tests should probably cover all three cases (dateto, timeto and datetimeto), happy to help with this.

I just pushed some fixes to the pre-commit on the CI which was failing, just some minor typing issues (see here for some detail on the pre-commit hooks). This should move on to the full test suite now.

Thanks again Diya this is nearly done! I just remembered we will also need to document this change, the contributing guide for this is here. It would make sense to add the new tags to this section. Happy to do this because the documentation can be a bit fiddly, but if you are interested in this please feel free to go ahead, let me know if you have any questions!

@JoeZiminski JoeZiminski linked an issue Jun 21, 2025 that may be closed by this pull request
@Diya910
Author

Diya910 commented Jul 2, 2025

Hey @JoeZiminski, I have made the changes you requested. There were a lot of them, so I am not able to reply to each one individually, but I made sure to apply your suggestions. I have tested the changes with a draft test file and they work fine. I haven't properly worked on the test file yet, as it was a lot to do in one go. Once you confirm these changes I'll move on to refactoring the test file. I hope that is fine with you. If I missed any of the suggestions above, please point it out and I'll make those changes.

Member

@JoeZiminski JoeZiminski left a comment

Hey @Diya910 this is great, definitely good to go bar some very minor suggestions. Most of these are minor GitHub code suggestions, so you can commit them directly.

Apologies, one of my suggestions was actually worse than what was already there 😅 around the walrus operator. Sorry for the inconvenience of having to revert this.

After these changes are integrated I will message @Akseli-Ilmanen to test this manually while the other tests are being written. Let me know if you have any questions as you refactor the tests. Thanks again!

@Diya910
Author

Diya910 commented Jul 4, 2025

@JoeZiminski I have made all the changes. Please have a look. I am not sure whether I have removed the declarations the right way. Please let me know if you want me to change the docstrings in any specific way. Thank you.

@JoeZiminski
Member

JoeZiminski commented Oct 30, 2025

Hey @Akseli-Ilmanen thanks for these suggestions, on this PR it should now be possible to do things like:

from datashuttle import DataShuttle

project = DataShuttle("my_project")

project.create_folders("rawdata", "sub-@DATE@", "ses-@DATETIME@", ["behav", "ephys"], allow_letters_in_sub_ses_values=True)

# collect some data

project.upload_custom("rawdata", "all", "ses-20251030T134534@DATETIMETO@20251030T134544", "all")

And equivalently this can be done through the TUI. Note that for the transfer to work, every ses- must have the same format (e.g. all must be ses-<a datetime>, ses-<a date> or ses-<a time>, but not a mix); the same applies to sub-.
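
To make the range syntax concrete, here is a rough sketch of how a search string like the one above could be split into its two ends. It is not the datashuttle implementation: the `strptime` formats are inferred from the examples in this thread, and `parse_range` is a hypothetical helper.

```python
from datetime import datetime

# strptime formats inferred from the examples in this thread (assumption)
RANGE_TAGS = {
    "@DATETIMETO@": "%Y%m%dT%H%M%S",
    "@DATETO@": "%Y%m%d",
    "@TIMETO@": "%H%M%S",
}


def parse_range(search_str):
    """Split '<prefix>-<start><TAG><end>' into (prefix, start, end)."""
    for tag, fmt in RANGE_TAGS.items():
        if tag in search_str:
            left, end = search_str.split(tag)
            prefix, _, start = left.rpartition("-")
            return (
                prefix + "-",
                datetime.strptime(start, fmt),
                datetime.strptime(end, fmt),
            )
    raise ValueError(f"no range tag found in {search_str!r}")


prefix, start, end = parse_range(
    "ses-20251030T134534@DATETIMETO@20251030T134544"
)
```

In this sketch a value written in a different format (say a bare date inside a datetime range) fails to parse with a `ValueError`, which is one way to see why mixed formats cannot be compared against a single range.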

Would be great to hear how this works for you!

You can install from this PR by doing:

pip uninstall datashuttle
git clone git@github.com:Diya910/datashuttle.git
cd datashuttle
git checkout date_feature
git pull
pip install -e .

@JoeZiminski JoeZiminski removed this from the v2.8.0 milestone Nov 1, 2025
Copilot AI left a comment

Pull Request Overview

This PR implements datetime range filtering functionality for datashuttle, allowing users to select folders based on date, time, or datetime ranges. The feature introduces three new wildcard tags (@DATETO@, @TIMETO@, @DATETIMETO@) and refactors datetime handling throughout the codebase for improved consistency and maintainability.

Key changes:

  • Added datetime range search functionality via new @DATETO@, @TIMETO@, and @DATETIMETO@ tags
  • Refactored datetime formatting to separate value generation from key-prefixing (e.g., format_datetime vs format_datetime_with_key)
  • Improved regex patterns for datetime validation using more concise notation (\d{8} vs \d\d\d\d\d\d\d\d)
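
As a quick check on the last point, the two regex notations accept exactly the same strings. The sample values below are illustrative only and are not taken from the test suite.

```python
import re

verbose = re.compile(r"\d\d\d\d\d\d\d\d")
concise = re.compile(r"\d{8}")

samples = ["20240301", "2024030", "202403011", "2024-03-01"]

# Both patterns should agree on every sample, valid or not.
agree = all(
    bool(verbose.fullmatch(s)) == bool(concise.fullmatch(s)) for s in samples
)
```

Note that both patterns only check for eight digits; whether those digits form a real calendar date still has to be validated separately.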

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 9 comments.

File | Description
tests/test_date_search_range.py | New comprehensive test suite for datetime range search functionality covering date, time, and datetime ranges with edge cases
tests/tests_integration/test_validation.py | Added integration tests for datetime tag validation after sub/ses keys
tests/tests_unit/test_validation_unit.py | Updated unit tests to reflect more concise regex notation for datetime patterns
datashuttle/configs/canonical_tags.py | Added new datetime range tags and centralized datetime format definitions
datashuttle/utils/validation.py | Refactored datetime validation to use centralized format definitions and extracted ISO format validation
datashuttle/utils/formatting.py | Refactored datetime formatting to support both keyed and non-keyed formats, handling sub-/ses- prefixed datetime values
datashuttle/utils/folders.py | Implemented core datetime range search logic with filtering, validation, and glob pattern generation; renamed search_for_wildcards to search_with_tags
datashuttle/utils/data_transfer.py | Updated function call to use renamed search_with_tags function
pyproject.toml | Changed mypy configuration to use overrides syntax for ignoring test errors


assert sorted(transferred_sessions) == sorted(expected_sessions)

def test_date_as_sub_or_ses_value(self, project):
""" """
Copilot AI Nov 20, 2025

Missing docstring for test method. This test verifies that date values can be used directly in subject or session names (without the "date-" prefix). Add a descriptive docstring like: "Test date range filtering when dates are used directly as sub/ses values (e.g., ses-20240301)."

Suggested change
""" """
"""Test date range filtering when dates are used directly as sub/ses values (e.g., ses-20240301)."""

) -> None:
"""Replace tags with their final value for every name in a list.
@DATE@, @TIME@ and @DATETIME@ keys can be positioed directly
Copilot AI Nov 20, 2025

Spelling error: "positioed" should be "positioned".

Suggested change
@DATE@, @TIME@ and @DATETIME@ keys can be positioed directly
@DATE@, @TIME@ and @DATETIME@ keys can be positioned directly


def find_datetime_in_name(
name: str, format_type: str, tag: str
) -> tuple[str | Any, ...] | None:
Copilot AI Nov 20, 2025

Inconsistent return type annotation. The function signature declares -> tuple[str | Any, ...] | None but the docstring says tuple[str, str] | None. The actual return type should be tuple[str, str] | None since match.groups() from a pattern with two capture groups returns a tuple of two strings. Update the function signature to -> tuple[str, str] | None.

Suggested change
) -> tuple[str | Any, ...] | None:
) -> tuple[str, str] | None:
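
One way a function with this signature could return a fixed-length pair is sketched below. The body of find_datetime_in_name is not shown in this thread, so the helper `find_range_in_name`, its value patterns, and its matching logic are all assumptions.

```python
from __future__ import annotations

import re

# Assumed value patterns, based on the formats used in this thread.
VALUE_PATTERNS = {
    "date": r"\d{8}",
    "time": r"\d{6}",
    "datetime": r"\d{8}T\d{6}",
}


def find_range_in_name(
    name: str, format_type: str, tag: str
) -> tuple[str, str] | None:
    """Return (start, end) around `tag` in `name`, or None if absent."""
    value = VALUE_PATTERNS[format_type]
    match = re.search(f"({value}){re.escape(tag)}({value})", name)
    if match is None:
        return None
    # Index the groups explicitly so the result is a fixed-length 2-tuple.
    return match.group(1), match.group(2)


found = find_range_in_name("ses-20240301@DATETO@20240331", "date", "@DATETO@")
missing = find_range_in_name("ses-001", "date", "@DATETO@")
```

Returning the two groups by index, rather than `match.groups()` directly, may also be easier for a type checker to reconcile with a two-element tuple annotation.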

def test_without_wildcard_ses(self, project):
"""Test without wildcard ses.
Including @*@ only led to an uncaught but as it was triggering a
Copilot AI Nov 20, 2025

Spelling error: "but" should be "bug". The comment should read "Including @*@ only led to an uncaught bug..."

Suggested change
Including @*@ only led to an uncaught but as it was triggering a
Including @*@ only led to an uncaught bug as it was triggering a

assert sorted(transferred_sessions) == sorted(expected_sessions)

def test_time_as_sub_or_ses_value(self, project):
""" """
Copilot AI Nov 20, 2025

Missing docstring for test method. Add a descriptive docstring like: "Test time range filtering when times are used directly as sub/ses values (e.g., ses-110101)."

Suggested change
""" """
"""Test time range filtering when times are used directly as sub/ses values (e.g., ses-110101)."""

assert sorted(transferred_sessions) == sorted(expected_sessions)

def test_datetime_as_sub_or_ses_value(self, project):
""" """
Copilot AI Nov 20, 2025

Missing docstring for test method. Add a descriptive docstring like: "Test datetime range filtering when datetimes are used directly as sub/ses values (e.g., ses-20240301T110101)."

Suggested change
""" """
"""Test datetime range filtering when datetimes are used directly as sub/ses values (e.g., ses-20240301T110101)."""

def run_session_upload(
self, project, subs, sessions, session_search_string
):
""""""
Copilot AI Nov 20, 2025

Missing docstring for helper method. Add a descriptive docstring like: "Helper method to create test folders and upload sessions with specified search criteria. Returns the list of transferred session names."

Suggested change
""""""
"""
Helper method to create test folders and upload sessions with specified search criteria.
Returns the list of transferred session names.
"""


if already_has_wildcard_at_end:
# Handle edge case where @*@ tag is immediately after @DATETIMETO@
# or similar tag. This results in "datetime-**" which cases errors.
Copilot AI Nov 20, 2025

Spelling error: "cases" should be "causes". The comment should read "This results in 'datetime-**' which causes errors."

Suggested change
# or similar tag. This results in "datetime-**" which cases errors.
# or similar tag. This results in "datetime-**" which causes errors.
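
For context, the edge case in this comment arises when a range expression is swapped for a wildcard right next to an existing one. The sketch below shows one alternative way to avoid the doubled wildcard, by collapsing repeated asterisks with a regex; the PR itself uses an `already_has_wildcard_at_end` check, and `range_to_glob` is a hypothetical helper.

```python
import fnmatch
import re


def range_to_glob(search_str: str, range_expr: str) -> str:
    """Swap a range expression for '*' and collapse any '**' that results."""
    pattern = search_str.replace(range_expr, "*")
    return re.sub(r"\*{2,}", "*", pattern)


# The trailing "*" stands for an @*@ tag that has already been converted.
pattern = range_to_glob(
    "ses-001_datetime-20240301T000000@DATETIMETO@20240302T000000*",
    "20240301T000000@DATETIMETO@20240302T000000",
)
matches = fnmatch.filter(
    ["ses-001_datetime-20240301T120000", "ses-002_datetime-20240301T120000"],
    pattern,
)
```

The glob only narrows the candidates by name; the actual in-range check on the datetime values still has to run afterwards.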

format_to_check = utils.get_values_from_bids_formatted_name(
[name], key, return_as_int=False
)[0]
except:
Copilot AI Nov 20, 2025

Using a bare except clause is a bad practice as it catches all exceptions, including SystemExit and KeyboardInterrupt. Replace with a specific exception type, such as except (KeyError, IndexError): to catch expected exceptions when the key is not found in the name.

Suggested change
except:
except (KeyError, IndexError, ValueError):
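
The difference can be seen in a small standalone sketch, unrelated to the datashuttle code: a narrowed `except` handles the expected lookup failure but lets `KeyboardInterrupt` through. The helper names are illustrative.

```python
def lookup(values, index):
    """Return values[index], or None when the lookup fails."""
    try:
        return values[index]
    except (KeyError, IndexError, ValueError):
        return None


def interrupted():
    """Stand-in for a call during which the user presses Ctrl+C."""
    raise KeyboardInterrupt


try:
    try:
        interrupted()
    except (KeyError, IndexError, ValueError):
        outcome = "swallowed"
except KeyboardInterrupt:
    # The narrowed clause above does not catch this, so it reaches here.
    outcome = "propagated"

missing = lookup([], 0)
```

With a bare `except:` in the inner block, `outcome` would be "swallowed" and the interrupt would never reach the caller.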


Development

Successfully merging this pull request may close these issues.

Support @DATE@, @TIME@ and @DATETIME@ to insert values after sub- and ses- values
Search within date range

4 participants