Conversation

grg2rsr
Contributor

@grg2rsr grg2rsr commented Jul 15, 2025

… possible by docstring), no flag files were found

@grg2rsr grg2rsr requested a review from oliche July 15, 2025 10:36
Member

To check with @k1o0 but I think the original intent was to be able to operate within a session path or at a level containing multiple session paths using the same glob pattern, while providing some validation.
Do you have more info about your failure case? It would be nice to fix it while maintaining compatibility with the two requirements above!

Contributor

Indeed, Georg's change here would break new session registration on all local servers. The primary use of the function is to register all sessions under the provided root path (i.e. /mnt/s0/Data/Subjects). The docstring does look outdated, which is my bad: it was originally a simple rglob, and I added the specific wildcards primarily to improve performance and to skip any 'junk' folders. I agree with Georg that it would be convenient to use this function to register individual sessions from time to time. I can add an if-else statement here to support session path inputs again.

Contributor

I won't push anything until I've fixed the CI: the certificates have expired on the hooks instance, and we should be sure that the integration tests would have caught this potentially disastrous change.

Contributor Author

The docstring says:

def job_creator(root_path, one=None, dry=False, rerun=False):
    ...
    Parameters
    ----------
    root_path : str, pathlib.Path
        Main path containing sessions or a session path.

However, when this function is called with a session path (and not a root path), it fails to find the raw_session.flag file.

Here is a minimal working example, to be executed on parede:

from one.api import ONE
from ibllib.pipes import local_server
from pathlib import Path
base_path = Path('/mnt/s0/Data/Subjects')
one = ONE(cache_rest=None)
eid = "b2f0eb1e-88d9-4d4c-9938-5ff4df2cb7fc" # a session with passive
session_path = base_path / one.eid2path(eid).session_path_short()

print((session_path / 'raw_session.flag').exists()) # True
print(session_path) # /mnt/s0/Data/Subjects/ZFM-08652/2025-07-02/002

local_server.job_creator(session_path, one=one, dry=False) # doesn't find anything

Globbing won't work because the pattern is matched relative to the directory passed in, and it is written for the root directory. In fact, it won't work for anything other than the /mnt/s0/Data/Subjects folder.

So either: 1) the docstring needs to be changed to

    Parameters
    ----------
    root_path : str, pathlib.Path
        path to the folder that contains the subject folders

or 2) flag_files = Path(root_path).glob('**/raw_session.flag')

Relaxing the validation and replacing it with a .glob('**/raw_session.flag') will find all folders containing a raw_session.flag file regardless of the entry point, which is useful if, for example, a single session or a single animal is to be processed.

Why would this break new session registration, @k1o0? There are validation steps in the following lines that might take care of trash folders.
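
To illustrate the entry-point issue described above, here is a minimal sketch on a throwaway directory tree. It assumes the current pattern is a fixed-depth one of the form `*/????-??-??/*/raw_session.flag` (subject / date / number); the actual pattern is not quoted in this thread, so treat it as an assumption. The helper names `find_flags` and `make_demo_tree` and the subject name are hypothetical.

```python
import tempfile
from pathlib import Path

# Assumed fixed-depth pattern (subject / date / number / flag file).
FIXED_DEPTH = '*/????-??-??/*/raw_session.flag'
# Recursive pattern proposed in option 2.
RECURSIVE = '**/raw_session.flag'


def find_flags(entry_point, pattern):
    """Return matching flag files as sorted POSIX paths relative to entry_point."""
    entry_point = Path(entry_point)
    return sorted(p.relative_to(entry_point).as_posix() for p in entry_point.glob(pattern))


def make_demo_tree(root):
    """Create one fake session folder containing a flag file; return the session path."""
    session = Path(root) / 'ZFM-00000' / '2025-07-02' / '002'
    session.mkdir(parents=True)
    (session / 'raw_session.flag').touch()
    return session


with tempfile.TemporaryDirectory() as tmp:
    session_path = make_demo_tree(tmp)
    # The fixed-depth pattern only matches when globbing starts at the root.
    from_root_fixed = find_flags(tmp, FIXED_DEPTH)
    from_session_fixed = find_flags(session_path, FIXED_DEPTH)
    # The recursive pattern matches from either entry point.
    from_root_recursive = find_flags(tmp, RECURSIVE)
    from_session_recursive = find_flags(session_path, RECURSIVE)

print(from_root_fixed, from_session_fixed, from_session_recursive)
```

Under that assumption, the fixed-depth pattern returns nothing when started from the session folder, because it still expects three directory levels below the entry point, whereas `**` also matches zero intermediate directories.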

Contributor Author

flag_files = Path(root_path).glob('**/raw_session.flag')
flag_files = [file for file in flag_files if re.search(r"/\d{4}-\d{2}-\d{2}/\d{3}/raw_session\.flag$", str(file))]

could be a compromise with some degree of validation

Contributor

Please let me fix this one up. We now have ONE methods for these globs that are more robust (e.g. the session number can be a single digit on whiterussian). It's not so efficient to apply a regex after a glob (effectively regex twice) when you can simply check if the input path is already a session path. Let me fix the integration server first because this task is extremely important so I don't want to merge without ensuring that it's correctly tested and that the tests pass.
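
As a rough illustration of the "check the input first" idea, the following sketch branches on whether the given path already looks like a session path. It is not the planned fix: `looks_like_session_path` is a hypothetical regex-based stand-in for the ONE methods mentioned above, `find_flag_files` is a hypothetical name, and the fixed-depth glob pattern is an assumption about the current code. The number component accepts one to three digits, to cover the single-digit case.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical check: path ends in <subject>/<yyyy-mm-dd>/<1-3 digit number>.
SESSION_RE = re.compile(r'(?:^|/)[^/]+/\d{4}-\d{2}-\d{2}/\d{1,3}$')


def looks_like_session_path(path):
    """Return True if the path ends in subject/date/number (stand-in check)."""
    return SESSION_RE.search(Path(path).as_posix()) is not None


def find_flag_files(path):
    """Return flag files for a single session path or for a root of subjects."""
    path = Path(path)
    if looks_like_session_path(path):
        # Session path input: look for the flag file directly, no glob needed.
        flag = path / 'raw_session.flag'
        return [flag] if flag.exists() else []
    # Root path input: assumed fixed-depth pattern (subject / date / number).
    return sorted(path.glob('*/????-??-??/*/raw_session.flag'))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for number in ('002', '1'):  # three-digit and single-digit session numbers
        session = root / 'subject_a' / '2025-07-02' / number
        session.mkdir(parents=True)
        (session / 'raw_session.flag').touch()
    n_from_root = len(find_flag_files(root))
    n_from_session = len(find_flag_files(root / 'subject_a' / '2025-07-02' / '1'))

print(n_from_root, n_from_session)
```

The design choice here is that a session path input costs a single `exists()` call, while the root path input keeps the narrow glob and its performance characteristics.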

Contributor Author

> Please let me fix this one up.

ofc :) I am not doing anything

> It's not so efficient to apply a regex after a glob (effectively regex twice) when you can simply check if the input path is already a session path

flag_files = filter(lambda x: is_session_path(x.parent), flag_files)
looks like it handles the validation already, which is why I was surprised that you said removing the validation step from the globbing would break the extractor.
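
For reference, a small sketch of what "recursive glob, then filter" does on a tree that contains a junk folder. `is_session_like` is a hypothetical stand-in for `is_session_path` (it only checks the date and number components), and the folder names are made up.

```python
import re
import tempfile
from pathlib import Path


def is_session_like(folder):
    """Hypothetical stand-in check: folder ends in <yyyy-mm-dd>/<1-3 digit number>."""
    parts = Path(folder).parts
    return (
        len(parts) >= 3
        and re.fullmatch(r'\d{4}-\d{2}-\d{2}', parts[-2]) is not None
        and re.fullmatch(r'\d{1,3}', parts[-1]) is not None
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    good = root / 'subject_a' / '2025-07-02' / '001'
    junk = root / 'subject_a' / 'backup_old'  # not a session folder
    for folder in (good, junk):
        folder.mkdir(parents=True)
        (folder / 'raw_session.flag').touch()

    # Recursive glob picks up every flag file, including the one in the junk folder.
    candidates = list(root.glob('**/raw_session.flag'))
    # Post-filtering on the parent folder drops the junk entry.
    validated = [f for f in candidates if is_session_like(f.parent)]
    kept = [f.parent.relative_to(root).as_posix() for f in validated]

print(len(candidates), kept)
```

This only shows that the filter removes invalid folders from the result. It does not address the cost of walking the whole tree, which was the stated motivation for the specific wildcards.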
