17 changes: 9 additions & 8 deletions ibllib/pipes/local_server.py
Member

To check with @k1o0 but I think the original intent was to be able to operate within a session path or at a level containing multiple session paths using the same glob pattern, while providing some validation.
Do you have more info about your failure case? It would be nice to fix it while maintaining compatibility with the two requirements above!

Contributor

Indeed Georg's change here would break new session registration on all local servers. The primary use of the function is to register all sessions in the provided root (i.e. /mnt/s0/Data/Subjects) path. The docstring does look outdated which is my bad: it was originally a simple rglob and I added the specific wildcards primarily to improve performance and to skip any 'junk' folders. I agree with Georg that it would be convenient to use this function to register individual sessions from time to time. I can add an if-else statement here to support session path inputs again.

Contributor

I won't push anything until I've fixed the CI; the certificates have expired on the hooks instance and we should be sure that the integration tests would have caught this potentially disastrous change

Contributor Author

the docstring says:

def job_creator(root_path, one=None, dry=False, rerun=False):
    ...
    Parameters
    ----------
    root_path : str, pathlib.Path
        Main path containing sessions or a session path.

however, when this function is called with a session path (and not a root path) it fails to find the raw_session.flag file.

Here is a minimal working example, to be executed on parede:

from one.api import ONE
from ibllib.pipes import local_server
from pathlib import Path
base_path = Path('/mnt/s0/Data/Subjects')
one = ONE(cache_rest=None)
eid = "b2f0eb1e-88d9-4d4c-9938-5ff4df2cb7fc" # a session with passive
session_path = base_path / one.eid2path(eid).session_path_short()

print((session_path / 'raw_session.flag').exists()) # True
print(session_path) # /mnt/s0/Data/Subjects/ZFM-08652/2025-07-02/002

local_server.job_creator(session_path, one=one, dry=False) # doesn't find anything

Globbing won't work because the pattern is matched starting from the directory that is passed in, so it assumes that directory is the root. In fact, it won't work for anything other than the /mnt/s0/Data/Subjects folder.
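
The mismatch can be reproduced with plain `pathlib` in a throwaway directory. This is a minimal sketch that assumes only the subject/date/number layout shown above; the temporary root and the helper name `find_flags` are illustrative and not part of ibllib.

```python
import tempfile
from pathlib import Path

ANCHORED = '*/????-??-??/*/raw_session.flag'  # pattern currently used by job_creator
RECURSIVE = '**/raw_session.flag'             # relaxed pattern proposed in this thread


def find_flags(start, pattern):
    """Return flag files matched by `pattern`, relative to `start` (hypothetical helper)."""
    return sorted(p.relative_to(start).as_posix() for p in Path(start).glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)  # stands in for /mnt/s0/Data/Subjects
    session_path = root / 'ZFM-08652' / '2025-07-02' / '002'
    session_path.mkdir(parents=True)
    (session_path / 'raw_session.flag').touch()

    # The anchored pattern only matches when globbing starts at the root
    from_root_anchored = find_flags(root, ANCHORED)
    from_session_anchored = find_flags(session_path, ANCHORED)
    # The recursive pattern matches from either entry point
    from_root_recursive = find_flags(root, RECURSIVE)
    from_session_recursive = find_flags(session_path, RECURSIVE)

print(from_root_anchored)
print(from_session_anchored)
print(from_root_recursive)
print(from_session_recursive)
```

Starting from the session path, the anchored pattern returns an empty list, whereas the recursive pattern finds the flag file from both entry points.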

So either: 1) the docstring needs to be changed to

    Parameters
    ----------
    root_path : str, pathlib.Path
        path to the folder that contains the subject folders

or 2) flag_files = Path(root_path).glob('**/raw_session.flag')

Relaxing the validation and replacing it with .glob('**/raw_session.flag') will find all folders with a raw session flag regardless of the entry point, which is useful if, for example, a single session or a single animal is to be processed.

Why would this break new session registration, @k1o0? There are validation steps in the following lines that might take care of trash folders.

Contributor Author

flag_files = Path(root_path).glob('**/raw_session.flag')
flag_files = [file for file in flag_files if re.search(r"/\d{4}-\d{2}-\d{2}/\d{3}/raw_session\.flag$", str(file))]

could be a compromise with some degree of validation

Contributor

Please let me fix this one up. We now have ONE methods for these globs that are more robust (e.g. the session number can be a single digit on whiterussian). It's not so efficient to apply a regex after a glob (effectively regex twice) when you can simply check if the input path is already a session path. Let me fix the integration server first because this task is extremely important so I don't want to merge without ensuring that it's correctly tested and that the tests pass.
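
To illustrate the single-digit point, here is a small sketch that applies the regex proposed above to two path strings. The unpadded path is a made-up example, and the `RELAXED` pattern is an assumption for illustration only, not the pattern ONE uses.

```python
import re

# Pattern from the proposal above: requires exactly three digits for the session number
STRICT = re.compile(r"/\d{4}-\d{2}-\d{2}/\d{3}/raw_session\.flag$")
# Hypothetical relaxed variant accepting one to three digits (an assumption, not ONE's pattern)
RELAXED = re.compile(r"/\d{4}-\d{2}-\d{2}/\d{1,3}/raw_session\.flag$")

padded = '/mnt/s0/Data/Subjects/ZFM-08652/2025-07-02/002/raw_session.flag'
unpadded = '/mnt/s0/Data/Subjects/ZFM-08652/2025-07-02/2/raw_session.flag'  # made-up example

strict_hits = [bool(STRICT.search(p)) for p in (padded, unpadded)]
relaxed_hits = [bool(RELAXED.search(p)) for p in (padded, unpadded)]
print(strict_hits)   # [True, False]
print(relaxed_hits)  # [True, True]
```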

Contributor Author

> Please let me fix this one up.

ofc :) I am not doing anything

> It's not so efficient to apply a regex after a glob (effectively regex twice) when you can simply check if the input path is already a session path

flag_files = filter(lambda x: is_session_path(x.parent), flag_files)
looks like it handles the validation already, which is why I was surprised that you said removing the validation step from the globbing would break the extractor

Contributor

Yes, you're quite right. I didn't look at the diff closely enough.

Contributor

I added a test and changed the logic a bit: I assume if you pass in a specific session path you wouldn't care about the raw session flag. Let me know if that makes sense to you.
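
The revised branching can be sketched with the standard library alone. This is a simplified illustration, not the code in this PR: `looks_like_session_path` is a hypothetical stand-in for the ONE session-path check, and `candidate_flag_files` is a made-up name.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical stand-in for ONE's session-path validation: <subject>/<YYYY-MM-DD>/<number>
_SESSION_TAIL = re.compile(r'^\d{4}-\d{2}-\d{2}/\d{1,3}$')


def looks_like_session_path(path):
    """Return True if the last two path parts look like a date and a session number."""
    parts = Path(path).parts
    return len(parts) >= 3 and bool(_SESSION_TAIL.match('/'.join(parts[-2:])))


def candidate_flag_files(root_path):
    """Return flag file paths for a single session path or a root containing subjects."""
    root_path = Path(root_path)
    if looks_like_session_path(root_path):
        # Explicit session path: the flag file does not need to exist on disk
        return [root_path / 'raw_session.flag']
    # Root path: only sessions that actually contain a flag file are returned
    flag_files = root_path.glob('*/????-??-??/*/raw_session.flag')
    return sorted(f for f in flag_files if looks_like_session_path(f.parent))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    flagged = root / 'foo' / '2020-01-01' / '001'
    unflagged = root / 'bar' / '2020-01-01' / '001'
    for session in (flagged, unflagged):
        session.mkdir(parents=True)
    (flagged / 'raw_session.flag').touch()

    from_root = [f.relative_to(root).as_posix() for f in candidate_flag_files(root)]
    from_session = [f.relative_to(root).as_posix() for f in candidate_flag_files(unflagged)]

print(from_root)
print(from_session)
```

When called on the root, only the flagged session is returned; when called directly on the unflagged session, its path is still returned, which mirrors the assumption that an explicitly passed session should be processed regardless of the flag.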

@@ -17,8 +17,7 @@
 from one.api import ONE
 from one.webclient import AlyxClient
 from one.remote.globus import get_lab_from_endpoint_id, get_local_endpoint_id
-from one.alf.spec import is_session_path
-from one.alf.path import session_path_parts
+from one.alf.path import ALFPath

 from ibllib import __version__ as ibllib_version
 from ibllib.pipes import tasks
@@ -82,12 +81,11 @@ def job_creator(root_path, one=None, dry=False, rerun=False):
     1) create the session on Alyx
     2) create the tasks to be run on Alyx

-    For legacy sessions the raw data are registered separately, instead of within a pipeline task.
-
     Parameters
     ----------
     root_path : str, pathlib.Path
-        Main path containing sessions or a session path.
+        Main path containing sessions or a session path. NB: If a session path is passed,
+        a raw_session.flag file needn't be present.
     one : one.api.OneAlyx
         An ONE instance for registering the session(s).
     dry : bool
@@ -106,13 +104,16 @@ def job_creator(root_path, one=None, dry=False, rerun=False):
     if not one:
         one = ONE(cache_rest=None)
     rc = IBLRegistrationClient(one=one)
-    flag_files = Path(root_path).glob('*/????-??-??/*/raw_session.flag')
-    flag_files = filter(lambda x: is_session_path(x.parent), flag_files)
+    if (root_path := ALFPath(root_path)).is_session_path():
+        flag_files = [root_path.joinpath('raw_session.flag')]
+    else:
+        flag_files = root_path.glob('*/????-??-??/*/raw_session.flag')
+        flag_files = filter(lambda f: f.parent.is_session_path(), flag_files)
     pipes = []
     all_datasets = []
     for flag_file in flag_files:
         session_path = flag_file.parent
-        if session_path_parts(session_path)[1] in ('test', 'test_subject'):
+        if session_path.subject in ('test', 'test_subject'):
             _logger.debug('skipping test session %s', session_path)
             continue
         _logger.info(f'creating session for {session_path}')
18 changes: 18 additions & 0 deletions ibllib/tests/test_pipes.py
@@ -60,6 +60,24 @@ def test_task_queue(self, lab_repo_mock):
         queue = local_server.task_queue(mode='small', lab='foolab', alyx=alyx)
         self.assertEqual([tasks[2]], queue)

+    def test_job_creator(self):
+        """Test ibllib.pipes.local_server.job_creator function.
+
+        This test simply checks that a specific session path can be passed and that a raw_session.flag
+        file is not necessary. For a full test of the job creator, see ci.tests.iblscripts.test_report_create_jobs
+        in iblscripts.
+        """
+        session_path = self.tmpdir / 'foo' / '2020-01-01' / '001'
+        assert not session_path.joinpath('raw_session.flag').exists()
+        with self.assertLogs('ibllib.pipes.local_server', level='INFO') as log:
+            local_server.job_creator(session_path, dry=True)
+            self.assertIn('creating session for', log.output[-1])
+        # Check skip when test subject
+        session_path = self.tmpdir / 'test' / '2020-01-01' / '001'
+        with self.assertLogs('ibllib.pipes.local_server', level='DEBUG') as log:
+            local_server.job_creator(session_path, dry=True)
+            self.assertIn('skipping test session', log.output[-1])
+

 class TestPipesMisc(unittest.TestCase):
     """"""