Provide mechanism for splits internal to a workflow to remain expanded #780


Open
effigies opened this issue Apr 1, 2025 · 5 comments
Labels
to consider suggesting changes that require more discussion

Comments

@effigies
Contributor

effigies commented Apr 1, 2025

[...] if i remember correctly, i believe that nested splits (i.e. any level of splits set inside a workflow) need to be combined before one steps out of a workflow. i cannot remember now how a workflow (as a cacheable task object) deals with splits inside it (whether inner/outer), and especially if the workflow also has splits set on its inputs. there were certain circumstances under which an output state could not be easily generated/tracked, and we forced that combines needed to happen. i think it was this nested condition.

Originally posted by @satra in #773 (comment)

If this is correct, this seems like a surprising regression over nipype 1. It's rare, but I have found uses for creating a workflow that initiates an iterable that is never joined. It allows, for example, creating iterables over templates, and then hooking up non-iterable workflows as if you are targeting a single template.

I can't immediately think of a use case where you want to expand state inside one subworkflow and then combine within an outer workflow (or a sibling subworkflow). At least the use case above would be served just fine with state variables that become inaccessible to combiners at the end of a workflow, although that does leave a potential footgun for users who do eventually want to combine over those variables.

If we want to permit this use case and make sure that any expanded state is potentially collapsible, then we probably need a way for a workflow node to "take over" the name of a variable expanded by an internal node. Given that our naming approach is <task>.<input_var>, I'm a bit surprised that <wf>.<task>.<input_var> wouldn't be sufficient, but you could imagine something like:

@workflow.define
def iterator_wf(...):
    ...

    wf.rename_split('template_splitter.template_id', 'template_id')
    # Or, if reusing the inner variable's own name is typical enough
    # to warrant a dedicated method:
    wf.claim_split('template_splitter.template_id')
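To make the proposal concrete, here is a plain-Python sketch of what "taking over" an internal split's name could amount to: re-keying the expanded state variable under the workflow's own namespace. This is not actual Pydra API; `rename_split` and the flat state-table layout are hypothetical.

```python
def rename_split(state, inner_name, outer_name):
    """Re-key an expanded state variable so the containing workflow
    "takes over" a split created by an internal node.

    `state` maps qualified state-variable names (e.g.
    "template_splitter.template_id") to their expanded values.
    Hypothetical sketch, not Pydra's implementation.
    """
    if inner_name not in state:
        raise KeyError(f"no expanded state variable named {inner_name!r}")
    if outer_name in state:
        raise ValueError(f"{outer_name!r} already exists in this state")
    renamed = dict(state)  # leave the caller's mapping untouched
    renamed[outer_name] = renamed.pop(inner_name)
    return renamed


state = {"template_splitter.template_id": ["MNI152NLin2009cAsym", "MNI152NLin6Asym"]}
print(rename_split(state, "template_splitter.template_id", "template_id"))
# {'template_id': ['MNI152NLin2009cAsym', 'MNI152NLin6Asym']}
```

Under this model, `claim_split` would just be `rename_split` with the outer name defaulting to the last component of the inner one.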
@satra
Contributor

satra commented Apr 1, 2025

but I have found uses for creating a workflow that initiates an iterable that is never joined.

i think that may be ok. but nested may not be. also dynamic nesting did not exist in nipype1.

use case: for each participant input, generate t-maps, and then iterate over clusters/blobs. i simply don't remember if we dealt with state propagation out of such a workflow beyond the participant input.

@tclose
Contributor

tclose commented Apr 2, 2025

I'm not sure I'm exactly following but, in your example @effigies, couldn't your nested workflow output a list of values (i.e. generated from the split internal to the workflow), and then you just split over this output when you refer to it in your outer workflow?

I suppose you would lose the structure of the splitter that is internal to the workflow if you only wanted to combine over part of the state, but I feel like there would be ways to work around this, wouldn't there?

Currently, any uncombined splits at the end of a workflow are implicitly converted into a list. To my mind, trying to propagate these outside of the internal workflow could get very messy, as it breaks the encapsulation that workflows currently have (you can treat a workflow just like any other type of task).
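The workaround described above can be simulated with plain Python (no Pydra here; `splitting_wf` and `somenode` are stand-ins): the sub-workflow hands back its uncombined split as a list, and the outer workflow explicitly re-splits over it.

```python
def splitting_wf(iterable):
    # Stand-in for a sub-workflow with an internal, uncombined split:
    # at the workflow boundary the expanded state collapses to a list.
    return [f"processed-{x}" for x in iterable]


def somenode(scalarvar):
    # Stand-in for a downstream task written against a scalar input.
    return scalarvar.upper()


subwf_out = splitting_wf(["a", "b"])            # list-valued output
downstream = [somenode(v) for v in subwf_out]   # explicit outer re-split
print(downstream)
# ['PROCESSED-A', 'PROCESSED-B']
```

The cost is the extra re-split step in the outer workflow, which is the boilerplate effigies objects to below.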

@tclose tclose added the to consider suggesting changes that require more discussion label Apr 2, 2025
@effigies
Contributor Author

effigies commented Apr 2, 2025

couldn't your nested workflow output a list of values (i.e. generated from the split internal to the workflow), and then you just split over this output when you refer to it in your outer workflow?

Yes, but that would require something like:

subwf = workflow.add(splitting_wf(my_iterable))
ident = workflow.add(identity().split(itervar=subwf.out))

downstream = workflow.add(somenode(scalarvar=ident.itervar))

Instead of:

subwf = workflow.add(splitting_wf(my_iterable))

downstream = workflow.add(somenode(scalarvar=subwf.out))

As an example, here is an unjoined iterator workflow: smriprep.workflows.outputs.init_template_iterator_wf()

And here is where it is used. Note that all downstream nodes can be written without any knowledge that they are iterated over.

Further, supposing that subwf consisted of nontrivial jobs, forcing a sync at the end of subwf means that downstream can't start until all of subwf is complete, getting us back to one of the problems with MapNode.


Now, maybe from an encapsulation and readability perspective, it's still not desirable for a workflow to expand state implicitly. We could imagine a way to expand explicitly from the outside:

subwf = workflow.add(splitting_wf(my_iterable).split_outputs('out'))

Internally, the split_outputs() could identify that 'out' is already split and simply rename the state variables instead of combining and re-splitting.
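`split_outputs()` itself is hypothetical; a plain-Python sketch of the shortcut it could take, assuming the sub-workflow's expanded state is visible as a mapping from qualified names to values:

```python
def split_outputs(subwf_state, output_var, inner_task="splitting_wf"):
    """Sketch of the proposed split_outputs(): if `output_var` is already
    expanded inside the sub-workflow, rename its state variable into the
    outer scope rather than combining to a list and re-splitting (which
    would also force a sync point at the workflow boundary).

    Hypothetical; `inner_task` and the state layout are invented here.
    """
    inner_name = f"{inner_task}.{output_var}"
    if inner_name in subwf_state:
        # Cheap path: take over the existing expansion under the outer name.
        return {output_var: subwf_state.pop(inner_name)}
    raise ValueError(f"{output_var!r} is not split inside the sub-workflow")


state = {"splitting_wf.out": ["tpl-A", "tpl-B"]}
print(split_outputs(state, "out"))
# {'out': ['tpl-A', 'tpl-B']}
```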


Also, as I'm looking at it, under the current constraints, I would probably simply split the workflow from the outside instead of the inside:

templates = workflow.add(
    template_iterator_wf(
        template=template,
        anat2std_xfm=anat2std_xfm
    ).split(in_tuple=[(s.fullname, s.spec) for s in spaces.cached.get_standard(dim=(3,))])
)

t1w_std = workflow.add(ApplyTransforms(..., transforms=templates.anat2std_xfm, reference_image=templates.std_t1w))
...

So I suppose we should defer this until there's a use-case that can't be trivially refactored to pull the splitter up to the containing workflow.

@tclose
Contributor

tclose commented Apr 3, 2025

I do think it will be easier to follow what is happening if the splits occur in the outer workflow. However, your example got me thinking about the nested-list vs flattened-list (current default) behaviour when combining multiple states in a single step. If the default were a nested list, then, coupled with something like #783, you could re-split the combined nested lists along arbitrary dimensions and get back to the state you created in the nested workflow.
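A plain-Python illustration of the nested-list idea (the names and shapes here are invented): with the nested form, the grid structure of a two-variable split survives the combine, so a downstream step can re-split along either axis, whereas the flat form needs the original shape to be known to reconstruct it.

```python
# Combined output of a hypothetical ("subject", "session") split,
# kept as a nested list (rows = subjects, columns = sessions):
nested = [["s1-ses1", "s1-ses2", "s1-ses3"],
          ["s2-ses1", "s2-ses2", "s2-ses3"]]

# Current default: the same combine flattened to a single list.
flat = [x for row in nested for x in row]

# With the nested form we can re-split along the *other* axis
# (group by session across subjects) without extra bookkeeping:
by_session = list(map(list, zip(*nested)))
print(by_session[0])
# ['s1-ses1', 's2-ses1']
```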

@tclose
Contributor

tclose commented Apr 29, 2025

@effigies is it ok to close this issue, or do you think it warrants some more discussion?
