Provide mechanism for splits internal to a workflow to remain expanded #780


Open
effigies opened this issue Apr 1, 2025 · 5 comments
Labels
to consider suggesting changes that require more discussion

Comments

@effigies
Contributor

effigies commented Apr 1, 2025

[...] if i remember correctly, i believe that nested splits (i.e. any level of splits set inside a workflow) need to be combined before one steps out of a workflow. i cannot remember now how a workflow (as a cacheable task object) deals with splits inside it (whether inner/outer), and especially if the workflow also has splits set on its inputs. there were certain circumstances under which an output state could not be easily generated/tracked, and we forced that combines needed to happen. i think it was this nested condition.

Originally posted by @satra in #773 (comment)

If this is correct, this seems like a surprising regression over nipype 1. It's rare, but I have found uses for creating a workflow that initiates an iterable that is never joined. It allows, for example, creating iterables over templates, and then hooking up non-iterable workflows as if you are targeting a single template.

I can't immediately think of a use case where you want to expand state inside one subworkflow and then combine within an outer workflow (or a sibling subworkflow). At least the use case above would be served just fine with state variables that become inaccessible to combiners at the end of a workflow, although that does leave a potential footgun for users who do eventually want to combine over those variables.

If we want to permit this use case and make sure that any expanded state is potentially collapsible, then we probably need a way for a workflow node to "take over" the name of a variable expanded by an internal node. Given that our naming approach is <task>.<input_var>, I'm a bit surprised that <wf>.<task>.<input_var> wouldn't be sufficient, but you could imagine something like:

@workflow.define
def iterator_wf(...):
    ...

    wf.rename_split('template_splitter.template_id', 'template_id')
    # Or, if reusing the inner variable's own name is typical enough
    # to warrant a dedicated method:
    wf.claim_split('template_splitter.template_id')
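To make the proposal concrete, here is a plain-Python sketch of what "taking over" an internal split's name could amount to: re-keying the expanded state variable under the workflow's own namespace. This is not actual Pydra API; `rename_split` and the flat state-table layout are hypothetical.

```python
def rename_split(state, inner_name, outer_name):
    """Re-key an expanded state variable so the containing workflow
    "takes over" a split created by an internal node.

    `state` maps qualified state-variable names (e.g.
    "template_splitter.template_id") to their expanded values.
    Hypothetical sketch, not Pydra's implementation.
    """
    if inner_name not in state:
        raise KeyError(f"no expanded state variable named {inner_name!r}")
    if outer_name in state:
        raise ValueError(f"{outer_name!r} already exists in this state")
    renamed = dict(state)  # leave the caller's mapping untouched
    renamed[outer_name] = renamed.pop(inner_name)
    return renamed


state = {"template_splitter.template_id": ["MNI152NLin2009cAsym", "MNI152NLin6Asym"]}
print(rename_split(state, "template_splitter.template_id", "template_id"))
# {'template_id': ['MNI152NLin2009cAsym', 'MNI152NLin6Asym']}
```

Under this model, `claim_split` would just be `rename_split` with the outer name defaulting to the last component of the inner one.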
@satra
Contributor

satra commented Apr 1, 2025

but I have found uses for creating a workflow that initiates an iterable that is never joined.

i think that may be ok. but nested may not be. also dynamic nesting did not exist in nipype1.

use case: for each participant input, generate t-maps, and then iterate over clusters/blobs. i simply don't remember if we dealt with state propagation out of such a workflow beyond the participant input.

@tclose
Contributor

tclose commented Apr 2, 2025

I'm not sure I'm exactly following but, in your example @effigies, couldn't your nested workflow output a list of values (i.e. generated from the split internal to the workflow), and then you just split over this output when you refer to it in your outer workflow?

I suppose you would lose the structure of the splitter that is internal to the workflow if you only wanted to combine over part of the state, but I feel like there would be ways to work around this, wouldn't there?

Currently, any uncombined splits at the end of a workflow are implicitly converted into a list. To my mind, trying to propagate these outside of the internal workflow could get very messy, as it breaks the encapsulation that workflows currently have (you can treat a workflow just like any other type of task).
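The workaround described above can be simulated with plain Python (no Pydra here; `splitting_wf` and `somenode` are stand-ins): the sub-workflow hands back its uncombined split as a list, and the outer workflow explicitly re-splits over it.

```python
def splitting_wf(iterable):
    # Stand-in for a sub-workflow with an internal, uncombined split:
    # at the workflow boundary the expanded state collapses to a list.
    return [f"processed-{x}" for x in iterable]


def somenode(scalarvar):
    # Stand-in for a downstream task written against a scalar input.
    return scalarvar.upper()


subwf_out = splitting_wf(["a", "b"])            # list-valued output
downstream = [somenode(v) for v in subwf_out]   # explicit outer re-split
print(downstream)
# ['PROCESSED-A', 'PROCESSED-B']
```

The cost is the extra re-split step in the outer workflow, which is the boilerplate effigies objects to below.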

@tclose tclose added the to consider suggesting changes that require more discussion label Apr 2, 2025
@effigies
Contributor Author

effigies commented Apr 2, 2025

couldn't your nested workflow output a list of values (i.e. generated from the split internal to the workflow), and then you just split over this output when you refer to it in your outer workflow?

Yes, but that would require something like:

subwf = workflow.add(splitting_wf(my_iterable))
ident = workflow.add(identity().split(itervar=subwf.out))

downstream = workflow.add(somenode(scalarvar=ident.itervar))

Instead of:

subwf = workflow.add(splitting_wf(my_iterable))

downstream = workflow.add(somenode(scalarvar=subwf.out))

As an example, here is an unjoined iterator workflow: smriprep.workflows.outputs.init_template_iterator_wf()

And here is where it is used. Note that all downstream nodes can be written without any knowledge that they are iterated over.

Further, supposing that subwf consisted of nontrivial jobs, forcing a sync at the end of subwf means that downstream can't start until all of subwf is complete, getting us back to one of the problems with MapNode.


Now, maybe from an encapsulation and readability perspective, it's still not desirable for a workflow to expand state implicitly. We could imagine a way to expand explicitly from the outside:

subwf = workflow.add(splitting_wf(my_iterable).split_outputs('out'))

Internally, the split_outputs() could identify that 'out' is already split and simply rename the state variables instead of combining and re-splitting.
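`split_outputs()` itself is hypothetical; a plain-Python sketch of the shortcut it could take, assuming the sub-workflow's expanded state is visible as a mapping from qualified names to values:

```python
def split_outputs(subwf_state, output_var, inner_task="splitting_wf"):
    """Sketch of the proposed split_outputs(): if `output_var` is already
    expanded inside the sub-workflow, rename its state variable into the
    outer scope rather than combining to a list and re-splitting (which
    would also force a sync point at the workflow boundary).

    Hypothetical; `inner_task` and the state layout are invented here.
    """
    inner_name = f"{inner_task}.{output_var}"
    if inner_name in subwf_state:
        # Cheap path: take over the existing expansion under the outer name.
        return {output_var: subwf_state.pop(inner_name)}
    raise ValueError(f"{output_var!r} is not split inside the sub-workflow")


state = {"splitting_wf.out": ["tpl-A", "tpl-B"]}
print(split_outputs(state, "out"))
# {'out': ['tpl-A', 'tpl-B']}
```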


Also, as I'm looking at it, under the current constraints, I would probably simply split the workflow from the outside instead of the inside:

templates = workflow.add(
    template_iterator_wf(
        template=template,
        anat2std_xfm=anat2std_xfm
    ).split(in_tuple=[(s.fullname, s.spec) for s in spaces.cached.get_standard(dim=(3,))])
)

t1w_std = workflow.add(ApplyTransforms(..., transforms=templates.anat2std_xfm, reference_image=templates.std_t1w))
...

So I suppose we should defer this until there's a use-case that can't be trivially refactored to pull the splitter up to the containing workflow.

@tclose
Contributor

tclose commented Apr 3, 2025

I do think it will be easier to follow what is happening if the splits occur in the outer workflow. However, your example got me thinking about the nested-list vs flattened-list (current default) behaviour when combining multiple states in a single step. If the default were a nested list, then, coupled with something like #783, you could re-split the combined nested lists along arbitrary dimensions and get back to the state you created in the nested workflow.
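A plain-Python illustration of the nested-list idea (the names and shapes here are invented): with the nested form, the grid structure of a two-variable split survives the combine, so a downstream step can re-split along either axis, whereas the flat form needs the original shape to be known to reconstruct it.

```python
# Combined output of a hypothetical ("subject", "session") split,
# kept as a nested list (rows = subjects, columns = sessions):
nested = [["s1-ses1", "s1-ses2", "s1-ses3"],
          ["s2-ses1", "s2-ses2", "s2-ses3"]]

# Current default: the same combine flattened to a single list.
flat = [x for row in nested for x in row]

# With the nested form we can re-split along the *other* axis
# (group by session across subjects) without extra bookkeeping:
by_session = list(map(list, zip(*nested)))
print(by_session[0])
# ['s1-ses1', 's2-ses1']
```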

@tclose
Contributor

tclose commented Apr 29, 2025

@effigies is it ok to close this issue, or do you think it warrants some more discussion?
