Decouple device flow count from parameters #722

devreal · 2024-12-23T18:07:43Z

Make PARSEC_MAX_DEVICE_FLOWS configurable and select a proper integer type, up to int128_t. Make sure the flow mask is properly checked.

This is an attempt at allowing an increase of the number of device inputs up to 128. The use of 128bit integers for flows requires proper handling of integer types so this PR introduces parsec_flow_mask_t and provides macros to set and test the flow mask.

Related: #694 where the flows are separated out of the gpu task structure.

parsec/include/parsec/parsec_config_bottom.h

CMakeLists.txt

parsec/include/parsec/parsec_config_bottom.h

Make PARSEC_MAX_DEVICE_FLOWS configurable and select a proper integer type, up to int128_t. Make sure the flow mask is properly checked. Signed-off-by: Joseph Schuchart <[email protected]>

abouteiller · 2025-03-14T15:07:46Z

tests/runtime/cuda/stage_custom.jdf

    int elem_sz;
    int i;
    for(i = 0; i < task->task_class->nb_flows; i++){
-        if(flow_mask & (1U << i)){


Yes, this changes public interfaces, stricto sensu would require bump to v5; however, with the default build parameters, this is strictly equivalent to the 4.0 API.

abouteiller · 2025-03-14T17:56:34Z

I am willing to let this go in v4.1 despite the in-principle API breakage since it is backward compatible as long as you don't compile with custom parameters (notably Dplasma still works as intended).

Note that in the end, even if we were to bump the API to v5, compiling with different sizes would make the ABI of stage_in/stage_out inconsistent between builds. But we don't have an ABI so we don't care(?).

Please chime in.

bosilca

I don't think we should merge this because it creates an unstable API.

Moreover, I think it is unreasonable to expect a function call with 128 arguments because at the the a task is just a delegated function call. I could understand extreme cases, but over 64 arguments (and I'm being overly generous) it is costly to manage the copies and tranfers for such tasks.

Instead, we can change all the places we use dependency masks to allow for 64 bits masks, but then keep it there. This will give us a stable API and you a little more giggle rooms with tasks with large number of inputs.

bosilca · 2025-03-17T00:13:53Z

parsec/mca/device/device_gpu.c

    gpu_task->sources[flow->flow_index] = candidate;  /* save the candidate for release on transfer completion */
    /* Push data into the GPU from the source device */
-    int rc = gpu_task->stage_in ? gpu_task->stage_in(gpu_task, (1U << flow->flow_index), gpu_stream): PARSEC_SUCCESS;
+    int rc = gpu_task->stage_in ? gpu_task->stage_in(gpu_task, PARSEC_FLOW_MASK(flow->flow_index), gpu_stream): PARSEC_SUCCESS;


This is extremely problematic, as we are looking to call a function that the user potentially registered. This is well worst than an API breakage, this make the API dependent on parsec build type parameters.

devreal · 2025-03-17T01:28:43Z

I can see that this the API change is a problem. I'll need to find a different way of expressing this then. The argument that this is in any way connected to the number of arguments to the task function is overly simplistic and not at all relevant for TTG (we can create them inside the task and pass them as array to the kernel as long as we stay below the 2kB argument cap). I'm not arguing for 128 inputs for the giggles but because I expect them to become relevant, e.g., for combining two levels of tree traversal in 3D MRA or for implementing 6D MRA, which will require ~70 separate device memory buffers.

Closing this, will think of something else.

devreal requested a review from a team as a code owner December 23, 2024 18:07

abouteiller reviewed Jan 2, 2025

View reviewed changes

parsec/include/parsec/parsec_config_bottom.h Outdated Show resolved Hide resolved

CMakeLists.txt Show resolved Hide resolved

devreal force-pushed the flow-mask-128bit branch from 91b0ef2 to 561854a Compare January 23, 2025 22:58

abouteiller reviewed Feb 14, 2025

View reviewed changes

parsec/include/parsec/parsec_config_bottom.h Outdated Show resolved Hide resolved

Decouple device flow count from parameters

e3a7829

Make PARSEC_MAX_DEVICE_FLOWS configurable and select a proper integer type, up to int128_t. Make sure the flow mask is properly checked. Signed-off-by: Joseph Schuchart <[email protected]>

devreal force-pushed the flow-mask-128bit branch from 561854a to e3a7829 Compare February 14, 2025 21:47

abouteiller approved these changes Mar 14, 2025

View reviewed changes

abouteiller added this to the v4.1 milestone Mar 14, 2025

abouteiller reviewed Mar 14, 2025

View reviewed changes

abouteiller added API Change Change to the public API, backward incompatible (version major bump) enhancement New feature or request labels Mar 14, 2025

abouteiller added a commit to abouteiller/dplasma that referenced this pull request Mar 14, 2025

Update for API 4.1 flow-mask ICLDisco/parsec#722

2dec1fe

abouteiller mentioned this pull request Mar 14, 2025

Update for API 4.1 flow-mask icldisco/parsec#722 ICLDisco/dplasma#143

Closed

1 task

abouteiller added a commit to abouteiller/dplasma that referenced this pull request Mar 14, 2025

Update for API 4.1 flow-mask ICLDisco/parsec#722

0117546

bosilca reviewed Mar 17, 2025

View reviewed changes

devreal closed this Mar 17, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Decouple device flow count from parameters #722

Decouple device flow count from parameters #722

Uh oh!

devreal commented Dec 23, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

abouteiller Mar 14, 2025

Uh oh!

abouteiller Mar 14, 2025 •

edited

Loading

Uh oh!

abouteiller commented Mar 14, 2025 •

edited

Loading

Uh oh!

bosilca left a comment

Uh oh!

bosilca Mar 17, 2025

Uh oh!

devreal commented Mar 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Decouple device flow count from parameters #722

Decouple device flow count from parameters #722

Uh oh!

Conversation

devreal commented Dec 23, 2024

Uh oh!

Uh oh!

Uh oh!

Uh oh!

abouteiller Mar 14, 2025

Choose a reason for hiding this comment

Uh oh!

abouteiller Mar 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

abouteiller commented Mar 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bosilca left a comment

Choose a reason for hiding this comment

Uh oh!

bosilca Mar 17, 2025

Choose a reason for hiding this comment

Uh oh!

devreal commented Mar 17, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

abouteiller Mar 14, 2025 •

edited

Loading

abouteiller commented Mar 14, 2025 •

edited

Loading