Is this a duplicate?

Type of Bug
Performance

Component
CUB

Describe the bug
Looking at cccl/cub/cub/block/block_adjacent_difference.cuh, lines 133 to 137 at 9b7333b, and at all the member functions, it seems like none of them uses both first_items and last_items, i.e. a single array halo_items or similar would suffice. The unnecessarily large shared-memory request can in practice reduce occupancy and therefore hurt performance.

How to Reproduce
Not applicable.

Expected behavior
BlockAdjacentDifference should only request as much shared memory as it actually needs.

Reproduction link
No response

Operating System
No response

nvidia-smi output
No response

NVCC version
No response
Arguably there should be a version of these algorithms that uses shared memory only for inter-warp communication and warp shuffles otherwise, for minimal shared-memory usage at reduced performance, like e.g. BLOCK_LOAD_WARP_TRANSPOSE_TIMESLICED, but that would be another issue.
Maybe at some point it was planned to also have algorithms that look both left and right, but it is a hard ask to pessimize these common algorithms just so those non-existent ones can use the same API.