
Refactor interfaces to hide tile_bounds and allow dynamic block_size #129

Merged
3 commits merged into main on Feb 20, 2024

Conversation

@kerrj (Collaborator) commented Feb 13, 2024

Previously, users had to work with the black-magic tile_bounds argument, and the values of BLOCK_X and BLOCK_Y were hardcoded in CUDA, so 16 was the only valid input. The CUDA code has now been refactored to take any block size <= 16, and the tile_bounds computation is completely hidden from the user. The new interfaces for project_gaussians and rasterize_gaussians instead accept an integer block_size input, from which the tile bounds are calculated.
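
As a sketch of what the library can now do internally (the variable names here are illustrative, not the exact gsplat code), the tile grid is just a ceiling division of the image dimensions by the block size:

```cuda
// Illustrative: derive the tile grid from block_size via ceiling division.
dim3 tile_bounds(
    (img_width  + block_size - 1) / block_size,  // tiles across
    (img_height + block_size - 1) / block_size,  // tiles down
    1);
dim3 block(block_size, block_size, 1);           // block_size <= 16 per side
```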

There are 2 goals for this change:

  1. Clean up the user-facing interface for tile_bounds and get rid of the hardcoded BLOCK_X in CUDA, and
  2. Dynamic block size will allow more sophisticated shared-memory handling down the road, allowing smaller block sizes to rasterize higher-dimensional values more efficiently (see the sketch after this list). Once implemented, this will address ND-rasterization speed issues like those reported in #68 ("nd rasterizer is 10x slower than rasterizer").
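
As a rough illustration of why block size matters for shared memory (the numbers and layout below are hypothetical, not gsplat's actual kernel): if a rasterizer caches one multi-channel float value per thread in shared memory, the per-block budget scales with the square of the block side length.

```cuda
// Hypothetical per-block shared-memory budget when each thread caches
// one `channels`-dimensional float feature (not the actual gsplat layout):
size_t shmem_bytes(unsigned block_width, unsigned channels) {
    return size_t(block_width) * block_width * channels * sizeof(float);
}
// 16x16 block, 64 channels: 16*16*64*4 = 65536 B (64 KB) per block;
//  8x8  block, 64 channels:  8*8*64*4  = 16384 B (16 KB) per block,
// leaving room for more blocks to stay resident per SM.
```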

@kerrj (Collaborator, Author) commented Feb 13, 2024

I tested this PR in nerfstudio with block sizes 2-16; training works for all sizes tested.

@kerrj mentioned this pull request Feb 14, 2024
@liruilong940607 (Collaborator) commented

I like the idea of cleaning up tile_bounds! Will take a closer look later today or tomorrow!

@vye16 (Collaborator) left a comment

This looks good to me; tested on block sizes 2^n for n from 1 to 4. Please rename block_size to block_width so we can distinguish the actual block size (total thread count) from the side length. Otherwise LGTM; I'll approve after those changes. Make sure to make the accompanying changes in nerfstudio.
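
For context, the distinction being drawn here is side length versus total thread count; a minimal illustration (hypothetical snippet, not the gsplat source):

```cuda
// block_width is the side length of a square CUDA block;
// the total "block size" is its square.
unsigned block_width = 16;                               // side length
dim3 block(block_width, block_width, 1);                 // 16 x 16 threads
unsigned threads_per_block = block_width * block_width;  // 256 total
```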

@kerrj requested a review from vye16 on February 20, 2024 at 20:38
@vye16 (Collaborator) left a comment

Great, LGTM!

@vye16 merged commit 10bc1d0 into main on Feb 20, 2024. 2 checks passed.
@maturk (Collaborator) commented Mar 10, 2024

@kerrj, can you explain the need for this in relation to shared memory? I thought that regardless of the number of threads/workers in a CUDA block, the shared memory size is fixed, so changing the block_width, and hence the thread count, would not make a difference to the allocatable shared memory. Let me know if I have any misunderstandings.

@maturk mentioned this pull request Mar 13, 2024
@kerrj (Collaborator, Author) commented Mar 18, 2024

There are some nuances with shared memory size:

1. Some GPUs allow allocating more than 48 KB of shared memory if you allocate it dynamically.
2. Launching a kernel that requests too much shared memory limits how many blocks can run at the same time, which can starve the processor; in that case, launching with a smaller block size is actually faster, since more of the processors can be utilized.
3. "Regardless of the num of threads/workers in a CUDA block, the shared memory size is fixed": this is true of the maximum shared memory, but you can launch with less to free up shared memory for other executing blocks.
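
A minimal CUDA sketch of points 1 and 3 (the kernel, names, and sizes are illustrative, not the actual gsplat code):

```cuda
#include <cuda_runtime.h>

// Dynamically sized shared memory: the byte count is chosen at launch time.
extern __shared__ float tile_cache[];

__global__ void rasterize_sketch(int channels) {
    // Each thread owns a `channels`-float slice of the launch-sized cache.
    int t = threadIdx.y * blockDim.x + threadIdx.x;
    tile_cache[t * channels] = 0.f;  // placeholder work
}

int main() {
    int block_width = 16, channels = 64;
    dim3 block(block_width, block_width, 1);
    size_t shmem = size_t(block_width) * block_width * channels * sizeof(float);
    // 16*16*64*4 = 64 KB, which exceeds the default 48 KB limit.

    // Point 1: opt in to >48 KB of dynamic shared memory on GPUs that support it.
    cudaFuncSetAttribute(rasterize_sketch,
                         cudaFuncAttributeMaxDynamicSharedMemorySize,
                         (int)shmem);

    // Point 3: request only what this block size needs; a smaller block_width
    // shrinks `shmem`, leaving shared memory free for other resident blocks.
    rasterize_sketch<<<dim3(1, 1, 1), block, shmem>>>(channels);
    return cudaDeviceSynchronize() == cudaSuccess ? 0 : 1;
}
```

With block_width = 8 the same launch needs only 16 KB, which is exactly the trade-off described above: less shared memory per block, more blocks executing concurrently.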
