Rasterize indices only so that the alpha composition can be done in python with more flexibility. #120
Conversation
Distortion loss is also implemented with 5 lines of python in the `_distortion_loss` function below.
Should we make this the default for rasterizing ND things?
```python
    opacity: Float[Tensor, "*batch 1"],
    img_height: int,
    img_width: int,
) -> Tensor:
```
return type should be `Tuple[Tensor, Tensor]`
```python
    Returns:
        A Tensor:

        - **gaussian_ids** (Tensor): Packed (flattened) gaussian ids for intersects. [M,]
```
Can you elaborate a bit on the docstrings of the return types? M is >> the number of pixels, right, since every gaussian gets counted once for each time it's seen in a pixel? If there are any guarantees on the formatting of the tensors, that'd be nice to note too (like whether it's sorted by pixel or sorted by gaussian, etc.).
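For illustration, here is one possible packed layout, assuming the output is sorted by pixel (whether that ordering is actually guaranteed is exactly what the docstring should pin down; `pixel_ids` is a hypothetical companion tensor):

```python
import torch

# Toy scene: pixel 0 is covered by gaussians {3, 7}, pixel 1 by {7} only.
# A pixel-sorted packed layout flattens every (pixel, gaussian) intersection:
gaussian_ids = torch.tensor([3, 7, 7])  # [M]; M grows with hits per pixel, so M >> n_pixels
pixel_ids = torch.tensor([0, 0, 1])     # hypothetical: which pixel each intersection belongs to
```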
```python
def _distortion_loss(
    weights: Tensor, t_mids: Tensor, ray_indices: Tensor, n_rays: int
) -> Tensor:
    from nerfacc import accumulate_along_rays, exclusive_sum
```
add nerfacc as a dependency...?
+1 on this dependency!
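For reference, here's a minimal sketch of what the body could look like with nerfacc's packed helpers, assuming it computes the bilateral term of the Mip-NeRF 360 distortion loss over ray-grouped samples; treat it as an illustration, not necessarily the exact code in this diff:

```python
import torch
from torch import Tensor
from nerfacc import accumulate_along_rays, exclusive_sum


def _distortion_loss_sketch(
    weights: Tensor, t_mids: Tensor, ray_indices: Tensor, n_rays: int
) -> Tensor:
    # Bilateral term of the distortion loss, computed on packed samples
    # grouped by ray via exclusive sums over each ray's earlier samples.
    loss_bi_0 = weights * t_mids * exclusive_sum(weights, indices=ray_indices)
    loss_bi_1 = weights * exclusive_sum(weights * t_mids, indices=ray_indices)
    loss_bi = 2.0 * (loss_bi_0 - loss_bi_1)
    # Scatter-sum each ray's contributions into a per-ray loss. [n_rays, 1]
    return accumulate_along_rays(loss_bi, None, ray_indices, n_rays)
```

Every op here is plain differentiable torch, which is the point of the PR: no custom backward kernel is needed for the loss.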
Testing on a 4090, I get pretty much the same speed between the default ND rasterization and the index-based one:
Thanks for the further profiling. Yeah, with large N the sorting becomes the main bottleneck, so the two converge to a similar speed. I'm pretty surprised the difference is so minor in the small-N regime on the 4090. On a 3090 Ti, I get 73 it/s vs. 354 it/s.
Interesting. I also tried it in nerfstudio and it runs out of memory (it requests something like 70 GB). Is there a way to get the footprint down, or is this just unavoidable because the indices array is so large?
It is unavoidable if we want the *exact* solution. It is expected to consume a lot of memory at the start of training, where everything is transparent, so each pixel probably intersects with dozens of GSs. Toward the end of training the footprint gets much smaller, as the GSs become more opaque.
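A back-of-envelope estimate, with assumed (not measured) numbers, illustrates why the packed representation blows up early in training:

```python
# Assumed numbers for illustration: near-transparent gaussians mean early
# stopping barely triggers, so the intersection count M explodes.
H, W = 1080, 1920
hits_per_pixel = 500                # plausible early-training average, not measured
M = H * W * hits_per_pixel          # ~1.0e9 intersections
bytes_per_hit = 4 + 4 + 4           # e.g. int32 gaussian id + int32 pixel id + fp32 weight
print(M * bytes_per_hit / 1e9)      # ~12.4 GB for the packed tensors alone
# Autograd additionally saves per-intersection intermediates for the backward
# pass, multiplying this by several times.
```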
```cpp
std::tuple<torch::Tensor, torch::Tensor, torch::Tensor>
rasterize_indices_tensor(
    const std::tuple<int, int, int> tile_bounds,
    const std::tuple<int, int, int> block,
```
Can we update this interface so that we hide the block size from the user, as in #129?
Motivated by @hangg7's question on how to implement distortion loss with gsplat, I'm abstracting the volrend integral calculation out of the rasterizer function, so that things like ND features, alpha-channel rendering, and distortion loss can be implemented in python with torch's autodiff.
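For a concrete picture, here is a hedged sketch of what that composition could look like in native torch on top of the packed indices; the tensor names (`pixel_ids`, `alphas`) and the use of nerfacc helpers are assumptions for illustration, not the actual gsplat API:

```python
import torch
from torch import Tensor
from nerfacc import accumulate_along_rays, exclusive_prod


def volrend_sketch(
    gaussian_ids: Tensor,  # [M] packed gaussian id per intersection
    pixel_ids: Tensor,     # [M] hypothetical pixel id per intersection, depth-sorted per pixel
    alphas: Tensor,        # [M] per-intersection opacity after the 2D gaussian falloff
    colors: Tensor,        # [N, D] per-gaussian ND features
    n_pixels: int,
) -> Tensor:
    # Transmittance = product of (1 - alpha) over the earlier hits of the same pixel.
    trans = exclusive_prod(1.0 - alphas, indices=pixel_ids)  # [M]
    weights = alphas * trans                                 # [M]
    # Scatter-accumulate weighted ND features per pixel. [n_pixels, D]
    return accumulate_along_rays(weights, colors[gaussian_ids], pixel_ids, n_pixels)
```

Since the exclusive product and the scatter-accumulate are both differentiable, gradients to opacities and features fall out of autograd for free.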
Add a new function `rasterize_indices` which only returns indices and therefore does not need to be differentiable. With this, all gradients can be managed by native torch in the rasterization stage. Note that `rasterize_indices` still applies early stopping, so the returned indices are only those useful for the volrend integral.

This is, in most cases, inevitably slower than the `rasterize_gaussians` function, which fuses everything into a single kernel in one pass. However, it can be faster in the ND case (when D is large), as the ND features are now processed in parallel in native torch with the `rasterize_indices` approach.

Here are some profilings (N is the number of gaussians, D is the ND color dimension):
The command line to get the above profiling (on NVIDIA TITAN RTX):