Rasterize indices only so that the alpha composition can be done in python with more flexibility. #120
Conversation
Distortion loss is also implemented with 5 lines of python in the `_distortion_loss` function below.
Should we make this the default for rasterizing ND things?
```python
    opacity: Float[Tensor, "*batch 1"],
    img_height: int,
    img_width: int,
) -> Tensor:
```
return type should be `Tuple[Tensor, Tensor]`
```python
    Returns:
        A Tensor:

        - **gaussian_ids** (Tensor): Packed (flattened) gaussian ids for intersects. [M,]
```
Can you elaborate a bit on the docstrings of the return types? M is >> the number of pixels, right, since every gaussian gets counted once for each time it's seen in a pixel? If there are any guarantees on the formatting of the tensors, that'd be nice to note too (like whether it's sorted by pixel or sorted by gaussian, etc.).
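For illustration, here is one possible packed layout, assuming the output is sorted by pixel (whether that ordering is actually guaranteed is exactly what the docstring should pin down; `pixel_ids` is a hypothetical companion tensor):

```python
import torch

# Toy scene: pixel 0 is covered by gaussians {3, 7}, pixel 1 by {7} only.
# A pixel-sorted packed layout flattens every (pixel, gaussian) intersection:
gaussian_ids = torch.tensor([3, 7, 7])  # [M]; M grows with hits per pixel, so M >> n_pixels
pixel_ids = torch.tensor([0, 0, 1])     # hypothetical: which pixel each intersection belongs to
```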
```python
def _distortion_loss(
    weights: Tensor, t_mids: Tensor, ray_indices: Tensor, n_rays: int
) -> Tensor:
    from nerfacc import accumulate_along_rays, exclusive_sum
```
add nerfacc as a dependency...?
+1 on this dependency!
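For reference, here's a minimal sketch of what the body could look like with nerfacc's packed helpers, assuming it computes the bilateral term of the Mip-NeRF 360 distortion loss over ray-grouped samples; treat it as an illustration, not necessarily the exact code in this diff:

```python
import torch
from torch import Tensor
from nerfacc import accumulate_along_rays, exclusive_sum


def _distortion_loss_sketch(
    weights: Tensor, t_mids: Tensor, ray_indices: Tensor, n_rays: int
) -> Tensor:
    # Bilateral term of the distortion loss, computed on packed samples
    # grouped by ray via exclusive sums over each ray's earlier samples.
    loss_bi_0 = weights * t_mids * exclusive_sum(weights, indices=ray_indices)
    loss_bi_1 = weights * exclusive_sum(weights * t_mids, indices=ray_indices)
    loss_bi = 2.0 * (loss_bi_0 - loss_bi_1)
    # Scatter-sum each ray's contributions into a per-ray loss. [n_rays, 1]
    return accumulate_along_rays(loss_bi, None, ray_indices, n_rays)
```

Every op here is plain differentiable torch, which is the point of the PR: no custom backward kernel is needed for the loss.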
Testing on a 4090, I get pretty much the same speed between the default ND rasterization and the index-based one:
Thanks for the further profiling. Yeah, with large N the sorting becomes the main bottleneck, so the two converge to a similar speed. I'm pretty surprised the difference is so minor in the small-N regime on the 4090. On a 3090 Ti, I get 73 it/s vs. 354 it/s.
Interesting. I also tried it in nerfstudio and it runs out of memory (it requests something like 70 GB). Is there a way to get the footprint down, or is this just unavoidable because the indices array is so large?
It is unavoidable if we want the *exact* solution. It is expected to consume a lot of memory at the start of training, where everything is transparent, so each pixel probably intersects with dozens of GSs. Toward the end of training the footprint gets much smaller, as the GSs become more opaque.
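A back-of-envelope estimate, with assumed (not measured) numbers, illustrates why the packed representation blows up early in training:

```python
# Assumed numbers for illustration: near-transparent gaussians mean early
# stopping barely triggers, so the intersection count M explodes.
H, W = 1080, 1920
hits_per_pixel = 500                # plausible early-training average, not measured
M = H * W * hits_per_pixel          # ~1.0e9 intersections
bytes_per_hit = 4 + 4 + 4           # e.g. int32 gaussian id + int32 pixel id + fp32 weight
print(M * bytes_per_hit / 1e9)      # ~12.4 GB for the packed tensors alone
# Autograd additionally saves per-intersection intermediates for the backward
# pass, multiplying this by several times.
```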
```cpp
std::tuple<torch::Tensor, torch::Tensor, torch::Tensor>
rasterize_indices_tensor(
    const std::tuple<int, int, int> tile_bounds,
    const std::tuple<int, int, int> block,
```
Can we update this interface so that we hide the block size from the user, as in #129?
Motivated by @hangg7's question on how to implement distortion loss with gsplat, I'm abstracting the volrend integral calculation out of the rasterizer function, so that things like ND features, alpha-channel rendering, and distortion loss can be implemented in python with torch's autodiff.
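For a concrete picture, here is a hedged sketch of what that composition could look like in native torch on top of the packed indices; the tensor names (`pixel_ids`, `alphas`) and the use of nerfacc helpers are assumptions for illustration, not the actual gsplat API:

```python
import torch
from torch import Tensor
from nerfacc import accumulate_along_rays, exclusive_prod


def volrend_sketch(
    gaussian_ids: Tensor,  # [M] packed gaussian id per intersection
    pixel_ids: Tensor,     # [M] hypothetical pixel id per intersection, depth-sorted per pixel
    alphas: Tensor,        # [M] per-intersection opacity after the 2D gaussian falloff
    colors: Tensor,        # [N, D] per-gaussian ND features
    n_pixels: int,
) -> Tensor:
    # Transmittance = product of (1 - alpha) over the earlier hits of the same pixel.
    trans = exclusive_prod(1.0 - alphas, indices=pixel_ids)  # [M]
    weights = alphas * trans                                 # [M]
    # Scatter-accumulate weighted ND features per pixel. [n_pixels, D]
    return accumulate_along_rays(weights, colors[gaussian_ids], pixel_ids, n_pixels)
```

Since the exclusive product and the scatter-accumulate are both differentiable, gradients to opacities and features fall out of autograd for free.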
Add a new function `rasterize_indices` which only returns indices and therefore does not need to be differentiable. With this, all gradients can be managed by native torch in the rasterization stage. Note that `rasterize_indices` still applies early stopping, so the returned indices are only those useful for the volrend integral.

This is, in most cases, inevitably slower than the `rasterize_gaussians` function, which fuses everything into a single kernel in one pass. However, it can be faster in the ND case (when D is large), as the ND features are now processed in parallel in native torch with the `rasterize_indices` approach.

Here are some profilings (N is the number of gaussians, D is the ND color dimension):
The command line to get the above profiling (on NVIDIA TITAN RTX):