
fix: improve GPU utilization with better Tensor and Var handling #18

@FiberedSkies

Description


Currently there are a few issues with how the Matrix, VarMatrix, and Vector wrappers around candle_core::Tensor are handled, which lead to poor GPU utilization and low cache locality. These include:

  1. Frequent Device Transfers: The code creates many small tensors individually, which can cause inefficient GPU memory allocation patterns.
  2. Redundant Device/DType Storage: Each Matrix, VarMatrix, and Vector stores its own Device and DType, which is redundant since the underlying Tensor already has this information.
  3. Inefficient Small Operations: Operations like creating identity matrices element-by-element are not GPU-optimized.
  4. Sequential Processing: The sheaf operations process cells one at a time rather than in batches.
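A minimal, std-only sketch of how points 2 and 3 could be addressed, using hypothetical stand-ins for candle_core's `Tensor`, `Device`, and `DType` (the real types and any constructors like `Tensor::eye` may differ by candle version): the wrapper stores only the tensor and reads device/dtype through it, and the identity matrix is built host-side in a single allocation so it can be uploaded in one transfer rather than element by element.

```rust
// Hypothetical stand-ins for candle_core types, to keep the sketch
// self-contained; the real code would use candle_core::{Tensor, Device, DType}.
#[derive(Clone, Copy, PartialEq, Debug)]
enum DType { F32 }

#[derive(Clone, Copy, PartialEq, Debug)]
enum Device { Cpu }

struct Tensor {
    data: Vec<f32>,
    shape: (usize, usize),
    dtype: DType,
    device: Device,
}

// Fix for point 2: the wrapper holds only the tensor; device and dtype are
// accessors that delegate to it, so there is no redundant copy to keep in sync.
struct Matrix {
    inner: Tensor,
}

impl Matrix {
    fn device(&self) -> Device {
        self.inner.device
    }

    fn dtype(&self) -> DType {
        self.inner.dtype
    }

    // Fix for points 1 and 3: the identity matrix is assembled in one host-side
    // buffer and handed to the tensor in a single step, instead of issuing one
    // tiny write (and potential device transfer) per element.
    fn identity(n: usize, dtype: DType, device: Device) -> Self {
        let mut data = vec![0.0f32; n * n];
        for i in 0..n {
            data[i * n + i] = 1.0;
        }
        Matrix {
            inner: Tensor { data, shape: (n, n), dtype, device },
        }
    }
}

fn main() {
    let eye = Matrix::identity(3, DType::F32, Device::Cpu);
    // Diagonal is 1.0, everything else 0.0, in one allocation.
    assert_eq!(eye.inner.data[0], 1.0);
    assert_eq!(eye.inner.data[1], 0.0);
    assert_eq!(eye.inner.data[4], 1.0);
    assert_eq!(eye.inner.shape, (3, 3));
    // Device/dtype come from the tensor, not from duplicate fields.
    assert_eq!(eye.device(), Device::Cpu);
    assert_eq!(eye.dtype(), DType::F32);
}
```

The same single-buffer idea extends to point 4: stacking the per-cell matrices into one batched tensor (e.g. with a `stack`-style constructor) lets the sheaf operations run as one batched matmul instead of a sequential loop over cells.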
