TorchTRT Compilation Memory Consumption Management #3839
Replies: 5 comments 3 replies
-
| In Qwen, the TRT builder uses 1x memory to build the live engine. | 
-
| INetworkDefinition does not actually take any memory; it is the lowered graph, and constant folding takes up to 1x (0-1x) memory. Code here: INetworkDefinition just holds references to the weights in the lowered graph. | 
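To make the 0-1x claim concrete, here is an illustrative sketch (not Torch-TensorRT's actual implementation; all names are hypothetical) of constant folding on a toy op list. Folding materializes new constant tensors while the originals are still referenced by the lowered graph, which is why the peak extra memory can approach 1x the constant size:

```python
# Toy constant folder: ops are (output_name, fn, input_names) tuples.
# Ops whose inputs are all constants get evaluated ahead of time; the
# original constants stay alive in `folded` alongside the new values,
# so extra memory is between 0x and 1x of the constant storage.
def fold_constants(ops, constants):
    folded = dict(constants)  # originals remain referenced
    out_ops = []
    for name, fn, inputs in ops:
        if all(i in folded for i in inputs):
            folded[name] = fn(*(folded[i] for i in inputs))
        else:
            out_ops.append((name, fn, inputs))
    return out_ops, folded

ops = [
    ("scale2", lambda s: s * 2, ("scale",)),     # foldable: constant input
    ("y", lambda x, s: x * s, ("x", "scale2")),  # depends on runtime input x
]
remaining, table = fold_constants(ops, {"scale": 3.0})
# "scale2" is folded to 6.0; "y" stays in the graph.
```

The same shape applies at scale: the folded weights double-count against host RAM until the pre-folding graph is released.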
-
| Resource Aware Graph Sharding

TL;DR
We found that splitting graphs into small, distinct engines gives roughly the same perf while reducing peak memory consumption. So we want to design a compiler phase that cuts the graph up. In the partitioning step we split the graph without breaking fusions, so that compiling the largest partition does not exceed the CPU RAM budget.

Goal(s)

Usecases

Proposed APIs / UX

Phase 1 (Experimental / Beta Stability) [2.10]
- torch_tensorrt.compile(module, resource_aware_sharding=True)
  Default behavior: we estimate the max graph size and shard accordingly.
- torch_tensorrt.compile(module, compile_peak_host_memory_consumption=1e10)
  We take this as the max CPU memory we can use and shard.

Phase 1 (Stable) [2.11+]
- torch_tensorrt.compile(module, disable_resource_aware_sharding=False)
  Default behavior: we estimate the max graph size and shard accordingly, without user intervention.
- torch_tensorrt.compile(module, compile_peak_host_memory_consumption=1e10)
  We take this as the max CPU memory we can use and shard.

Example Workflow

Limitations

Internal Implementation

Design

Concept: Assumptions

Extensions Required to Core API implementations

Data Structures

graph(x, w, b, scale, ...):
    aten::conv
    aten::batch_norm
    aten::relu
    return

graph(x, w, b):
    aten::mm
    aten::add
    return

Subgraph matching

Details specific for FX support

Implementation Phases

Prototype - MVP | 
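A minimal sketch of the sharding idea, with hypothetical names (this is not the proposed implementation): greedily pack fusion groups into shards so that no shard's estimated weight size exceeds the host-memory budget, in the spirit of compile_peak_host_memory_consumption, and never split a fusion group:

```python
# fusion_groups: list of (group_name, estimated_bytes), in graph order.
# Returns a list of shards, each a list of group names, where each
# shard's total estimated bytes fits within budget_bytes.
def shard_graph(fusion_groups, budget_bytes):
    shards, current, used = [], [], 0
    for name, size in fusion_groups:
        if size > budget_bytes:
            raise ValueError(f"fusion group {name} alone exceeds the budget")
        if used + size > budget_bytes and current:
            shards.append(current)       # close the current shard
            current, used = [], 0
        current.append(name)             # fusion groups are kept whole
        used += size
    if current:
        shards.append(current)
    return shards

groups = [("conv_bn_relu", 6e9), ("mm_add", 5e9), ("softmax", 1e9)]
print(shard_graph(groups, budget_bytes=1e10))
# -> [['conv_bn_relu'], ['mm_add', 'softmax']]
```

A real pass would estimate sizes from weight tensors and lowering overheads rather than take them as given, but the invariant is the same: the biggest shard stays under the CPU RAM budget.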
-
| Ensure that the outputs of one shard are compatible with the inputs of the next. | 
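Since each shard's engine outputs feed the next shard's inputs, the boundary signatures (shape and dtype) must line up. A hypothetical helper sketch of such a check:

```python
# outputs / inputs: dicts mapping tensor name -> (shape, dtype).
# Raises if a consumer input has no producer or a mismatched signature.
def check_boundary(outputs, inputs):
    for name, sig in inputs.items():
        if name not in outputs:
            raise KeyError(f"shard input {name} has no producer")
        if outputs[name] != sig:
            raise TypeError(f"{name}: producer {outputs[name]} != consumer {sig}")

check_boundary({"t0": ((1, 16), "fp16")}, {"t0": ((1, 16), "fp16")})  # ok
```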