Performance metrics for AI/ML RoCEv2 network traffic, for example, traffic generated by large-scale CUDA compute tasks that use NVIDIA Collective Communications Library (NCCL) operations for inter-GPU communication: AllReduce, Broadcast, Reduce, AllGather, and ReduceScatter.
- Download sFlow-RT
- Run command: `sflow-rt/get-app.sh sflow-rt topology`
- Run command: `sflow-rt/get-app.sh sflow-rt ai-metrics`
- Restart sFlow-RT
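
The steps above can be sketched as a small shell script. This is only an illustration under stated assumptions: the download URL, the use of `wget` and `tar`, and launching with `sflow-rt/start.sh` are not given in the text above, and `SFLOW_RT_URL`, `DRY_RUN`, `run`, and `install_ai_metrics` are hypothetical names chosen for this sketch. If sFlow-RT was installed from a package or runs as a service, the download and restart steps will differ. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the install steps listed above.
# ASSUMPTIONS (not from the original text): the tarball URL below,
# wget/tar being available, and start.sh as the way to (re)start sFlow-RT.
SFLOW_RT_URL="${SFLOW_RT_URL:-https://inmon.com/products/sFlow-RT/sflow-rt.tar.gz}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = actually run them

run() {
  # Echo the command, then execute it only when DRY_RUN=0.
  echo "+ $*"
  [ "$DRY_RUN" = "0" ] || return 0
  "$@"
}

install_ai_metrics() {
  # Each step runs only if the previous one succeeded.
  run wget -O sflow-rt.tar.gz "$SFLOW_RT_URL" &&
  run tar -xzf sflow-rt.tar.gz &&
  run sflow-rt/get-app.sh sflow-rt topology &&
  run sflow-rt/get-app.sh sflow-rt ai-metrics &&
  run sflow-rt/start.sh   # runs in the foreground; stop any running instance first
}

install_ai_metrics
```

Running the script as-is lists the five commands without executing them; setting `DRY_RUN=0` in the environment executes them in the current directory.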
For more information, visit: https://sFlow-RT.com