-
Hi @tonypg39 When time-slicing is enabled, each physical GPU in the system is exposed as the configured number of replicas. From the perspective of the kubelet these replicas are independent resources and are allocated as such. Each replica is, however, associated with a specific GPU id (or uuid), and this mapping is handled by the device plugin when it updates the container create response for an allocated pod.

There are some things to keep in mind here. The most important is that the GPU is shared using CUDA time-slicing, meaning that as more applications are launched, each application gets a smaller share of the GPU. There is also a danger of memory oversubscription, since no limits are placed on how much memory an application can allocate. For more details see https://developer.nvidia.com/blog/improving-gpu-utilization-in-kubernetes/

With these taken into consideration, and assuming "well-behaved" applications, you should be able to set the number of replicas in your config so that …
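For reference, a time-slicing section along these lines in the device plugin's config enables this behavior (the replica count of 4 is just an illustrative value; pick one that matches your expected pod count):

```yaml
# Sketch of a time-slicing config for the NVIDIA k8s-device-plugin.
# Each physical GPU is advertised as 4 nvidia.com/gpu resources.
version: v1
sharing:
  timeSlicing:
    resources:
      - name: nvidia.com/gpu
        replicas: 4
```

With this applied, a node with one GPU reports `nvidia.com/gpu: 4`, and up to four pods each requesting one `nvidia.com/gpu` can be scheduled onto that single device.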
-
Hello,
I had a question: if I use the time-slicing option, do the credits assigned to a pod have a particular GPU id associated with them? Or does a credit simply mean access to the whole set of GPUs in the node, so that I would just set the number of replicas to the expected number of pods on the server?
Many thanks for any help,
Toony