Hi, I'm currently using a TPU v4-64 pod, and I ran into an issue when trying to run multi-worker LLaMA training with the example provided for TPU v5-8. Each worker seems to run independently instead of synchronizing during training. Could you provide an example specifically for TPU pod multi-worker training (e.g., TPU v4-64) where the entire pod is used as a single unit?
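For context, this is roughly the behavior I'm expecting: every worker sees all chips in the pod and collectives run pod-wide. A minimal sketch of how I'm checking this, assuming JAX on a Cloud TPU VM pod (the script name `check_pod.py` is just an example), launched on all workers at once:

```python
# check_pod.py — run the same script on every worker, e.g.:
#   gcloud compute tpus tpu-vm ssh $TPU_NAME --worker=all --command="python3 check_pod.py"
import jax
import jax.numpy as jnp

# On Cloud TPU pods this lets JAX discover all hosts, so jax.devices()
# reports every chip in the pod rather than only the local ones.
jax.distributed.initialize()

print(f"process {jax.process_index()} / {jax.process_count()}: "
      f"{jax.local_device_count()} local devices, "
      f"{jax.device_count()} global devices")

# A pmap'ed psum only spans the whole pod if all hosts joined one
# collective; if each worker runs independently, the result equals the
# local device count instead of the pod-wide device count.
x = jnp.ones((jax.local_device_count(),))
total = jax.pmap(lambda v: jax.lax.psum(v, axis_name="i"), axis_name="i")(x)
print("psum result:", total)  # expect the global device count on every worker
```

In my runs, the reported device counts and the psum result stay at the local values, which is why I think the workers are not forming a single training job.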