Low performance on TPUs #1217

Open
wtedw opened this issue Aug 28, 2024 · 1 comment

Comments

@wtedw

wtedw commented Aug 28, 2024

PGX on TPUs seems to be slower than on CPUs.
On a TPU v3-8, PGX achieves only 1638 steps/sec on chess.
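For anyone reproducing a steps/sec figure like this, here is a minimal sketch of how such a number can be measured in JAX. It is not the benchmark behind the figure above: `toy_step` and `measure_steps_per_sec` are hypothetical names, and `toy_step` is a trivial stand-in rather than PGX's chess logic. With PGX, the README pattern `jax.jit(jax.vmap(env.step))` would presumably take the place of the stand-in.

```python
import time

import jax
import jax.numpy as jnp


def toy_step(state, action):
    # Hypothetical stand-in for a batched env step (NOT PGX's chess logic).
    return (state * 31 + action) % 1009


def measure_steps_per_sec(step_fn, state, action, num_iters=100):
    step_fn = jax.jit(step_fn)
    # Warm-up call: triggers compilation so it is excluded from the timing.
    state = jax.block_until_ready(step_fn(state, action))
    start = time.perf_counter()
    for _ in range(num_iters):
        state = step_fn(state, action)
    # JAX dispatches asynchronously; wait for the device before stopping the clock.
    jax.block_until_ready(state)
    elapsed = time.perf_counter() - start
    # Each call advances every env in the batch by one step.
    return num_iters * state.shape[0] / elapsed


num_envs = 512
state = jnp.zeros((num_envs,), dtype=jnp.int32)
action = jnp.ones((num_envs,), dtype=jnp.int32)
sps = measure_steps_per_sec(toy_step, state, action)
print(f"{sps:.0f} steps/sec")
```

Leaving out the warm-up call or the final `block_until_ready` would fold compile time into the result or stop the clock before the device has finished, so both matter when comparing CPU and TPU numbers.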

Minimal Reproducible Example
PGX CPU vs TPU Test (512 env) (with sharding)

PGX CPU vs TPU Test (64 env) (single device)

About 8192 envs seems to be the limit. Sharded across 8 devices, it takes about 1 hour and 27 minutes. With more than 8192 envs, memory issues occur during JIT AOT compilation.
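To make the sharding and AOT setup concrete, here is a rough sketch of splitting the env batch across devices and compiling ahead of time. It assumes a trivial stand-in step (`toy_step`, a hypothetical name, not PGX's chess logic) and 1024 envs per device, which is 8192 / 8 as in the numbers above. It is not the code from the linked notebooks. `memory_analysis()` may return `None` or be unsupported on some backends, so it is wrapped defensively.

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P


def toy_step(state, action):
    # Hypothetical stand-in for a batched env step (NOT PGX's chess logic).
    return (state * 31 + action) % 1009


devices = jax.devices()
num_devices = len(devices)   # 8 on a TPU v3-8, typically 1 on CPU
envs_per_device = 1024       # 8192 envs / 8 devices
num_envs = envs_per_device * num_devices

# 1-D mesh over all local devices; the env (batch) axis is split along it.
mesh = Mesh(np.array(devices), axis_names=("env",))
batch_sharding = NamedSharding(mesh, P("env"))

state = jax.device_put(jnp.zeros((num_envs,), jnp.int32), batch_sharding)
action = jax.device_put(jnp.ones((num_envs,), jnp.int32), batch_sharding)

# Ahead-of-time: lower and compile explicitly, separately from execution.
compiled = jax.jit(toy_step).lower(state, action).compile()

# The compiler's memory estimate, if the backend reports one.
try:
    print(compiled.memory_analysis())
except Exception as err:
    print("memory analysis unavailable:", err)

next_state = compiled(state, action)
print(next_state.shape)
```

Compiling explicitly like this separates compile-time memory problems from run-time ones, which may help narrow down where the failure above 8192 envs comes from.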

@sotetsuk
Owner

sotetsuk commented Aug 30, 2024

Hi,

Thank you for opening the issue!

Following your comment, I ran some preliminary experiments and confirmed similar results.
I noticed that throughput on TPUs (v2) was indeed low on Colab, even though the same program scales appropriately on a Colab A100 GPU.

It may require further investigation. Let's use this issue.

TODOs may include:

  • Try the latest TPU
  • Test other games
  • Find the bottleneck (my guess is the legal action mask computation)
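
One possible way to start on the bottleneck item is to time the components of a step separately. The sketch below is only an illustration under assumptions: `toy_transition` and `toy_legal_mask` are hypothetical stand-ins, not PGX's actual state update or legal action mask code, and `time_ms` is a hypothetical helper.

```python
import time

import jax
import jax.numpy as jnp


def toy_transition(board, action):
    # Hypothetical stand-in for the state update of one env.
    return board.at[action].add(1)


def toy_legal_mask(board):
    # Hypothetical stand-in for a legal action mask: pairwise comparison.
    return (board[:, None] >= board[None, :]).any(axis=1)


def time_ms(fn, *args, num_iters=50):
    fn = jax.jit(fn)
    jax.block_until_ready(fn(*args))  # compile outside the timed region
    start = time.perf_counter()
    for _ in range(num_iters):
        out = fn(*args)
    jax.block_until_ready(out)  # wait for the asynchronously dispatched work
    return (time.perf_counter() - start) / num_iters * 1e3


num_envs, board_size = 64, 64
boards = jnp.zeros((num_envs, board_size), jnp.int32)
actions = jnp.zeros((num_envs,), jnp.int32)

timings = {
    "transition": time_ms(jax.vmap(toy_transition), boards, actions),
    "legal_mask": time_ms(jax.vmap(toy_legal_mask), boards),
}
# Slowest component first.
for name, ms in sorted(timings.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {ms:.3f} ms per call")
```

Wall-clock timings like these only hint at where time goes. A trace captured with `jax.profiler.trace` would give per-op detail on the TPU.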

@sotetsuk changed the title from "PGX slower on TPUs" to "Low performance on TPUs" on Aug 31, 2024