Low performance on TPUs #1217

wtedw · 2024-08-28T12:36:08Z

PGX seems to run slower on TPUs than on CPUs.
On a TPU v3-8, PGX achieves only 1,638 steps/sec on chess.

Minimal Reproducible Example
PGX CPU vs TPU Test (512 env) (with sharding)

https://gist.github.com/wtedw/e7332e8d99acd0132be5f82c389d8f60
512 envs * 512 game steps = 262,144 steps.
Runtime = 2 min 40s = 160 sec
262,144 steps / 160 sec = 1,638 steps/sec
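The throughput figure above is total environment steps divided by wall-clock time. A minimal sketch of that arithmetic follows; the helper name `steps_per_sec` is hypothetical. If the 160 s measured in the gist includes JIT compilation, the steady-state rate would be higher than this number.

```python
def steps_per_sec(num_envs: int, steps_per_env: int, seconds: float) -> float:
    """Aggregate environment steps per second across a batch of envs (hypothetical helper)."""
    total_steps = num_envs * steps_per_env  # every env advances once per batched step
    return total_steps / seconds

# Figures from the 512-env TPU run above.
total = 512 * 512
rate = steps_per_sec(512, 512, 160.0)
print(total, round(rate, 1))  # 262144 1638.4
```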

PGX CPU vs TPU Test (64 env) (single device)

https://gist.github.com/wtedw/6e070e7ebed33dc52f798960a2789c75

About 8,192 envs seems to be the limit. With the batch sharded across all 8 devices, that run takes about 1 hour and 27 minutes. With more than 8,192 envs, JIT AOT compilation runs into memory issues.
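For reference, one way to shard a batch of environments along its leading axis with `jax.sharding` is sketched below. This is not the gist's code: `toy_init` and `toy_step` are hypothetical stand-ins for per-env functions (with PGX one would presumably vmap `env.init` and `env.step` from `pgx.make("chess")` instead, which is not run here), and `envs_per_device = 1024` simply mirrors 8,192 envs split over 8 devices. On a single CPU device the same code runs with a mesh of size 1.

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# One mesh axis ("envs") spanning all local devices (8 cores on a TPU v3-8).
devices = jax.devices()
mesh = Mesh(np.array(devices), axis_names=("envs",))
env_sharding = NamedSharding(mesh, P("envs"))  # split the leading (batch) axis

num_devices = len(devices)
envs_per_device = 1024  # e.g. 8192 envs / 8 devices
num_envs = envs_per_device * num_devices  # must be divisible by the device count

def toy_init(key):
    # Hypothetical stand-in for a per-env init such as PGX's env.init.
    return jax.random.uniform(key, (8, 8))

def toy_step(board):
    # Hypothetical stand-in for a per-env step function.
    return jnp.tanh(board + 1.0)

# Place one PRNG key per env, split across devices along the batch axis.
keys = jax.random.split(jax.random.PRNGKey(0), num_envs)
keys = jax.device_put(keys, env_sharding)

# vmap over envs, then jit with outputs kept sharded along the batch axis.
init = jax.jit(jax.vmap(toy_init), out_shardings=env_sharding)
step = jax.jit(jax.vmap(toy_step), out_shardings=env_sharding)

state = init(keys)
state = step(state)
print(state.shape, len(state.sharding.device_set))
```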

sotetsuk · 2024-08-30T23:59:00Z

Hi,

Thank you for opening the issue!

Following your report, I ran some preliminary experiments and have confirmed similar results.
Throughput on TPUs (v2) on Colab was indeed low, even though the same program scales appropriately on a Colab A100 GPU.

This needs further investigation. Let's track it in this issue.

TODOs may include:

- Try the latest TPU
- Test other games
- Find the bottleneck (my guess is the legal action mask computation)
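To check a bottleneck guess, it helps to time candidate computations in isolation, keeping the first (compiling) call out of the measurement and blocking on results, since JAX dispatches work asynchronously. A minimal sketch follows; `candidate_a` and `candidate_b` are hypothetical stand-ins, not PGX internals, and `jax.profiler.trace` is an alternative for a per-op breakdown.

```python
import time

import jax
import jax.numpy as jnp

def _block(tree):
    # JAX dispatches asynchronously; wait until every output array is computed.
    jax.tree_util.tree_map(lambda a: a.block_until_ready(), tree)

def mean_seconds_per_call(fn, *args, n_iters=20):
    jitted = jax.jit(fn)
    _block(jitted(*args))  # first call compiles; keep it out of the measurement
    start = time.perf_counter()
    for _ in range(n_iters):
        _block(jitted(*args))
    return (time.perf_counter() - start) / n_iters

def candidate_a(x):
    # Hypothetical stand-in for, e.g., a legal-action-mask computation.
    return (x > 0.5).astype(jnp.float32)

def candidate_b(x):
    # Hypothetical stand-in for the rest of a step.
    return jnp.tanh(x @ x.T)

x = jax.random.uniform(jax.random.PRNGKey(0), (512, 64))
timings = {name: mean_seconds_per_call(f, x) for name, f in [("a", candidate_a), ("b", candidate_b)]}
print(timings)
```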

sotetsuk changed the title from "PGX slower on TPUs" to "Low performance on TPUs" on Aug 31, 2024
