Set TF_FORCE_GPU_ALLOW_GROWTH=true by default #712

Closed
samos123 wants to merge 2 commits into apple:main from samos123:fix-gpu-ooms

Contributor

@samos123 samos123 commented Sep 24, 2024

This is needed to run Fuji v2 70B on GPU without running out of GPU memory.

@kelvin-zou can likely confirm whether this should be the default or not.

Contributor

@kelvin-zou kelvin-zou left a comment

Thanks!
Can we move it somewhere GPU-specific? This launch command is shared across GPU and TPU.

@samos123
Contributor Author

samos123 commented Sep 24, 2024

Hmm, but we also set a lot of TPU environment variables in launch.py without any if statements. I don't think there is a better place, since it needs to happen before JAX is started.

Would you prefer this?

if instance_type.startswith("gpu"):
    # Prevent GPU OOM issues due to TF taking up all the GPU memory.
    # Reference: https://stackoverflow.com/a/54927279
    os.environ.setdefault("TF_FORCE_GPU_ALLOW_GROWTH", "true")
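
For readers following the thread, the proposal above can be sketched as a small standalone helper. This is only an illustration under assumptions, not the code in launch.py: the function name `set_gpu_env_defaults`, the `env` parameter, and the instance type strings in the usage below are all hypothetical. It uses `setdefault` as in the snippet above, so a value the user has already exported is left alone, and it would need to be called before JAX is started for the setting to take effect.

```python
import os
from typing import MutableMapping


def set_gpu_env_defaults(
    instance_type: str,
    env: MutableMapping[str, str] = os.environ,
) -> bool:
    """Sets GPU-only environment defaults; returns True if they were applied.

    Hypothetical helper for illustration. `env` defaults to the real process
    environment but can be any mapping, which makes the logic easy to test.
    """
    # Assumption: GPU instance types are identified by a "gpu" prefix,
    # as in the snippet proposed in the comment above.
    if not instance_type.startswith("gpu"):
        return False
    # setdefault keeps any value the user has already set explicitly.
    env.setdefault("TF_FORCE_GPU_ALLOW_GROWTH", "true")
    return True


if __name__ == "__main__":
    # Hypothetical instance type names, used only to exercise both branches.
    demo_env: dict = {}
    print(set_gpu_env_defaults("gpu-example-8", demo_env), demo_env)
    print(set_gpu_env_defaults("tpu-example-16", {}))
```

Passing the mapping explicitly keeps the GPU branch separate from the TPU settings in the shared launch path, which is the concern raised in the review.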

@samos123 samos123 requested a review from kelvin-zou September 24, 2024 22:53
@changlan changlan requested a review from a team as a code owner July 23, 2025 21:50
@changlan
Contributor

Hi @samos123, is this PR still relevant?

@github-actions

This pull request has been automatically marked as stale because it has been inactive for 60 days. It will be closed in 7 days if no further activity occurs. If you would like to continue working on this, please remove the stale label or leave a comment.

@github-actions github-actions Bot added the stale label Oct 17, 2025
@github-actions

This pull request was closed because it has been inactive for more than 7 days since being marked as stale. Please feel free to reopen it if you would like to continue.

@github-actions github-actions Bot closed this Oct 27, 2025