Respect each example requirements and use uv #1330

Merged · 4 commits · Apr 26, 2025
10 changes: 6 additions & 4 deletions .github/workflows/main_distributed.yaml
@@ -22,12 +22,14 @@ jobs:
with:
python-version: 3.8
- name: Install PyTorch
run: |
python -m pip install --upgrade pip
pip install --pre torch -f https://download.pytorch.org/whl/nightly/cu118/torch_nightly.html
uses: astral-sh/setup-uv@v6
- name: Run Tests
env:
USE_CUDA: 'True'
VIRTUAL_ENV: '.venv'
PIP_INSTALL_ARGS: '--pre -f https://download.pytorch.org/whl/nightly/cu118/torch_nightly.html'
run: |
./run_distributed_examples.sh "run_all,clean"
./run_distributed_examples.sh
- name: Open issue on failure
if: ${{ failure() && github.event_name == 'schedule' }}
uses: rishabhgupta/git-action-issue@v2
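The workflow now delegates dependency installation to the run script via environment variables instead of pip-installing PyTorch itself. A minimal sketch of how a runner could turn `PIP_INSTALL_ARGS` into a `uv pip install` invocation — the real logic lives in `utils.sh`, which is not part of this diff, and the helper name here is illustrative:

```shell
# Hypothetical helper (not from utils.sh): build the install command from
# PIP_INSTALL_ARGS, prepending extra pip arguments (e.g. a nightly index)
# only when the variable is set and non-empty.
build_install_cmd() {
  echo "uv pip install ${PIP_INSTALL_ARGS:+$PIP_INSTALL_ARGS }-r requirements.txt"
}

PIP_INSTALL_ARGS='--pre -f https://download.pytorch.org/whl/nightly/cu118/torch_nightly.html'
build_install_cmd
# prints:
# uv pip install --pre -f https://download.pytorch.org/whl/nightly/cu118/torch_nightly.html -r requirements.txt
```

This keeps the workflow YAML free of install commands: CI only selects the index via `PIP_INSTALL_ARGS`, and each example's `requirements.txt` decides what gets installed.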
14 changes: 6 additions & 8 deletions .github/workflows/main_python.yml
@@ -21,16 +21,14 @@ jobs:
uses: actions/setup-python@v5
with:
python-version: '3.10'
- name: Install PyTorch
run: |
python -m pip install --upgrade pip
# Install CPU-based pytorch
pip install --pre torch torchvision -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
# Maybe use the CUDA 10.2 version instead?
# pip install --pre torch torchvision -f https://download.pytorch.org/whl/nightly/cu102/torch_nightly.html
- name: Install uv
uses: astral-sh/setup-uv@v6
- name: Run Tests
env:
VIRTUAL_ENV: '.venv'
PIP_INSTALL_ARGS: '--pre -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html'
run: |
./run_python_examples.sh "install_deps,run_all,clean"
./run_python_examples.sh
- name: Open issue on failure
if: ${{ failure() && github.event_name == 'schedule' }}
uses: rishabhgupta/git-action-issue@v2
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -40,8 +40,8 @@ If you're new, we encourage you to take a look at issues tagged with [good first
1. Fork the repo and create your branch from `main`.
2. Make sure you have a GPU-enabled machine, either locally or in the cloud. `g4dn.4xlarge` is a good starting point on AWS.
3. Make your code change.
4. First, install all dependencies with `./run_python_examples.sh "install_deps"`.
5. Then, make sure that `./run_python_examples.sh` passes locally by running the script end to end.
4. Install `uv`.
5. Then, make sure that `VIRTUAL_ENV=.venv ./run_python_examples.sh` passes locally by running the script end to end.
6. If you haven't already, complete the Contributor License Agreement ("CLA").
7. Address any feedback in code review promptly.
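Step 5 depends on `VIRTUAL_ENV` being set so the suite builds its own environment rather than installing into the system Python. An illustrative guard (not taken from the repo's `utils.sh`) showing the kind of early check a runner can make:

```shell
# Illustrative guard, name and message are hypothetical: fail early when
# VIRTUAL_ENV is missing so the suite does not pollute the system Python.
require_virtual_env() {
  if [ -z "${VIRTUAL_ENV:-}" ]; then
    echo "VIRTUAL_ENV is not set; run as: VIRTUAL_ENV=.venv ./run_python_examples.sh" >&2
    return 1
  fi
  echo "using virtual environment: $VIRTUAL_ENV"
}

VIRTUAL_ENV=.venv require_virtual_env
```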

3 changes: 3 additions & 0 deletions fast_neural_style/requirements.txt
@@ -0,0 +1,3 @@
numpy
torch
torchvision
2 changes: 2 additions & 0 deletions fx/requirements.txt
@@ -0,0 +1,2 @@
torch
torchvision
1 change: 1 addition & 0 deletions regression/requirements.txt
@@ -0,0 +1 @@
torch
2 changes: 1 addition & 1 deletion reinforcement_learning/requirements.txt
@@ -1,4 +1,4 @@
torch
numpy
numpy<2
gym
pygame
46 changes: 29 additions & 17 deletions run_distributed_examples.sh
@@ -4,16 +4,30 @@
# The purpose is just as an integration test, not to actually train models in any meaningful way.
# For that reason, most of these set epochs = 1 and --dry-run.
#
# Optionally specify a comma separated list of examples to run.
# can be run as:
# ./run_python_examples.sh "install_deps,run_all,clean"
# to pip install dependencies (other than pytorch), run all examples, and remove temporary/changed data files.
# Expects pytorch, torchvision to be installed.
# Optionally specify a comma-separated list of examples to run. Can be run as:
# * To run all examples:
# ./run_distributed_examples.sh
# * To run a specific example:
# ./run_distributed_examples.sh "distributed/tensor_parallelism,distributed/ddp"
#
# To test examples on CUDA accelerator, run as:
# USE_CUDA=True ./run_distributed_examples.sh
#
# The script requires uv to be installed. When executed, it installs prerequisites from
# `requirements.txt` for each example. If run within an activated virtual environment (uv venv,
# python -m venv, conda), this might reinstall some of the packages. To change the pip installation
# index or to pass additional pip install options, run as:
# PIP_INSTALL_ARGS="--pre -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html" \
#   ./run_distributed_examples.sh
#
# To force the script to create a virtual environment for each example, run as:
# VIRTUAL_ENV=".venv" ./run_distributed_examples.sh
# The script will remove the environments it creates in a teardown step after each example.

BASE_DIR="$(pwd)/$(dirname $0)"
source $BASE_DIR/utils.sh

USE_CUDA=$(python -c "import torch; print(torch.cuda.is_available())")
USE_CUDA=${USE_CUDA:-False}
case $USE_CUDA in
"True")
echo "using cuda"
@@ -30,21 +44,19 @@ case $USE_CUDA in
;;
esac

function distributed() {
start
bash tensor_parallelism/run_example.sh tensor_parallelism/tensor_parallel_example.py || error "tensor parallel example failed"
bash tensor_parallelism/run_example.sh tensor_parallelism/sequence_parallel_example.py || error "sequence parallel example failed"
bash tensor_parallelism/run_example.sh tensor_parallelism/fsdp_tp_example.py || error "2D parallel example failed"
python ddp/main.py || error "ddp example failed"
function distributed_tensor_parallelism() {
uv run bash run_example.sh tensor_parallel_example.py || error "tensor parallel example failed"
uv run bash run_example.sh sequence_parallel_example.py || error "sequence parallel example failed"
uv run bash run_example.sh fsdp_tp_example.py || error "2D parallel example failed"
}

function clean() {
cd $BASE_DIR
echo "running clean to remove cruft"
function distributed_ddp() {
uv run main.py || error "ddp example failed"
}

function run_all() {
distributed
run distributed/tensor_parallelism
run distributed/ddp
}

# by default, run all examples
@@ -54,7 +66,7 @@ else
for i in $(echo $EXAMPLES | sed "s/,/ /g")
do
echo "Starting $i"
$i
run $i
echo "Finished $i, status $?"
done
fi
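The selection logic above splits the comma-separated `EXAMPLES` list and dispatches each entry through `run`. A standalone sketch of the same loop, with `echo` substituted for the real `run` helper from `utils.sh` so it can be exercised without the actual examples:

```shell
# Dispatch sketch: mirrors the loop at the bottom of run_distributed_examples.sh,
# but prints each selected example instead of running it.
run_selected() {
  for i in $(echo "$1" | sed "s/,/ /g"); do
    echo "Starting $i"
  done
}

run_selected "distributed/tensor_parallelism,distributed/ddp"
# prints:
# Starting distributed/tensor_parallelism
# Starting distributed/ddp
```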