
Prepare for public release (0.3.0) #83


Merged
merged 145 commits into from
Mar 10, 2025
145 commits
cc1ba0d
Update pyproject.toml
apoorvkh Nov 16, 2024
0c7b533
switch to GPL license
apoorvkh Nov 16, 2024
d541933
updates to readme
apoorvkh Nov 17, 2024
693a6b7
placeholders for other libraries
apoorvkh Nov 17, 2024
2aa837d
Update README.md
apoorvkh Nov 22, 2024
cc1546d
Update README.md
apoorvkh Nov 22, 2024
d61578a
bump version to 0.3.0
apoorvkh Nov 22, 2024
ea119c8
updates to readme
apoorvkh Nov 28, 2024
16bff95
switching docs to markdown
apoorvkh Dec 2, 2024
5d30173
added sphinx-autodoc2 as docs req
apoorvkh Dec 2, 2024
e68d126
update license in citation.cff
apoorvkh Dec 2, 2024
cade079
fix autodoc2 import
apoorvkh Dec 2, 2024
4e8c4ca
update autodoc2_packages
apoorvkh Dec 2, 2024
d4377d5
update readthedocs deps
apoorvkh Dec 2, 2024
037b42d
update uv lock
apoorvkh Dec 2, 2024
5144a1a
fixed uv sync command?
apoorvkh Dec 2, 2024
e5bd0e4
update docs structure once more
apoorvkh Dec 2, 2024
f27ac01
fix readthedocs conf.py
apoorvkh Dec 2, 2024
675f05c
refactor again
apoorvkh Dec 2, 2024
b3bdb35
update docs deps
apoorvkh Dec 2, 2024
d28aa46
Rename .readthedocs.yaml to readthedocs.yaml
apoorvkh Dec 2, 2024
1825c07
Rename readthedocs.yaml to .readthedocs.yaml
apoorvkh Dec 2, 2024
e5e05a9
update docs config
apoorvkh Dec 3, 2024
90ad6f6
misc
apoorvkh Dec 3, 2024
2ab0127
Add autodoc2_packages back
apoorvkh Dec 3, 2024
b3befad
Update conf.py
apoorvkh Dec 3, 2024
0e220a4
Update conf.py
apoorvkh Dec 3, 2024
1f48878
Update conf.py
apoorvkh Dec 3, 2024
8183773
fixes for sphinx build
apoorvkh Dec 3, 2024
cf806f4
updates to README
apoorvkh Dec 29, 2024
90a4ce9
switched `func_args` type to `tuple`
apoorvkh Dec 29, 2024
8935e19
added examples page
apoorvkh Jan 24, 2025
40548d4
docs: added HF trainer example; temp disabled github plugin
apoorvkh Jan 24, 2025
dc663b3
enable sphinx github plugin
apoorvkh Jan 24, 2025
83b8d32
sphinx --jobs 1
apoorvkh Jan 25, 2025
6a2ff89
docs: "examples" in sidebar
apoorvkh Jan 25, 2025
ae10612
updated formatting
apoorvkh Jan 26, 2025
726929a
updates to docs
apoorvkh Jan 26, 2025
645745e
docs: build-time copy of README in docs/source
apoorvkh Jan 26, 2025
35ec7d0
docs: migrated readme docs links to https://stable
apoorvkh Jan 26, 2025
8c38bbd
revert link replace
apoorvkh Jan 26, 2025
eaa7be2
update readme
apoorvkh Jan 27, 2025
be485e9
accelerator example
pmcurtin Jan 29, 2025
f72f649
added tensorboard export for accelerate example (in file)
apoorvkh Jan 31, 2025
3e1e887
updated HF Trainer example
apoorvkh Feb 1, 2025
8afb019
examples structure in docs
apoorvkh Feb 2, 2025
504afd8
remove ./acc.py
apoorvkh Feb 2, 2025
592998a
add build-and-publish docs workflow
apoorvkh Feb 2, 2025
1b99890
added CLI arguments
apoorvkh Feb 2, 2025
23327b3
updated transformers training script
apoorvkh Feb 5, 2025
5d7d6fe
ignore these files
Feb 5, 2025
9e897a5
add rough deepspeed example
pmcurtin Feb 5, 2025
857a9b0
typo
pmcurtin Feb 5, 2025
25a9ca6
lightning example
pmcurtin Feb 6, 2025
5016c78
Update lightning.md
pmcurtin Feb 6, 2025
bba941b
new ext module and fix for lightning example
pmcurtin Feb 7, 2025
96f2de5
typing
pmcurtin Feb 7, 2025
d130a8d
checkpointing
Feb 7, 2025
d38f6d8
fix ruff/type checks
pmcurtin Feb 7, 2025
cae668e
typing again
pmcurtin Feb 7, 2025
28ffc04
remove direcotry creation
pmcurtin Feb 7, 2025
45fe57c
actually fixed
pmcurtin Feb 7, 2025
8ad727d
add GROUP_RANK as node rank
pmcurtin Feb 7, 2025
c58cb11
updated examples
apoorvkh Feb 7, 2025
7de581c
Merge branch 'update-docs' of https://github.com/apoorvkh/torchrunx i…
apoorvkh Feb 7, 2025
ad9d446
update docs and deps
apoorvkh Feb 7, 2025
2100e46
rename ext to integrations
apoorvkh Feb 7, 2025
90744e1
rename to test-extras dep group
apoorvkh Feb 7, 2025
0db437b
testing docs build
apoorvkh Feb 7, 2025
70b9074
linting fixes
apoorvkh Feb 7, 2025
913162e
bump deps for docs build
apoorvkh Feb 7, 2025
6e9fab7
update workflows, remove readthedocs settings
apoorvkh Feb 7, 2025
0173aed
fix docs deps
apoorvkh Feb 7, 2025
14d6229
add publish-docs to main PR (temp)
apoorvkh Feb 7, 2025
58fb254
add py.typed
apoorvkh Feb 8, 2025
05ad2bf
removed dependabot
apoorvkh Feb 8, 2025
a3c13df
update citation and contributing
apoorvkh Feb 8, 2025
564a9a5
finished transformers example in docs
apoorvkh Feb 8, 2025
91a0868
format script
apoorvkh Feb 8, 2025
b1896d3
edited lightning integration
apoorvkh Feb 8, 2025
65da00e
changed name for utils.logging (ambiguous)
apoorvkh Feb 8, 2025
cab4268
pin dev deps
apoorvkh Feb 8, 2025
777541d
updated pyproject
apoorvkh Feb 8, 2025
7e03225
refactor accelerate train script
Feb 8, 2025
d5728e9
update readme and tests
apoorvkh Feb 9, 2025
d7dfe7b
Merge branch 'update-docs' of https://github.com/apoorvkh/torchrunx i…
apoorvkh Feb 9, 2025
c76e835
test echo workflow
apoorvkh Feb 9, 2025
4a268e2
bump workflow
apoorvkh Feb 9, 2025
8c6f407
bump workflow again
apoorvkh Feb 9, 2025
27720bf
bump testing workflow
apoorvkh Feb 9, 2025
f2049c5
final update for testing script
apoorvkh Feb 9, 2025
097c268
spacing in (python, pytorch) version
apoorvkh Feb 9, 2025
0a4c7b8
undo
apoorvkh Feb 9, 2025
f5365bd
add uv lock --check
apoorvkh Feb 9, 2025
3db4566
add docs html build
apoorvkh Feb 9, 2025
c2ebf70
not deploying docs
apoorvkh Feb 9, 2025
b9655c7
edit launch result api
apoorvkh Feb 9, 2025
ffc45d1
moving def launch() to top
apoorvkh Feb 9, 2025
c2d51bf
update launcher API
apoorvkh Feb 9, 2025
77e12ef
fix tests
apoorvkh Feb 9, 2025
2331baa
restructure docs features
apoorvkh Feb 9, 2025
0e5ec8a
rename to logging utilities; log to timestamp folder
apoorvkh Feb 9, 2025
26414e6
update slurm env vars
apoorvkh Feb 9, 2025
76e1d72
manually detect number of gpus (workers) per host
apoorvkh Feb 9, 2025
ab8cbd4
switch to sphinx.ext.autodoc
apoorvkh Feb 10, 2025
cb4cd4a
fix ci test: new log structure
pmcurtin Feb 11, 2025
b6ff53e
refactor deepspeed and lightning scripts
Feb 12, 2025
93d1ec1
docs for updated training scripts
pmcurtin Feb 12, 2025
b85c1c7
Update workflows.md
pmcurtin Feb 14, 2025
49b905d
tyro cli help strings
pmcurtin Feb 14, 2025
e87ea25
ensuring rank-order consistency in launcher-agent all_gather
apoorvkh Feb 14, 2025
0c2f131
bump accelerate example
apoorvkh Feb 14, 2025
4c50ed8
Merge branch 'update-docs' of github.com:apoorvkh/torchrunx into upda…
apoorvkh Feb 14, 2025
4fdf9ef
update example script docs
apoorvkh Feb 15, 2025
7fad25f
edit docs
apoorvkh Feb 15, 2025
42fa349
no miliseconds in logging
apoorvkh Feb 15, 2025
23bd11b
added type checking based on function arguments/returns; removed laun…
apoorvkh Feb 15, 2025
4e45f38
fixed launcher.run return type
apoorvkh Feb 15, 2025
2387124
adjust workerargs serialization
apoorvkh Feb 15, 2025
b906bac
update docs for typed Launcher
apoorvkh Feb 16, 2025
2f279a4
update readme
apoorvkh Feb 16, 2025
312be9e
update functools.partial args, kwargs in launcher
apoorvkh Feb 16, 2025
aead8db
updates to env_vars arguments
apoorvkh Feb 16, 2025
3ad386a
small edit readme
apoorvkh Feb 16, 2025
665f188
Merge pull request #86 from apoorvkh/support-type-checking
apoorvkh Feb 16, 2025
120e1e8
scripts dir
apoorvkh Feb 16, 2025
c295819
moved docs artifacts again
apoorvkh Feb 16, 2025
22b7db6
updated how it works
apoorvkh Feb 20, 2025
eb1892a
how it works
apoorvkh Feb 21, 2025
6075b0c
updates to docs; cpu/gpu workers
apoorvkh Feb 22, 2025
9bcb7f2
no propagate_exceptions option
apoorvkh Feb 22, 2025
7bdef69
moved resolution for log handlers
apoorvkh Feb 22, 2025
e3a6278
more updates to docs
apoorvkh Feb 22, 2025
c987259
logging and slurm docs
apoorvkh Feb 23, 2025
5a88452
torchrunx.__version__
apoorvkh Feb 23, 2025
1584c9c
addl CLI parsing
apoorvkh Feb 23, 2025
889f3ef
updated README
apoorvkh Feb 23, 2025
510a881
update readme
apoorvkh Feb 23, 2025
128eaf4
fix agent loggin, basic logging in agent and launcher.
Feb 24, 2025
2b76c49
rm AgentCliArgs
apoorvkh Feb 27, 2025
cd260f5
updates to logging
apoorvkh Feb 28, 2025
41d9759
small adjustments to logging messages
apoorvkh Mar 2, 2025
d4123c8
accept int in log level env var
apoorvkh Mar 3, 2025
ff0b90e
updates to logging hierarchy
apoorvkh Mar 10, 2025
1035b75
final updates to docs
apoorvkh Mar 10, 2025
11 changes: 0 additions & 11 deletions .github/dependabot.yml

This file was deleted.

90 changes: 67 additions & 23 deletions .github/workflows/main.yml
@@ -13,11 +13,11 @@ jobs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: astral-sh/setup-uv@v3.2.2
- uses: astral-sh/setup-uv@v5
with:
version: "0.5.0"
python-version-file: ".python-version"
version: "0.5.29"
enable-cache: true
- run: uv lock --check
- run: uv sync
- run: uv run --frozen ruff check
if: success() || failure()
@@ -26,39 +26,83 @@ jobs:
- run: uv run --frozen pyright
if: success() || failure()

build-docs:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: astral-sh/setup-uv@v5
with:
version: "0.5.29"
- run: source ./scripts/build_docs.sh
- uses: actions/upload-artifact@v4
with:
name: docs-html-build
path: docs/_build/html
retention-days: 14

##

get-pytorch-versions:
get-python-pytorch-versions:
runs-on: ubuntu-latest
outputs:
versions: ${{ steps.get-pytorch-versions.outputs.versions }}
versions: ${{ steps.get-versions.outputs.versions }}
steps:
- name: Get PyTorch versions
id: get-pytorch-versions
- name: "Get (Python, PyTorch) versions"
id: get-versions
run: |
VERSIONS=$(
curl -s https://pypi.org/pypi/torch/json | jq -r '.releases | keys[]' |
# remove versions <2.0; strip "patch" from versions
grep -v '^1\.' | grep -E '\.[0]+$' | sort -V | sed 's/\.0$//' |
# to JSON array
jq -R . | jq -sc .
MIN_PYTHON_VERSION=3.9
MIN_PYTORCH_VERSION=2.0

# Get PyTorch versions from PyPI
pytorch_versions=$(
curl -s https://pypi.org/pypi/torch/json | jq -r '.releases | keys[]' |
# strip "patch" from versions
grep -E '\.[0]+$' | sort -V | sed 's/\.0$//'
)
echo "versions=$VERSIONS" >> $GITHUB_OUTPUT
# e.g. ["2.0","2.1","2.2","2.3","2.4"]

# For each PyTorch version, get Python versions that have builds
# Generate JSON list of "python,pytorch" versions

version_matrix=()
for pytorch_version in $pytorch_versions; do
# Skip if PyTorch version less than minimum
if [[ "$(printf '%s\n' "$pytorch_version" "$MIN_PYTORCH_VERSION" | sort -V | head -n 1)" != "$MIN_PYTORCH_VERSION" ]]; then continue; fi

python_versions=$(
curl -s "https://pypi.org/pypi/torch/$pytorch_version/json" |
jq -r '.urls[].filename | select(test("manylinux1_x86_64")) | capture("(?<cp>cp[0-9]+)-") | .cp |
sub("cp(?<major>[0-9])(?<minor>[0-9]+)"; "\(.major).\(.minor)")'
)

for python_version in $python_versions; do
# Skip if Python version less than minium
if [[ "$(printf '%s\n' "$python_version" "$MIN_PYTHON_VERSION" | sort -V | head -n 1)" != "$MIN_PYTHON_VERSION" ]]; then continue; fi

version_matrix+=($python_version,$pytorch_version)
done
done
version_matrix=$(printf '%s\n' "${version_matrix[@]}" | jq -R . | jq -s -c .)

# Write to outputs
echo "versions=$version_matrix" >> $GITHUB_OUTPUT

test:
runs-on: ubuntu-latest
needs: get-pytorch-versions
needs: get-python-pytorch-versions
strategy:
fail-fast: false
matrix:
python: ["3.9", "3.10", "3.11", "3.12"]
pytorch: ${{fromJson(needs.get-pytorch-versions.outputs.versions)}}
versions: ${{fromJson(needs.get-python-pytorch-versions.outputs.versions)}}
steps:
- uses: actions/checkout@v4
- uses: astral-sh/setup-uv@v3.2.2
- uses: astral-sh/setup-uv@v5
with:
version: "0.5.0"
- if: contains('2.0,2.1,2.2', matrix.pytorch)
run: echo "NUMPY_VERSION=--with \"numpy<2\"" >> $GITHUB_ENV
- run: uv run --python ${{ matrix.python }} --with torch~=${{ matrix.pytorch }} ${{ env.NUMPY_VERSION }} pytest --verbose tests/test_ci.py
version: "0.5.29"
- run: |
IFS=',' read -r python_version pytorch_version <<< ${{ matrix.versions }}
echo "PYTHON_VERSION=$python_version" >> $GITHUB_ENV
echo "PYTORCH_VERSION=$pytorch_version" >> $GITHUB_ENV
if [[ "$pytorch_version" =~ ^2\.(0|1|2)$ ]]; then
echo "NUMPY_VERSION=--with \"numpy<2\"" >> $GITHUB_ENV
fi
- run: uv run --python ${{ env.PYTHON_VERSION }} --with torch~=${{ env.PYTORCH_VERSION }} ${{ env.NUMPY_VERSION }} pytest --verbose tests/test_ci.py
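The `get-python-pytorch-versions` job above builds its test matrix by keeping PyTorch minor releases at or above 2.0 and, for each, the CPython versions at or above 3.9 that have wheels. The sketch below reproduces that filtering offline so the logic can be checked without network access. It is an illustration under stated assumptions, not code from this PR: the release list and wheel tags are made-up sample data (the real job reads them from PyPI), `version_ge` and `cp_tags_for` are hypothetical helper names, and, like the workflow, it relies on GNU `sort -V`.

```bash
#!/usr/bin/env bash
# Offline sketch of the get-python-pytorch-versions filtering.
# The release list and cp tags are hypothetical sample data; the workflow
# fetches the real ones from the PyPI JSON API with curl + jq.

# Succeeds when version $1 >= version $2, using the same `sort -V` trick
# as the workflow (the smaller of the two must be $2).
version_ge() {
  [[ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" == "$2" ]]
}

MIN_PYTHON_VERSION=3.9
MIN_PYTORCH_VERSION=2.0

# Hypothetical output of `jq -r '.releases | keys[]'`.
releases="1.13.0 2.0.0 2.0.1 2.1.0 2.1.2 2.2.0"

# Keep X.Y.0 releases only, sort them, and strip the trailing ".0".
pytorch_versions=$(printf '%s\n' $releases | grep -E '\.[0]+$' | sort -V | sed 's/\.0$//')

# Hypothetical CPython wheel tags per PyTorch version (the workflow parses
# them from manylinux1_x86_64 wheel filenames instead).
cp_tags_for() {
  case "$1" in
    2.0 | 2.1) echo "cp38 cp39 cp310 cp311" ;;
    *) echo "cp38 cp39 cp310 cp311 cp312" ;;
  esac
}

matrix=()
for pytorch_version in $pytorch_versions; do
  version_ge "$pytorch_version" "$MIN_PYTORCH_VERSION" || continue
  for tag in $(cp_tags_for "$pytorch_version"); do
    # "cp310" -> "3.10": first digit is the major version, the rest the minor.
    python_version=$(sed -E 's/^cp([0-9])([0-9]+)$/\1.\2/' <<< "$tag")
    version_ge "$python_version" "$MIN_PYTHON_VERSION" || continue
    matrix+=("$python_version,$pytorch_version")
  done
done

# The workflow turns this list into a JSON array with `jq -R . | jq -s -c .`.
printf '%s\n' "${matrix[@]}"
```

With this sample data the sketch prints ten entries, from `3.9,2.0` to `3.12,2.2`; the `1.13` release and every `3.8` build are dropped by the two minimum-version checks. Comparing with `sort -V` rather than as decimals matters here, since a numeric comparison would rank `3.10` below `3.9`.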
27 changes: 24 additions & 3 deletions .github/workflows/release.yml
@@ -5,14 +5,35 @@ on:
types: [published]

jobs:
release:
release-to-pypi:
runs-on: ubuntu-latest
permissions:
id-token: write
steps:
- uses: actions/checkout@v4
- uses: astral-sh/setup-uv@v3
- uses: astral-sh/setup-uv@v5
with:
version: "0.5.0"
version: "0.5.29"
- run: uv build
- run: uv publish

publish-docs:
runs-on: ubuntu-latest
permissions:
pages: write
id-token: write
environment:
name: github-pages
url: ${{ steps.deployment.outputs.page_url }}
steps:
- uses: actions/checkout@v4
- uses: astral-sh/setup-uv@v5
with:
version: "0.5.29"
- run: source ./scripts/build_docs.sh
- uses: actions/configure-pages@v5
- uses: actions/upload-pages-artifact@v3
with:
path: docs/_build/html
- id: deployment
uses: actions/deploy-pages@v4
5 changes: 4 additions & 1 deletion .gitignore
@@ -1,5 +1,8 @@
docs/source/README.md
docs/source/contributing.md
docs/source/examples/scripts/

torchrunx_logs/
.pixi/
.ruff_cache/
.vscode/

6 changes: 3 additions & 3 deletions CITATION.cff
@@ -9,6 +9,6 @@ authors:
family-names: Curtin
email: [email protected]
repository-code: 'https://github.com/apoorvkh/torchrunx'
url: torchrunx.readthedocs.io
license: MIT
year: 2024
url: 'https://torchrun.xyz'
license: GPL-3.0
year: 2025
10 changes: 6 additions & 4 deletions CONTRIBUTING.md
@@ -2,16 +2,18 @@

We use the [`uv`](https://github.com/astral-sh/uv) package manager. Simply [install `uv`](https://github.com/astral-sh/uv#installation) and run `uv sync` in this repository to build the environment. Run `source .venv/bin/activate` to activate the environment.

We use `ruff check` for linting, `ruff format` for formatting, `pyright` for static type checking, and `pytest` for testing.

We build wheels with `uv build` and upload to [PyPI](https://pypi.org/project/torchrunx) with `uv publish`. Our release pipeline is powered by Github Actions.
We use `ruff check` for linting, `ruff format` for formatting, `pyright` for static type checking, and `pytest` for testing. We expect all such checks to pass before merging changes to the main branch. We build wheels with `uv build` and upload to [PyPI](https://pypi.org/project/torchrunx) with `uv publish`. Our CI pipelines are powered by Github Actions.

## Pull Requests

Make a pull request with your changes on Github and we'll try to look at soon! If addressing a specific issue, mention it in the PR, and offer a short explanation of your fix. If adding a new feature, explain why it's meaningful and belongs in __torchrunx__.
Make a pull request with your changes on Github and we'll try to look at it soon! If addressing a specific issue, mention it in the PR, and offer a short explanation of your fix. If adding a new feature, explain why it's meaningful and belongs in **torchrunx**.

## Testing

`tests/` contains `pytest`-style tests for validating that code changes do not break the core functionality of our library.

At the moment, we run `pytest tests/test_ci.py` (i.e. simple single-node CPU-only tests) in our Github Actions CI pipeline (`.github/workflows/main.yml`). One can manually run our more involved tests (on GPUs, on multiple machines via SLURM) on their own hardware.

## Documentation

Our documentation is hosted on Github Pages and is updated with every package release. We build our documentation with [Sphinx](https://www.sphinx-doc.org): `source scripts/build_docs.sh`. The documentation will then be generated at `docs/_build/html` (and can be rendered with `python -m http.server --directory docs/_build/html`).
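The Testing section above mentions `tests/test_ci.py`, which the `test` job in `.github/workflows/main.yml` runs once per `"python,pytorch"` matrix entry. A minimal sketch for reproducing one entry locally follows. `ci_test_command` is a hypothetical helper, not part of torchrunx: it only rebuilds the command string the test job runs (splitting the entry on `,` and adding the `numpy<2` pin for PyTorch 2.0 to 2.2) and does not invoke `uv` itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rebuild the test job's command for one
# "python,pytorch" matrix entry. CI writes these values to $GITHUB_ENV instead.
ci_test_command() {
  local entry="$1" python_version pytorch_version numpy_flag=""
  # Split e.g. "3.10,2.1" into its Python and PyTorch halves.
  IFS=',' read -r python_version pytorch_version <<< "$entry"
  # Same regex as the workflow: PyTorch 2.0, 2.1 and 2.2 get numpy<2.
  if [[ "$pytorch_version" =~ ^2\.(0|1|2)$ ]]; then
    numpy_flag=' --with "numpy<2"'
  fi
  echo "uv run --python $python_version --with torch~=$pytorch_version$numpy_flag pytest --verbose tests/test_ci.py"
}

# Print the command for one entry.
ci_test_command "3.10,2.1"
```

With `uv` installed, the printed command can be run from the repository root, for example with `eval "$(ci_test_command 3.10,2.1)"`. One detail worth checking when reproducing a run: under PEP 440, `torch~=2.1` means `>=2.1, ==2.*`, so it may resolve to a newer 2.x release than the matrix label suggests; a specifier such as `torch~=2.1.0` would restrict the install to 2.1.x.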