Below are the full release notes for this release.

The minimum version of Python required for PyTorch 2.9.0 is 3.10.

## Build Metal kernels for MacOS-14+ and remove all pre-MacOS-14 specific logic; MacOS-14+ is required going forward ([#159733](https://github.com/pytorch/pytorch/pull/159733), [#159912](https://github.com/pytorch/pytorch/pull/159912))

PyTorch MPS is only supported on MacOS-14 or later. If you need to use MPS on MacOS Ventura, please avoid updating to PyTorch 2.9 or above.
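
If you are unsure which systems this affects, a standard availability check looks like this (a minimal sketch):

```
import torch

# On MacOS-14+ builds of PyTorch 2.9, MPS should be available on Apple Silicon;
# on older systems this returns False.
if torch.backends.mps.is_available():
    x = torch.ones(3, device="mps")
```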
## Upgrade to DLPack 1.0 ([#145000](https://github.com/pytorch/pytorch/pull/145000))
This upgrade makes the same BC-breaking changes as the DLPack 1.0 release.
Objects in `torch.utils.dlpack` have been updated to reflect these changes, such as `DLDeviceType`.
See the PR for details on the exact changes and how to update your code.
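
As a quick sanity check after upgrading, a DLPack round trip should be unaffected for typical tensors (a minimal sketch; the shape and dtype are illustrative):

```
import torch
from torch.utils.dlpack import from_dlpack

x = torch.arange(6, dtype=torch.float32)
# Tensors implement the DLPack protocol directly; after this upgrade the
# exchange follows the DLPack 1.0 specification.
y = from_dlpack(x)
assert torch.equal(x, y)
```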
## Raise appropriate errors in `torch.cat` ([#158249](https://github.com/pytorch/pytorch/pull/158249))
`torch.cat` now raises `ValueError`, `IndexError`, or `TypeError` where appropriate instead of the generic `RuntimeError`.

If your code was catching these errors, you can update it to catch the new error types.
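
For example, code that previously caught `RuntimeError` can be widened to the new types (a minimal sketch; which exception is raised depends on the failure mode):

```
import torch

try:
    # Concatenating tensors with mismatched ranks is invalid.
    torch.cat([torch.ones(2), torch.ones(2, 2)])
except (ValueError, IndexError, TypeError) as e:  # previously RuntimeError
    print(f"caught {type(e).__name__}: {e}")
```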
## Default to `dynamo=True` for ONNX exporter ([#159646](https://github.com/pytorch/pytorch/pull/159646), [#162726](https://github.com/pytorch/pytorch/pull/162726))
Previously `torch.onnx.export(...)` used the legacy TorchScript exporter if no arguments were provided. The ONNX exporter now uses the newer `torch.export.export` pipeline by default (`dynamo=True`). This change improves graph fidelity and future-proofs exports, but may surface graph capture errors that were previously masked or handled differently.
Recommendation: first try the new default; only fall back if you hit blocking issues and report them upstream.
Long-term solution: fix the root cause instead of relying on the fallback or the TorchScript exporter.
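
A minimal sketch of both paths during migration (the model and file names are illustrative):

```
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x.relu()

args = (torch.randn(2, 3),)
# New default: the torch.export-based exporter (dynamo=True).
torch.onnx.export(M(), args, "model.onnx")
# Temporary escape hatch: the legacy TorchScript exporter.
torch.onnx.export(M(), args, "model_legacy.onnx", dynamo=False)
```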
## Switch off runtime asserts by default in favor of a shape guards function ([#160111](https://github.com/pytorch/pytorch/pull/160111), [#161178](https://github.com/pytorch/pytorch/pull/161178), [#161794](https://github.com/pytorch/pytorch/pull/161794))
To enable runtime asserts, use `export(..., prefer_deferred_runtime_asserts_over_guards=True)`. This change also removes the `allow_complex_guards_as_runtime_asserts` flag, merging it into the former option.
Additionally, `exported_program.module()` will generate a call to a `_guards_fn` submodule that runs additional checks on inputs. Users who do not want this behavior can either remove this call from the graph, or call `exported_program.module(check_guards=False)` to avoid generating it.
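
A minimal sketch of both knobs (the module and dynamic shapes are illustrative):

```
import torch
from torch.export import Dim, export

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1

ep = export(
    M(),
    (torch.randn(4),),
    dynamic_shapes={"x": {0: Dim("batch")}},
    # Opt back in to runtime asserts instead of the shape guards function.
    prefer_deferred_runtime_asserts_over_guards=True,
)
m = ep.module(check_guards=False)  # skip the generated _guards_fn call
```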
## Set default opset to 20 ([#158802](https://github.com/pytorch/pytorch/pull/158802))
Opset 20 enables newer operator definitions. If your tooling or downstream runtime only supports opset 18, pin it explicitly. For the latest ONNX operators, you can experiment with opset 23.
```
torch.onnx.export(..., opset_version=23)
```
## Drop `draft_export` in exporter API ([#161454](https://github.com/pytorch/pytorch/pull/161454), [#162225](https://github.com/pytorch/pytorch/pull/162225))
Remove implicit draft tracing from the default exporter path, achieving clearer behaviour and faster failures.
The expensive `torch.export.draft_export` diagnostic path is no longer auto-invoked (which could take hours on large models). You can still opt in for deep diagnostics:
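
A minimal sketch of the explicit opt-in (assuming `torch.export.draft_export` accepts export-style arguments; see the PRs for the exact API):

```
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x * 2

# Explicitly run the expensive diagnostic path (no longer invoked automatically).
ep = torch.export.draft_export(M(), (torch.randn(3),))
```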
## Remove `torch.onnx.dynamo_export` and the `onnxrt` torch compile backend ([#158130](https://github.com/pytorch/pytorch/pull/158130), [#158258](https://github.com/pytorch/pytorch/pull/158258))
`torch.onnx.dynamo_export` is removed. Please use `torch.onnx.export` instead.
The experimental ONNX Runtime compile backend (`torch.compile(backend="onnxrt")`) is no longer supported.
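
A migration sketch (the model and file name are illustrative):

```
import torch

model, args = torch.nn.ReLU(), (torch.randn(2),)
# Removed:
#   torch.onnx.dynamo_export(model, *args)
# Replacement:
torch.onnx.export(model, args, "model.onnx", dynamo=True)
```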
## Some public-facing utility APIs for the TorchScript-based exporter are now private ([#161323](https://github.com/pytorch/pytorch/pull/161323))
Deprecated members in `torch.onnx.verification` are removed. Previously private `torch.onnx.symbolic_opsets*` functions will no longer be accessible. Consider making a copy of the source code if you need to access any private functions for compatibility with the TorchScript-based exporter.
Support for `caffe2` in the ONNX exporter has ended and is removed.
## Remove `/d2implyavx512upperregs` flag that slows build ([#159431](https://github.com/pytorch/pytorch/pull/159431))
This re-introduces AVX512 optimizations for Windows VS2022 builds; it may cause issues with specific versions of VS2022, see [#145702](https://github.com/pytorch/pytorch/issues/145702).
## Add `ScalarType` to shim conversion and `stable::Tensor.scalar_type` ([#160557](https://github.com/pytorch/pytorch/pull/160557))
Before, user extensions could only pass around obfuscated dtypes that appeared as `int32_t`s. Now, users can confidently use `torch::headeronly::ScalarType` in their extensions for the major scalar types. This PR enables ABI stability by adding a translation layer through the shim, so that even if the `ScalarType` enum values change in the future, user extensions need not fear.
This change adds ScalarType support for user extensions and is only narrowly BC breaking for unpopular dtypes: `quint*`s, `qint*`s, `Bits*`, `dummy_uint*`s, `dummy_int*`s, `Float8_e8m0fnu`, and `Float4_e2m1fn_x2` in the use case where an extension retrieves a Tensor dtype of the above and passes it into `aoti_torch_call_dispatcher`.
# Deprecations
## Deprecate `pin_memory_device` param in `torch.utils.data.DataLoader` ([#158323](https://github.com/pytorch/pytorch/pull/158323))
We moved enabling `pin_memory` back inside `BaseDataLoaderIter`. This is required for `StatefulDataLoader`, which leveraged `BaseDataLoaderIter` directly rather than the `DataLoader` class init.
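
A sketch of the non-deprecated usage (the dataset is illustrative):

```
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.randn(8, 3))
# Deprecated: DataLoader(ds, pin_memory=True, pin_memory_device="cuda")
loader = DataLoader(ds, pin_memory=True)  # pinning is handled by the loader iterator
```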
## Deprecate `torch.export.export_for_training` API in favor of equivalent `torch.export.export` API ([#158203](https://github.com/pytorch/pytorch/pull/158203))
`torch.export.export_for_training` existed because we couldn't migrate internal usages of export to the final IR. Now that we have completed the migration, we have deprecated and deleted this API.
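
A migration sketch (the module is illustrative):

```
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x - 1

# Removed:
#   ep = torch.export.export_for_training(M(), (torch.randn(2),))
# Equivalent replacement:
ep = torch.export.export(M(), (torch.randn(2),))
```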
# New Features
## AOTDispatcher
- Add `zero_()` and `empty_like(t)` to `torch/csrc/stable/ops.h` ([#158866](https://github.com/pytorch/pytorch/pull/158866))
## C++ Extensions
- Build out a stable set of ATen ops in `torch/csrc/stable/ops.h`: `amax`, `narrow`, `new_empty`/`new_zeros` with dtype variants, and `pad` ([#159328](https://github.com/pytorch/pytorch/pull/159328), [#158974](https://github.com/pytorch/pytorch/pull/158974), [#159508](https://github.com/pytorch/pytorch/pull/159508), [#161597](https://github.com/pytorch/pytorch/pull/161597), [#160214](https://github.com/pytorch/pytorch/pull/160214))
- Add `torch::stable::Tensor()` default constructor, `is_cpu`, and `get_device_index` ([#159507](https://github.com/pytorch/pytorch/pull/159507), [#160212](https://github.com/pytorch/pytorch/pull/160212), [#160143](https://github.com/pytorch/pytorch/pull/160143))
- Add beginnings of `torch::stable::accelerator` with support for DeviceGuard and Stream ([#159679](https://github.com/pytorch/pytorch/pull/159679), [#160453](https://github.com/pytorch/pytorch/pull/160453))
- Start building out `torch/headeronly`: c10 macros, `STD_TORCH_CHECK`, ScalarTypes (like `BFloat16` and `Half`) ([#158035](https://github.com/pytorch/pytorch/pull/158035), [#158365](https://github.com/pytorch/pytorch/pull/158365), [#157912](https://github.com/pytorch/pytorch/pull/157912), [#158377](https://github.com/pytorch/pytorch/pull/158377), [#159302](https://github.com/pytorch/pytorch/pull/159302), [#159414](https://github.com/pytorch/pytorch/pull/159414), [#159412](https://github.com/pytorch/pytorch/pull/159412), [#159415](https://github.com/pytorch/pytorch/pull/159415), [#159411](https://github.com/pytorch/pytorch/pull/159411), [#159911](https://github.com/pytorch/pytorch/pull/159911))
- Remove cmake cache and reconfigure again if it is invalid ([#156958](https://github.com/pytorch/pytorch/pull/156958))
- Cut a version of `TORCH_ERROR_CODE_CHECK` in `headeronly` from AOTI ([#159604](https://github.com/pytorch/pytorch/pull/159604))
- Remove `wheel` from build requirements ([#158027](https://github.com/pytorch/pytorch/pull/158027))
- Error when `TORCH_STABLE_ONLY` is defined in `TensorBase.h` ([#161658](https://github.com/pytorch/pytorch/pull/161658))
- Add `torch.hash_tensor` reduction function ([#154149](https://github.com/pytorch/pytorch/pull/154149))
## Quantization
- Enable cpu fp8 qlinear and cpu fp8 qconv ([#155678](https://github.com/pytorch/pytorch/pull/155678), [#157076](https://github.com/pytorch/pytorch/pull/157076))
## Release Engineering
- Add support for CUDA 13.0 in CI/CD builds and enable CUDA compression mode for binary size reduction for CUDA 13.0 builds ([#160956](https://github.com/pytorch/pytorch/pull/160956), [#161073](https://github.com/pytorch/pytorch/pull/161073), [#161257](https://github.com/pytorch/pytorch/pull/161257), [#161663](https://github.com/pytorch/pytorch/pull/161663), [#161316](https://github.com/pytorch/pytorch/pull/161316), [#160201](https://github.com/pytorch/pytorch/pull/160201), [#160770](https://github.com/pytorch/pytorch/pull/160770), [#161013](https://github.com/pytorch/pytorch/pull/161013), [#161916](https://github.com/pytorch/pytorch/pull/161916), [#162268](https://github.com/pytorch/pytorch/pull/162268), [#162322](https://github.com/pytorch/pytorch/pull/162322), [#162383](https://github.com/pytorch/pytorch/pull/162383), [#161833](https://github.com/pytorch/pytorch/pull/161833))
- Fix dev warning in `Dependencies.cmake` ([#159702](https://github.com/pytorch/pytorch/pull/159702))
- Fix building system gloo with CUDA/HIP ([#146637](https://github.com/pytorch/pytorch/pull/146637))
- Build `libtorch` without NVSHMEM ([#160910](https://github.com/pytorch/pytorch/pull/160910))
- Meta implementation for `aten.add.Scalar` ([#161332](https://github.com/pytorch/pytorch/pull/161332))
- Implement workaround for `cudaErrorNotSupported` ([#162412](https://github.com/pytorch/pytorch/pull/162412))
- Fix missing `__syncthreads` in MultiMarginLoss backward ([#158994](https://github.com/pytorch/pytorch/pull/158994))
- Roll-back cuDNN frontend upgrade and update Meta registration due to compile issues ([#163104](https://github.com/pytorch/pytorch/pull/163104))
- Disable cuDNN for 3D convolutions with `kernel size != 1` for cuDNN 9.8+ ([#163581](https://github.com/pytorch/pytorch/pull/163581))
## Distributed
### c10d
- Fix slow init due to repeated DNS resolution failure in socket ([#159596](https://github.com/pytorch/pytorch/pull/159596))
- Fix `setGroupName` and `setGroupDesc` in `group_split` and `merge_remote_group` ([#159429](https://github.com/pytorch/pytorch/pull/159429))
- Fix a bug of distributed 'gather' with noncontiguous tensors on the Gloo backend ([#158903](https://github.com/pytorch/pytorch/pull/158903))
- Fix a bug of distributed 'gather' with noncontiguous tensors on the NCCL backend ([#159549](https://github.com/pytorch/pytorch/pull/159549))
- Fix data inconsistencies when using `batch_isend_irecv` with 2D tensor views by making P2P tensors dense ([#163719](https://github.com/pytorch/pytorch/pull/163719))
### Device Mesh
- Fix a bug where individual strings were incorrectly chained as iterables ([#160709](https://github.com/pytorch/pytorch/pull/160709))