Conversation

matsumotosan
Contributor

@matsumotosan matsumotosan commented Aug 14, 2025

What does this PR do?

Fixes #20450 #20058 #20643

Before submitting
  • Was this discussed/agreed via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you list all the breaking changes introduced by this pull request?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or minor internal changes/refactors)

PR review

Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines. In short, see the following bullet-list:

Reviewer checklist
  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified

📚 Documentation preview 📚: https://pytorch-lightning--21072.org.readthedocs.build/en/21072/

@github-actions github-actions bot added the fabric lightning.fabric.Fabric label Aug 14, 2025

codecov bot commented Aug 15, 2025

Codecov Report

❌ Patch coverage is 55.55556% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 49%. Comparing base (634e6e6) to head (9aadbef).

❌ Your project check has failed because the head coverage (49%) is below the target coverage (99%). You can increase the head coverage or adjust the target coverage.

❗ There is a different number of reports uploaded between BASE (634e6e6) and HEAD (9aadbef).

HEAD has 70 uploads less than BASE
Flag BASE (634e6e6) HEAD (9aadbef)
cpu 41 27
lightning 24 16
python3.11 10 6
python 5 3
python3.10 10 6
python3.12.7 13 9
pytorch2.1 4 0
pytest-full 14 0
pytorch2.2.2 1 0
pytorch_lightning 6 0
pytorch2.7 2 0
pytorch2.3 3 0
pytorch2.6 2 0
pytorch2.5.1 1 0
pytorch2.4.1 1 0
Additional details and impacted files
@@            Coverage Diff            @@
##           master   #21072     +/-   ##
=========================================
- Coverage      87%      49%    -39%     
=========================================
  Files         269      266      -3     
  Lines       23520    23468     -52     
=========================================
- Hits        20503    11395   -9108     
- Misses       3017    12073   +9056     

@matsumotosan matsumotosan marked this pull request as draft August 15, 2025 18:21
@github-actions github-actions bot added the pl Generic label for PyTorch Lightning package label Aug 15, 2025
@matsumotosan matsumotosan force-pushed the weights-only-compatibility branch from d7cb702 to 601e300 on August 15, 2025 22:20
@@ -56,11 +56,17 @@ def _load_from_checkpoint(
map_location: _MAP_LOCATION_TYPE = None,
hparams_file: Optional[_PATH] = None,
strict: Optional[bool] = None,
weights_only: Optional[bool] = None,
Contributor Author
Should we default to weights_only=None or weights_only=True? If we have no use for weights_only=None, we can simplify the type hint to weights_only: bool = True.
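For illustration, the None-as-sentinel pattern under discussion could be resolved roughly like this (resolve_weights_only and the torch_ge_2_6 flag are hypothetical names for this sketch, not part of the PR):

```python
from typing import Optional


def resolve_weights_only(weights_only: Optional[bool], torch_ge_2_6: bool) -> bool:
    """Hypothetical helper: None means 'follow torch's own default'."""
    if weights_only is None:
        # torch.load switched its default to weights_only=True in torch 2.6
        return torch_ge_2_6
    return weights_only
```

With the simplified `weights_only: bool = True` hint, this helper (and the extra branch) would disappear, at the cost of no longer being able to defer to torch's version-dependent default.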

@@ -45,7 +46,12 @@ def test_load_legacy_checkpoints(tmp_path, pl_version: str):
assert path_ckpts, f'No checkpoints found in folder "{PATH_LEGACY}"'
path_ckpt = path_ckpts[-1]

model = ClassificationModel.load_from_checkpoint(path_ckpt, num_features=24)
# legacy load utility added in 1.5.0 (see https://github.com/Lightning-AI/pytorch-lightning/pull/9166)
if pl_version == "local":
Contributor Author
This is the simplest way that I could think of ensuring we continue testing the legacy checkpoints. Another way could be to use torch.serialization.add_safe_globals, but it seems a little more complicated (particularly since we're using the pl_legacy_patch context manager already).

@matsumotosan matsumotosan marked this pull request as ready for review August 16, 2025 15:37
@matsumotosan matsumotosan changed the title from "Compatibility for weights_only=True by default" to "Compatibility for weights_only=True by default for loading weights" Aug 16, 2025
@matsumotosan
Contributor Author

@Borda I wanted to get your opinion on something before moving forward.

I've added weights_only as an argument to LightningModule.load_from_checkpoint and all downstream functions to allow users to determine which option they want to use to load checkpoints.

My issue right now is with resuming training from a checkpoint with Trainer.fit. I see a few options right now:

  1. Add weights_only as an argument to Trainer.fit (would also have to modify args for validate, test, and predict). Set default value to True.
  2. Use weights_only=True everywhere, and print an error message advising user to set TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD if they would like to load with weights_only=False. Users must explicitly set environment variable to force loading with weights_only=False.
  3. Add weights_only as an argument to Trainer initialization. Easy, but would not allow fine-grained control on loading models between different calls of fit, validate, etc.

I'm leaning towards option 1, but it involves changing the Trainer methods, which affects a lot of code, so I wanted to run this by you beforehand.
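As a rough sketch of what option 1 would look like (hypothetical, heavily simplified signatures; the real Trainer entry points have many more parameters):

```python
from typing import Any, Optional


class Trainer:
    """Simplified sketch of option 1: weights_only threaded through each entry point."""

    def fit(self, model: Any, ckpt_path: Optional[str] = None,
            weights_only: Optional[bool] = True) -> None:
        self._restore(ckpt_path, weights_only)

    def validate(self, model: Any, ckpt_path: Optional[str] = None,
                 weights_only: Optional[bool] = True) -> None:
        self._restore(ckpt_path, weights_only)

    def _restore(self, ckpt_path: Optional[str], weights_only: Optional[bool]) -> None:
        # The real implementation would call
        # torch.load(ckpt_path, weights_only=weights_only) here.
        self.last_weights_only = weights_only
```

The upside over option 3 is visible here: each call of fit, validate, etc. can choose a different value instead of fixing one at Trainer construction.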

@Borda
Member

Borda commented Aug 18, 2025

My issue right now is with resuming training from a checkpoint with Trainer.fit. I see a few options right now:

  1. Add weights_only as an argument to Trainer.fit (would also have to modify args for validate, test, and predict). Set default value to True.
  2. Use weights_only=True everywhere, and print an error message advising user to set TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD if they would like to load with weights_only=False. Users must explicitly set environment variable to force loading with weights_only=False.
  3. Add weights_only as an argument to Trainer initialization. Easy, but would not allow fine-grained control on loading models between different calls of fit, validate, etc.

The cleanest way would probably be 1), but it brings so many new arguments for a marginal use... so personally I would go with 2)
cc: @lantiga

@lantiga
Collaborator

lantiga commented Aug 19, 2025

Hi @matsumotosan, let's do that only if the underlying torch is >= 2.6 (since weights_only became True by default starting from that version); otherwise we're going to break a lot of older code.
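The gate lantiga describes amounts to a small version check (a sketch; in the actual codebase Lightning's internal _TORCH_GREATER_EQUAL_2_6 flag plays this role):

```python
def is_torch_ge_2_6(version: str) -> bool:
    """True if a torch version string is at least 2.6, where torch.load
    defaults to weights_only=True."""
    # Strip local/build suffixes such as "+cu121" before parsing.
    base = version.split("+")[0]
    major, minor = (int(part) for part in base.split(".")[:2])
    return (major, minor) >= (2, 6)
```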

Borda and others added 3 commits August 19, 2025 10:17
@github-actions github-actions bot added ci Continuous Integration dependencies Pull requests that update a dependency file dockers package labels Aug 19, 2025
@matsumotosan
Contributor Author

matsumotosan commented Aug 21, 2025

I am not sure if it's possible to default to weights_only=True for trainer.{fit,validate,test,predict}, since each of them loads a checkpoint at some point, and that checkpoint may include objects that torch.load(..., weights_only=True) does not allow (Lightning itself saves classes like ModelCheckpoint); those cannot be loaded unless weights_only=False or a context manager is used.

The big issue with context managers is that a different one has to be used each time a different checkpoint is loaded. Setting the environment variable TORCH_FORCE_NO_WEIGHTS_ONLY_LOAD=1 does not work, as it is overridden by the value passed to torch.load.

With this in mind, I think passing weights_only to trainer.{fit,validate,test,predict} may be the only short-term solution.

If we need to force the weights_only=True solution, every item saved in the checkpoint would need to be converted into a primitive type that torch.load(..., weights_only=True) would accept.

I have also added weights_only as an argument to the LightningModule and LightningDataModule classes, as there are a few issues that point this out as a source of errors.

Maybe we could add weights_only and have it default to False for now, until a future release, so that we don't break users' code whilst still adding an explicit option for this?

@matsumotosan matsumotosan changed the title from "Compatibility for weights_only=True by default for loading weights" to "Expose weights_only for loading checkpoints with Trainer, LightningModule, LightningDataModule" Aug 21, 2025
@@ -51,6 +56,9 @@ def _load(
weights_only=weights_only,
)
if str(path_or_url).startswith("http"):
if weights_only is None:
Member
Suggested change:
-   if weights_only is None:
+   if weights_only is None and _TORCH_GREATER_EQUAL_2_6:

Contributor Author
@Borda

torch.hub.load_state_dict_from_url defaults to weights_only=False even in 2.8. Should we still default to weights_only=True in this case?

For calls to torch.load, I am not modifying weights_only when the function receives weights_only=None, as the underlying torch.load will handle its own default. We cannot do that with torch.hub.load_state_dict_from_url, as its argument is weights_only: bool = False.
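The dispatch being described could be sketched as follows (build_load_kwargs is a hypothetical helper, and defaulting the URL path to True is one possible answer to the open question above, not a settled choice):

```python
from typing import Optional


def build_load_kwargs(weights_only: Optional[bool], from_url: bool) -> dict:
    """Hypothetical helper mirroring the dispatch described above."""
    if weights_only is None:
        if from_url:
            # torch.hub.load_state_dict_from_url defaults to weights_only=False,
            # so None cannot simply be forwarded; True is an assumed choice here.
            return {"weights_only": True}
        # Omit the argument entirely and defer to torch.load's own
        # version-dependent default.
        return {}
    return {"weights_only": weights_only}
```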


Successfully merging this pull request may close these issues.

Make sure the upcoming change in the default for weights_only from False to True is handled correctly
4 participants