-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Error while merging hparams
when using LightningCLI and YAML
#20182
Comments
FYI: I worked around it now by adding the following to my def setup(self, stage: str):
self._save_config()
def _save_config(self):
# P: Currently, we do not log all hyperparameters:
# lightning.CLI saves the config.yaml only locally, not well to W&B.
# self.save_hyperparameters() does not work well with YAML, see
# https://github.com/Lightning-AI/pytorch-lightning/issues/20182
# S: Read saved local config and save it as hparams.
# You should disable any other `self.save_hyperparameters()`.
# NOTE: This logs the RESOLVED config using YAML and CLI arguments.
import torch.distributed as dist
if dist.is_initialized() and dist.get_rank() != 0:
# only save config with rank0
return
if self.trainer.fast_dev_run:
# in fast_dev_run mode, loggers are replaced by DummyLogger
return
config_yaml_path = Path(self.logger.save_dir) / "config.yaml"
assert config_yaml_path.exists()
with open(config_yaml_path) as f:
dct = yaml.safe_load(f)
self.save_hyperparameters(dct)
print("=== Saved config.yaml as hyperparameter") |
I was not aware of this merging of parameters. I guess this happens always when both model and data do |
Ran into same issue. Simple temporary workaround: self.save_hyperparameters(ignore=['_class_path']) similar issues: #9492 |
I created pull request #20221 to fix this. |
Bug description
The minimal example below throws the error
RuntimeError: Error while merging hparams: the keys ['_class_path'] are present in both the LightningModule's and LightningDataModule's hparams but have different values.
I though this was supposed to work. Would really appreciate workaround tips (that also work with checkpointing) or a fix.
What version are you seeing the problem on?
v2.4
How to reproduce the bug
Run the code below using e.g.
python main.py --config config.yaml
Error messages and logs
Environment
Current environment
- GPU:
- NVIDIA A10G
- available: True
- version: 12.1
- lightning: 2.4.0
- lightning-utilities: 0.11.4
- pytorch-lightning: 2.3.3
- torch: 2.3.1
- torchdata: 0.7.1
- torchmetrics: 1.0.3
- torchsummary: 1.5.1
- torchvision: 0.18.1
- absl-py: 2.1.0
- affine: 2.4.0
- aiobotocore: 2.13.0
- aiohttp: 3.9.5
- aioitertools: 0.11.0
- aiosignal: 1.3.1
- albucore: 0.0.12
- albumentations: 1.4.11
- altair: 5.3.0
- annotated-types: 0.7.0
- ansi2html: 1.9.1
- ansicolors: 1.1.8
- antlr4-python3-runtime: 4.9.3
- anyio: 4.4.0
- appdirs: 1.4.4
- argon2-cffi: 23.1.0
- argon2-cffi-bindings: 21.2.0
- arrow: 1.3.0
- asciitree: 0.3.3
- assertpy: 1.1
- asttokens: 2.4.1
- astunparse: 1.6.3
- async-lru: 2.0.4
- attrs: 23.2.0
- automation-api-gateway-client: 12.0.165
- az-annotation-io: 0.18.8
- az-cp-aws-utils: 1.2.5
- az-cp-datadictionary-definitions: 0.2024.119.post1
- az-cp-drclib: 0.1.0
- az-cp-holoviews-compressed-rgb: 0.0.1rc3
- az-cp-imagekit-ventana-bif: 0.2.2
- az-cp-logging: 0.15.0
- az-cp-ooportal: 1.4.7
- az-cp-pathviz: 0.0.1rc7
- az-cp-pita: 1.10.1
- az-cp-predictino-container: 3.10.3
- az-drc2polygons: 2.2.2
- az-git-utils: 0.14.0
- babel: 2.15.0
- beautifulsoup4: 4.12.3
- bleach: 6.1.0
- blessed: 1.20.0
- blinker: 1.8.2
- bokeh: 3.4.1
- boto3: 1.34.106
- botocore: 1.34.106
- bpython: 0.24
- braceexpand: 0.1.7
- cachetools: 5.3.3
- cerberus: 1.3.5
- certifi: 2024.2.2
- cffi: 1.16.0
- cfgv: 3.4.0
- charset-normalizer: 3.3.2
- click: 8.1.7
- click-plugins: 1.1.1
- cligj: 0.7.2
- cloudpickle: 3.0.0
- cmake: 3.25.0
- colorama: 0.4.6
- colorcet: 3.1.0
- comm: 0.2.2
- contourpy: 1.2.1
- coreg-tools: 0.3.0
- cubinlinker-cu11: 0.3.0.post2
- cuda-python: 11.8.3
- cudf-cu11: 24.6.1
- cuml-cu11: 24.6.1
- cupy-cuda11x: 13.2.0
- curtsies: 0.4.2
- cwcwidth: 0.1.9
- cycler: 0.12.1
- dash: 2.17.0
- dash-core-components: 2.0.0
- dash-html-components: 2.0.0
- dash-table: 5.0.0
- dask: 2024.5.1
- dask-cuda: 24.6.0
- dask-cudf-cu11: 24.6.1
- dask-expr: 1.1.1
- datashader: 0.16.2
- debugpy: 1.8.1
- decorator: 5.1.1
- definiens-autocli: 4.5.3
- definiens-ia-algorithms: 3.39.1
- definiens-imagekit: 5.6.6
- definiens-parallel: 1.3.0
- defusedxml: 0.7.1
- deprecated: 1.2.14
- distlib: 0.3.8
- distributed: 2024.5.1
- distributed-ucxx-cu11: 0.38.0
- dm-tree: 0.1.8
- dnspython: 2.6.1
- docker-pycreds: 0.4.0
- docstring-parser: 0.16
- ec2-metadata: 2.13.0
- editor: 1.6.6
- entrypoints: 0.4
- eval-type-backport: 0.2.0
- executing: 2.0.1
- fancycompleter: 0.9.1
- fasteners: 0.19
- fastjsonschema: 2.19.1
- fastrlock: 0.8.2
- filelock: 3.14.0
- fiona: 1.9.6
- flask: 3.0.3
- flatbuffers: 24.3.25
- fonttools: 4.52.4
- fqdn: 1.5.1
- frozendict: 2.4.4
- frozenlist: 1.4.1
- fsspec: 2024.5.0
- fvcore: 0.1.5.post20221221
- gast: 0.6.0
- geopandas: 1.0.1
- geopolars: 0.1.0a4
- gitdb: 4.0.11
- gitpython: 3.1.43
- google-auth: 2.32.0
- google-auth-oauthlib: 1.0.0
- google-pasta: 0.2.0
- greenlet: 3.0.3
- grpcio: 1.64.1
- h11: 0.14.0
- h5py: 3.11.0
- holoviews: 1.19.0
- httpcore: 1.0.5
- httpx: 0.27.0
- huggingface-hub: 0.24.0
- hvplot: 0.10.0
- identify: 2.5.36
- idna: 3.7
- imagecodecs: 2024.1.1
- imageio: 2.34.1
- importlib-metadata: 7.1.0
- iniconfig: 2.0.0
- inquirer: 3.2.4
- ioda: 0.19.1
- ioda-readout-service-client: 0.0.5
- ioda-result-service-client: 0.6.1
- iopath: 0.1.10
- ipp: 2021.4.0
- ipykernel: 6.29.4
- ipython: 8.24.0
- ipytree: 0.2.2
- ipywidgets: 8.1.3
- isoduration: 20.11.0
- itables: 2.1.0
- itsdangerous: 2.2.0
- jax: 0.4.30
- jaxlib: 0.4.30
- jedi: 0.19.1
- jinja2: 3.1.4
- jmespath: 1.0.1
- joblib: 1.4.2
- json5: 0.9.25
- jsonargparse: 4.29.0
- jsonpointer: 2.4
- jsonschema: 4.22.0
- jsonschema-specifications: 2023.12.1
- jupyter: 1.0.0
- jupyter-bokeh: 4.0.5
- jupyter-client: 8.6.2
- jupyter-console: 6.6.3
- jupyter-core: 5.7.2
- jupyter-dash: 0.4.2
- jupyter-events: 0.10.0
- jupyter-lsp: 2.2.5
- jupyter-server: 2.14.1
- jupyter-server-terminals: 0.5.3
- jupyterlab: 4.2.1
- jupyterlab-pygments: 0.3.0
- jupyterlab-server: 2.27.2
- jupyterlab-vim: 4.1.3
- jupyterlab-widgets: 3.0.11
- kaleido: 0.2.1
- keras: 2.12.0
- kiwisolver: 1.4.5
- lazy-loader: 0.4
- leb128: 1.0.7
- libclang: 18.1.1
- libucx-cu11: 1.15.0.post1
- lightning: 2.4.0
- lightning-utilities: 0.11.4
- linkify-it-py: 2.0.3
- lit: 15.0.7
- litdata: 0.2.16
- llvmlite: 0.42.0
- locket: 1.0.0
- loguru: 0.7.2
- lxml: 4.9.4
- markdown: 3.6
- markdown-it-py: 3.0.0
- markupsafe: 2.1.5
- matplotlib: 3.9.0
- matplotlib-inline: 0.1.7
- mdit-py-plugins: 0.4.1
- mdurl: 0.1.2
- mistune: 3.0.2
- ml-dtypes: 0.4.0
- mpmath: 1.3.0
- msgpack: 1.0.8
- multidict: 6.0.5
- multipledispatch: 1.0.0
- namex: 0.0.8
- nbclient: 0.10.0
- nbconvert: 7.16.4
- nbformat: 5.10.4
- nest-asyncio: 1.6.0
- networkx: 3.3
- newrelic: 8.11.0
- nodeenv: 1.9.0
- notebook: 7.2.0
- notebook-cgebbe: 0.1.0
- notebook-shim: 0.2.4
- nrai-wrapper: 2.19.339839
- numba: 0.59.1
- numcodecs: 0.12.1
- numpy: 1.26.4
- nvidia-cublas-cu12: 12.1.3.1
- nvidia-cuda-cupti-cu12: 12.1.105
- nvidia-cuda-nvrtc-cu12: 12.1.105
- nvidia-cuda-runtime-cu12: 12.1.105
- nvidia-cudnn-cu12: 8.9.2.26
- nvidia-cufft-cu12: 11.0.2.54
- nvidia-curand-cu12: 10.3.2.106
- nvidia-cusolver-cu12: 11.4.5.107
- nvidia-cusparse-cu12: 12.1.0.106
- nvidia-dali-cuda120: 1.39.0
- nvidia-nccl-cu12: 2.20.5
- nvidia-nvimgcodec-cu12: 0.2.0.7
- nvidia-nvjitlink-cu12: 12.5.82
- nvidia-nvtx-cu12: 12.1.105
- nvtx: 0.2.10
- oauthlib: 3.2.2
- ome-types: 0.5.1.post1
- omegaconf: 2.3.0
- opencv-python-headless: 4.9.0.80
- openslide-python: 1.3.1
- opt-einsum: 3.3.0
- optree: 0.12.1
- outcome: 1.3.0.post0
- overrides: 7.7.0
- packaging: 24.0
- pandas: 2.1.4
- pandocfilters: 1.5.1
- panel: 1.4.4
- papermill: 2.6.0
- param: 2.1.0
- parso: 0.8.4
- partd: 1.4.2
- pdbpp: 0.10.3
- pexpect: 4.9.0
- pillow: 10.3.0
- pip: 24.1.2
- pivottablejs: 0.9.0
- platformdirs: 4.2.2
- plotly: 5.22.0
- pluggy: 1.5.0
- polars: 0.20.30
- portal-test-utils: 8.100.0
- portalocker: 2.10.1
- pre-commit: 3.7.1
- prometheus-client: 0.20.0
- prompt-toolkit: 3.0.45
- protobuf: 3.20.0
- psutil: 5.9.8
- psycopg2-binary: 2.9.9
- ptpython: 3.0.27
- ptxcompiler-cu11: 0.8.1.post1
- ptyprocess: 0.7.0
- pudb: 2024.1
- pure-eval: 0.2.2
- pyarrow: 16.1.0
- pyasn1: 0.6.0
- pyasn1-modules: 0.4.0
- pycparser: 2.22
- pyct: 0.5.0
- pydantic: 2.7.3
- pydantic-compat: 0.1.2
- pydantic-core: 2.18.4
- pydeck: 0.9.1
- pygments: 2.18.0
- pylibraft-cu11: 24.6.0
- pymongo: 4.8.0
- pynvml: 11.4.1
- pyogrio: 0.9.0
- pyotp: 2.9.0
- pyparsing: 3.1.2
- pypeln: 0.4.9
- pyportal: 2.33.2
- pyproj: 3.6.1
- pyrepl: 0.9.0
- pysocks: 1.7.1
- pytest: 8.2.2
- python-dateutil: 2.9.0.post0
- python-dotenv: 1.0.1
- python-gitlab: 3.15.0
- python-json-logger: 2.0.7
- python-magic: 0.4.27
- python-on-whales: 0.71.0
- pytorch-lightning: 2.3.3
- pytz: 2024.1
- pyviz-comms: 3.0.2
- pyxdg: 0.28
- pyyaml: 6.0.1
- pyzmq: 26.0.3
- qtconsole: 5.5.2
- qtpy: 2.4.1
- raft-dask-cu11: 24.6.0
- rai: 2.4.1
- rapids-dask-dependency: 24.6.0
- rasterio: 1.3.10
- ray: 2.31.0
- rdi: 2.19.339839
- readchar: 4.1.0
- referencing: 0.35.1
- requests: 2.32.2
- requests-oauthlib: 2.0.0
- requests-toolbelt: 1.0.0
- retrying: 1.3.4
- rfc3339-validator: 0.1.4
- rfc3986-validator: 0.1.1
- rich: 13.7.1
- rmm-cu11: 24.6.0
- rpds-py: 0.18.1
- rsa: 4.9
- rtree: 1.2.0
- ruamel.yaml: 0.18.6
- ruamel.yaml.clib: 0.2.8
- ruff: 0.5.1
- runs: 1.2.2
- s3cmd: 2.4.0
- s3fs: 2024.5.0
- s3transfer: 0.10.1
- safetensors: 0.4.3
- scikit-image: 0.23.2
- scikit-learn: 1.5.1
- scipy: 1.13.1
- seaborn: 0.13.2
- segment-anything: 1.0
- selenium: 4.22.0
- semantic-segmentation: 0.1.0
- send2trash: 1.8.3
- sentry-sdk: 2.6.0
- setproctitle: 1.3.3
- setuptools: 63.4.3
- shapely: 2.0.4
- shellingham: 1.5.4
- six: 1.16.0
- smmap: 5.0.1
- sniffio: 1.3.1
- snuggs: 1.4.7
- sortedcontainers: 2.4.0
- soupsieve: 2.5
- spatialpandas: 0.4.10
- sqlalchemy: 2.0.31
- stack-data: 0.6.3
- stopit: 1.1.2
- streamlit: 1.36.0
- submitit: 1.5.1
- sympy: 1.12
- tabulate: 0.9.0
- tblib: 3.0.0
- tenacity: 8.3.0
- tensorboard: 2.12.3
- tensorboard-data-server: 0.7.2
- tensorflow: 2.12.1
- tensorflow-estimator: 2.12.0
- tensorflow-io: 0.37.1
- tensorflow-io-gcs-filesystem: 0.37.1
- termcolor: 2.4.0
- terminado: 0.18.1
- threadpoolctl: 3.5.0
- tifffile: 2024.5.22
- timm: 1.0.7
- tinycss2: 1.3.0
- toml: 0.10.2
- tomli: 2.0.1
- toolz: 0.12.1
- torch: 2.3.1
- torchdata: 0.7.1
- torchmetrics: 1.0.3
- torchsummary: 1.5.1
- torchvision: 0.18.1
- tornado: 6.4
- tqdm: 4.66.4
- traitlets: 5.14.3
- treelite: 4.1.2
- trio: 0.25.1
- trio-websocket: 0.11.1
- triton: 2.3.1
- typer: 0.12.3
- types-python-dateutil: 2.9.0.20240316
- typing-extensions: 4.12.2
- typing-utils: 0.1.0
- tzdata: 2024.1
- uc-micro-py: 1.0.3
- ucx-py-cu11: 0.38.0
- ucxx-cu11: 0.38.0
- uri-template: 1.3.0
- urllib3: 2.0.7
- urwid: 2.6.12
- urwid-readline: 0.14
- ventanamripy: 0.3
- virtualenv: 20.26.2
- wandb: 0.17.2
- watchdog: 4.0.1
- wcwidth: 0.2.13
- webcolors: 1.13
- webdataset: 0.2.93
- webencodings: 0.5.1
- websocket-client: 1.8.0
- werkzeug: 3.0.3
- wheel: 0.43.0
- widgetsnbextension: 4.0.11
- wmctrl: 0.5
- wrapt: 1.14.1
- wsproto: 1.2.0
- xarray: 2024.6.0
- xformers: 0.0.27
- xmod: 1.8.1
- xsdata: 24.3.1
- xyzservices: 2024.6.0
- yacs: 0.1.8
- yarl: 1.9.4
- zarr: 2.18.2
- zict: 3.0.0
- zipp: 3.19.1
- OS: Linux
- architecture:
- 64bit
- ELF
- processor: x86_64
- python: 3.11.9
- release: 5.15.0-1066-aws
- version: Trainer.fit() crashes if no checkpoint callback is provided #72~20.04.1-Ubuntu SMP Thu Jul 18 10:41:27 UTC 2024
More info
Relevant existing issues
self.save_hyperparameters()
behavior abnormal #19977The text was updated successfully, but these errors were encountered: