
Bug in catalyst/callbacks/backward.py if the grad_clip_fn value is set. #1444

Closed
9 of 10 tasks
AleksandrMinin opened this issue Apr 5, 2023 · 2 comments
Labels
bug (Something isn't working), help wanted (Extra attention is needed), wontfix (This will not be worked on)

Comments

@AleksandrMinin

🐛 Bug Report

BackwardCallback in catalyst/callbacks/backward.py fails when grad_clip_fn is set.

How To Reproduce

Steps to reproduce the behavior:

  1. Create a BackwardCallback with grad_clip_fn set (i.e., not None).
  2. Launch runner.train with this callback.
  3. Training fails with the following error:
/python_envs/kaggle-env/lib/python3.8/site-packages/catalyst/callbacks/backward.py:55

   52
   53     if self.grad_clip_fn is not None:
   54         runner.engine.unscale_gradients()
-->55         norm = self.grad_clip_fn(self.model.parameters())
   56         if self._log_gradient:
   57             runner.batch_metrics[f"{self._prefix_gradient}/norm"] = norm
   58

AttributeError: 'BackwardCallback' object has no attribute 'model'
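The failure mode can be reproduced without Catalyst or PyTorch. The sketch below is a hypothetical stand-in, not Catalyst code: `StubModel`, `StubRunner` and `BuggyBackwardCallback` are invented names that mimic only the branch around line 55, where the callback reads `self.model` even though the model is stored on the runner.

```python
class StubModel:
    """Hypothetical stand-in for an nn.Module; parameters() yields values."""

    def __init__(self, params):
        self._params = params

    def parameters(self):
        return iter(self._params)


class StubRunner:
    """Hypothetical stand-in for a Catalyst runner that owns the model."""

    def __init__(self, model):
        self.model = model
        self.batch_metrics = {}


class BuggyBackwardCallback:
    """Mimics the clipping branch of BackwardCallback.on_batch_end (line 55)."""

    def __init__(self, grad_clip_fn):
        self.grad_clip_fn = grad_clip_fn  # note: self.model is never assigned

    def on_batch_end(self, runner):
        if self.grad_clip_fn is not None:
            # Bug: the callback has no `model` attribute; only the runner does.
            return self.grad_clip_fn(self.model.parameters())
        return None


runner = StubRunner(StubModel([1.0, 2.0]))
callback = BuggyBackwardCallback(grad_clip_fn=lambda params: list(params))
try:
    callback.on_batch_end(runner)
except AttributeError as exc:
    print(exc)  # 'BuggyBackwardCallback' object has no attribute 'model'
```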

Code sample

import torch
from torch.nn.utils import clip_grad_norm_
from catalyst import dl
from catalyst.engines.torch import CPUEngine, GPUEngine

# Project-local modules (not part of Catalyst)
from src.config import config
from src.base_config import Config
from src.tools import set_global_seed, get_code
from src.dataset import get_loaders
from src.crnn import CRNN
from src.runners import SupervisedOCRRunner

callbacks = [
    dl.CriterionCallback(
        input_key=dict(output="log_probs", output_size="input_lengths"),
        target_key=dict(target="targets", target_len="target_lengths"),
        metric_key="loss",
        criterion_key="ctc_loss_fn",
    ),
    dl.BackwardCallback(
        metric_key="loss",
        grad_clip_fn=clip_grad_norm_,
        grad_clip_params={"max_norm": 0.5, "norm_type": 2},
    ),
]


loaders, infer_loader = get_loaders(config)  
model = CRNN(**config.model_kwargs)

optimizer = config.optimizer(params=model.parameters(), **config.optimizer_kwargs)
scheduler = config.scheduler(optimizer=optimizer, **config.scheduler_kwargs)


if torch.cuda.is_available():
    engine = GPUEngine()
else:
    engine = CPUEngine()

runner = SupervisedOCRRunner(
    input_key="image", 
    target_key="target", 
    output_key="output",
)

criterion = {"ctc_loss_fn": config.ctc_loss}

runner.train(
    model=model,
    engine=engine,
    criterion=criterion,
    optimizer=optimizer,
    scheduler=scheduler,
    loaders=loaders,
    callbacks=callbacks,
    num_epochs=config.n_epochs,
    valid_loader="valid",
    valid_metric=config.valid_metric,
    minimize_valid_metric=config.minimize_metric,
    seed=config.seed,
    verbose=True,
    load_best_on_end=True,
)
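Line 55 calls grad_clip_fn with a single argument, so the values in grad_clip_params must already be bound to it; the sketch below assumes this binding behaves like functools.partial. It uses a hypothetical pure-Python `clip_grad_norm` on an invented `Param` class (plain lists instead of tensors) that follows the formula of torch.nn.utils.clip_grad_norm_: the total norm over all gradients viewed as one vector, and a scaling factor of max_norm / (total_norm + 1e-6) capped at 1.

```python
import math
from functools import partial


class Param:
    """Hypothetical parameter holding a flat list of gradient values."""

    def __init__(self, grad):
        self.grad = grad


def clip_grad_norm(parameters, max_norm, norm_type=2.0):
    """Pure-Python analogue of torch.nn.utils.clip_grad_norm_ (sketch only).

    Computes the norm of all gradients as if they were concatenated,
    rescales them in place when it exceeds max_norm, and returns the
    total norm measured before clipping.
    """
    params = [p for p in parameters if p.grad is not None]
    values = [abs(g) for p in params for g in p.grad]
    if not values:
        return 0.0
    if norm_type == math.inf:
        total_norm = max(values)
    else:
        total_norm = sum(v ** norm_type for v in values) ** (1.0 / norm_type)
    # Same epsilon as PyTorch; the factor is capped at 1, so small norms are untouched.
    clip_coef = min(max_norm / (total_norm + 1e-6), 1.0)
    for p in params:
        p.grad = [g * clip_coef for g in p.grad]
    return total_norm


# grad_clip_fn + grad_clip_params from the sample, as a one-argument callable.
grad_clip_fn = partial(clip_grad_norm, max_norm=0.5, norm_type=2)

params = [Param([3.0, 0.0]), Param([4.0])]
norm = grad_clip_fn(iter(params))
print(norm)  # 5.0
```

After the call, the gradients are rescaled so that their combined L2 norm is just under 0.5.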

Expected behavior

BackwardCallback should clip the gradients of the model held by the runner. Replacing

norm = self.grad_clip_fn(self.model.parameters())

with

norm = self.grad_clip_fn(runner.model.parameters())

on line 55 of catalyst/callbacks/backward.py fixes the error, and training then completes successfully.
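A stub-based sketch of the proposed one-line fix follows. Only `grad_clip_fn`, `grad_clip_params`, `_log_gradient`, `_prefix_gradient`, `batch_metrics` and `unscale_gradients` come from the report; `StubEngine`, `StubModel`, `StubRunner`, `FixedBackwardCallback` and `grad_l2_norm` are hypothetical, the engine's `unscale_gradients` is a no-op here, and the `"gradient"` prefix is an assumed value that may differ from Catalyst's.

```python
from functools import partial


class StubEngine:
    """Hypothetical engine; unscale_gradients does nothing in this sketch."""

    def unscale_gradients(self):
        pass


class StubModel:
    """Hypothetical model whose parameters() yields raw gradient values."""

    def __init__(self, grads):
        self._grads = grads

    def parameters(self):
        return iter(self._grads)


class StubRunner:
    """Hypothetical runner: owns the model, the engine and batch_metrics."""

    def __init__(self, model):
        self.model = model
        self.engine = StubEngine()
        self.batch_metrics = {}


def grad_l2_norm(parameters, norm_type=2.0):
    """Hypothetical stand-in for clip_grad_norm_ that only measures the norm."""
    return sum(abs(g) ** norm_type for g in parameters) ** (1.0 / norm_type)


class FixedBackwardCallback:
    """Mirrors the clipping branch of BackwardCallback with the fix applied."""

    def __init__(self, grad_clip_fn=None, grad_clip_params=None, log_gradient=False):
        self.grad_clip_fn = grad_clip_fn
        if grad_clip_fn is not None and grad_clip_params is not None:
            # Bind extra arguments so the call below needs only the parameters.
            self.grad_clip_fn = partial(grad_clip_fn, **grad_clip_params)
        self._prefix_gradient = "gradient"  # assumed prefix for this sketch
        self._log_gradient = log_gradient

    def on_batch_end(self, runner):
        if self.grad_clip_fn is not None:
            runner.engine.unscale_gradients()
            # Fix: take the parameters from runner.model, not self.model.
            norm = self.grad_clip_fn(runner.model.parameters())
            if self._log_gradient:
                runner.batch_metrics[f"{self._prefix_gradient}/norm"] = norm


runner = StubRunner(StubModel([3.0, 4.0]))
callback = FixedBackwardCallback(
    grad_clip_fn=grad_l2_norm, grad_clip_params={"norm_type": 2}, log_gradient=True
)
callback.on_batch_end(runner)
print(runner.batch_metrics)  # {'gradient/norm': 5.0}
```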

Environment

Catalyst version: 22.04
PyTorch version: 1.13.0+cu117
Is debug build: No
CUDA used to build PyTorch: 11.7
TensorFlow version: N/A
TensorBoard version: 2.9.1

OS: Ubuntu 20.04.3 LTS
GCC version: (Ubuntu 7.5.0-6ubuntu2) 7.5.0
CMake version: version 3.10.3

Python version: 3.8
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration: 
GPU 0: NVIDIA GeForce GTX 1080
GPU 1: NVIDIA GeForce GTX 1080

Nvidia driver version: 470.82.01
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] catalyst==22.4
[pip3] efficientnet-pytorch==0.7.1
[pip3] numpy==1.23.5
[pip3] pytorch-ignite==0.4.11
[pip3] segmentation-models-pytorch==0.3.2
[pip3] tensorboard==2.9.1
[pip3] tensorboard-data-server==0.6.1
[pip3] tensorboard-plugin-wit==1.8.1
[pip3] tensorboardX==2.5.1
[pip3] torch==1.13.0
[pip3] torchvision==0.14.0
[conda] blas                      1.0                         mkl  
[conda] mkl                       2021.4.0           h06a4308_640  
[conda] mkl-service               2.4.0            py39h7f8727e_0  
[conda] mkl_fft                   1.3.1            py39hd3c417c_0  
[conda] mkl_random                1.2.2            py39h51133e4_0  
[conda] numpy                     1.21.5           py39h6c91a56_3  
[conda] numpy-base                1.21.5           py39ha15fc14_3  
[conda] numpydoc                  1.4.0            py39h06a4308_0

Checklist

  • bug description
  • steps to reproduce
  • expected behavior
  • environment
  • code sample / screenshots


AleksandrMinin added the bug and help wanted labels on Apr 5, 2023

github-actions bot commented Apr 5, 2023

Hi! Thank you for your contribution! Please re-check all issue template checklists; unfilled issues will be closed automatically. And do not forget to join our Slack for collaboration.


stale bot commented Jun 10, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label on Jun 10, 2023
stale bot closed this as completed on Oct 15, 2023
Development

No branches or pull requests

3 participants