
Commit 383d340

jma127 authored and facebook-github-bot committed
Small optimization for adam (pytorch#12107)
Summary: Apply weight decay for Adam in-place instead of via copy. Synced offline with soumith, who mentioned that it should be OK. This is also consistent with other optimizers, e.g. https://github.com/pytorch/pytorch/blob/eee01731a5d33d5be58d875711bd2577e38dbddf/torch/optim/sgd.py#L93

Pull Request resolved: pytorch#12107
Reviewed By: soumith
Differential Revision: D10071787
Pulled By: jma127
fbshipit-source-id: 5fd7939c79039693b225c44c4c80450923b8d673
1 parent 5da8a8c commit 383d340

File tree

1 file changed (+1, -1)


torch/optim/adam.py

Lines changed: 1 addition & 1 deletion
@@ -87,7 +87,7 @@ def step(self, closure=None):
                 state['step'] += 1

                 if group['weight_decay'] != 0:
-                    grad = grad.add(group['weight_decay'], p.data)
+                    grad.add_(group['weight_decay'], p.data)

                 # Decay the first and second moment running average coefficient
                 exp_avg.mul_(beta1).add_(1 - beta1, grad)
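For context, the one-line change swaps an out-of-place `Tensor.add` (which allocates a fresh tensor every step and rebinds the local `grad`) for the in-place `Tensor.add_`. Below is a minimal standalone sketch of the two forms, not the optimizer code itself; it uses arbitrary shapes and values, and uses the `alpha=` keyword form of `add`/`add_` rather than the older positional form shown in the diff.

```python
import torch

# Illustrative sketch: compare out-of-place vs. in-place weight decay on a
# gradient, mirroring the one-line change in the diff above.

weight_decay = 1e-2
p = torch.randn(4, 4, requires_grad=True)
p.grad = torch.randn(4, 4)

grad = p.grad.data  # shares storage with p.grad, as in the optimizer loop

# Old code path: allocates a new tensor holding grad + weight_decay * p.data;
# p.grad itself is left untouched.
decayed_copy = grad.add(p.data, alpha=weight_decay)

# New code path: writes grad + weight_decay * p.data into grad's existing
# storage (i.e. into p.grad), so no extra tensor is allocated per step.
grad.add_(p.data, alpha=weight_decay)

# The numerical result is identical; only allocation and aliasing differ.
assert torch.allclose(decayed_copy, grad)
```

Because `grad` aliases `p.grad`, the in-place form also mutates the stored gradient; per the summary, this behavioral difference was checked offline and matches what sgd.py already does.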
