Commit 98b78aa

zou3519 authored and pytorchmergebot committed
[autograd.Function] setup_context always appears on the Function (pytorch#92312)
Previously, we used the existence of setup_context to decide whether forward should take a ctx object. To be consistent with all the other staticmethods (which always exist on the autograd.Function), this PR changes the check so that whether the user has overridden setup_context decides whether forward should take a ctx object.

Fixes pytorch#91451

Test Plan: existing tests

Pull Request resolved: pytorch#92312
Approved by: https://github.com/albanD, https://github.com/soulitzer
1 parent 00fe63d commit 98b78aa
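
To make the behavior change concrete, here is a minimal sketch of the two styles the commit message contrasts (assuming PyTorch 2.0+; ScaleCombined and ScaleSeparate are illustrative names, not part of this PR). After this change, dispatch keys off whether setup_context is overridden rather than whether the attribute merely exists.

import torch
from torch.autograd import Function

class ScaleCombined(Function):
    # Style 1 ("combined"): setup_context is NOT overridden, so forward
    # receives ctx and is responsible for saving state itself.
    @staticmethod
    def forward(ctx, x, scale):
        ctx.scale = scale
        return x * scale

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output * ctx.scale, None

class ScaleSeparate(Function):
    # Style 2 (separate, PyTorch 2.0+): forward does NOT receive ctx;
    # overriding setup_context is what signals this style.
    @staticmethod
    def forward(x, scale):
        return x * scale

    @staticmethod
    def setup_context(ctx, inputs, output):
        _, scale = inputs
        ctx.scale = scale

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output * ctx.scale, None

x = torch.randn(3, requires_grad=True)
ScaleCombined.apply(x, 2.0).sum().backward()
y = torch.randn(3, requires_grad=True)
ScaleSeparate.apply(y, 2.0).sum().backward()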

5 files changed: +80 additions, -30 deletions

docs/source/notes/extending.func.rst

+2 -2

@@ -25,7 +25,7 @@ This guide assumes you are familiar with :ref:`extending-autograd`,
 which explains how to use :class:`torch.autograd.Function`.
 
 :class:`torch.autograd.Function` can either have a :meth:`~Function.forward` that accepts a ctx object,
-or it can have separate :meth:`~Function.forward` (that does not accept ``ctx``) and a ``setup_context``
+or it can have separate :meth:`~Function.forward` (that does not accept ``ctx``) and a :meth:`~Function.setup_context`
 staticmethod that modifies the ``ctx`` object.
 
 Only the latter is supported with function transforms:
@@ -52,7 +52,7 @@ Depending on the transform,
 
 In order for the :class:`torch.autograd.Function` to be arbitrarily composable with function
 transforms, we recommend that all other staticmethods other than :meth:`~Function.forward` and
-``setup_context`` must be transformable: that is, they must consist of only PyTorch
+:meth:`~Function.setup_context` must be transformable: that is, they must consist of only PyTorch
 operators or call other :class:`torch.autograd.Function` (that may call into C++/CUDA/etc).
 
 Let's go over some examples of common use cases.
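
As a concrete illustration of the doc text above, here is a small sketch (my own example, assuming PyTorch 2.0+ with torch.func available; MyExp is not from the docs) of a Function written in the separate forward/setup_context style being composed with a function transform.

import torch
from torch.autograd import Function

class MyExp(Function):
    @staticmethod
    def forward(x):
        return torch.exp(x)

    @staticmethod
    def setup_context(ctx, inputs, output):
        # Save the output; backward only needs exp(x).
        ctx.save_for_backward(output)

    @staticmethod
    def backward(ctx, grad_output):
        result, = ctx.saved_tensors
        return grad_output * result

x = torch.randn(3)
# grad composes because forward and setup_context are separate and transformable.
g = torch.func.grad(lambda t: MyExp.apply(t).sum())(x)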

docs/source/notes/extending.rst

+14 -14

@@ -52,7 +52,7 @@ How to use
 ^^^^^^^^^^
 Take the following steps:
 1. Subclass :class:`~Function` and implement the :meth:`~Function.forward`,
-   (optional) ``setup_context`` and
+   (optional) :meth:`~Function.setup_context` and
    :meth:`~Function.backward` methods.
 2. Call the proper methods on the `ctx` argument.
 3. Declare whether your function supports
@@ -73,12 +73,12 @@ Take the following steps:
   tensors if there are multiple outputs. Also, please refer to the
   docs of :class:`Function` to find descriptions of useful methods that can be
   called only from :meth:`~Function.forward`.
-- ``setup_context`` (optional). One can either write a "combined" :meth:`~Function.forward` that
+- :meth:`~Function.setup_context` (optional). One can either write a "combined" :meth:`~Function.forward` that
   accepts a ``ctx`` object or (as of PyTorch 2.0) a separate :meth:`~Function.forward` that does
-  not accept ``ctx`` and a ``setup_context`` method where the ``ctx`` modification happens.
-  The :meth:`~Function.forward` should have the compute and ``setup_context`` should
+  not accept ``ctx`` and a :meth:`~Function.setup_context` method where the ``ctx`` modification happens.
+  The :meth:`~Function.forward` should have the compute and :meth:`~Function.setup_context` should
   only be responsible for the ``ctx`` modification (and not have any compute).
-  In general the separate :meth:`~Function.forward` and ``setup_context`` is closer to how
+  In general the separate :meth:`~Function.forward` and :meth:`~Function.setup_context` is closer to how
   PyTorch native operations work and therefore more composable with various PyTorch subsystems.
   See :ref:`combining-forward-context` for more details.
 - :meth:`~Function.backward` (or :meth:`~Function.vjp`) defines the gradient formula.
@@ -234,7 +234,7 @@ And here, we optimize the above example by calling set_materialize_grads(False):
         return grad_output * ctx.constant, None
 
 If you need any "intermediate" Tensors computed in :meth:`~Function.forward` to be saved,
-either they must be returned as outputs, or combine ``forward`` and ``setup_context``
+either they must be returned as outputs, or combine ``forward`` and :meth:`~Function.setup_context`
 (see :ref:`combining-forward-context`).
 Note that this means if you want gradients to flow through those intermediate values, you
 need to define the gradient formula for them (see also
@@ -300,25 +300,25 @@ can use the ``gradgradcheck`` function from the same package to check higher ord
 
 .. _combining-forward-context:
 
-Combined or separate :meth:`~Function.forward` and ``setup_context``
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Combined or separate :meth:`~Function.forward` and :meth:`~Function.setup_context`
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 There are two main ways to define :class:`~Function`. Either:
 
-- define a :meth:`~Function.forward` that combines the forward compute logic with ``setup_context``
-- (as of PyTorch 2.0) define a separate :meth:`~Function.forward` and ``setup_context``.
+- define a :meth:`~Function.forward` that combines the forward compute logic with :meth:`~Function.setup_context`
+- (as of PyTorch 2.0) define a separate :meth:`~Function.forward` and :meth:`~Function.setup_context`
 
-We recommend the second option (separate :meth:`~Function.forward` and ``setup_context``)
+We recommend the second option (separate :meth:`~Function.forward` and :meth:`~Function.setup_context`)
 because that is closer to how PyTorch native operations are implemented and it composes
 with :mod:`torch.func` transforms. However, we plan to support both approaches going forward;
-combining :meth:`~Function.forward` with ``setup_context``: leads to more flexibility since
+combining :meth:`~Function.forward` with :meth:`~Function.setup_context`: leads to more flexibility since
 you are able to save intermediates without returning them as output.
 
 Please see the previous section for how to define :class:`~Function` with separate
-:meth:`~Function.forward` and ``setup_context``.
+:meth:`~Function.forward` and :meth:`~Function.setup_context`.
 
 Here is an example of how to define a :class:`Function` with combined :meth:`~Function.forward` and
-``setup_context``::
+:meth:`~Function.setup_context`::
 
     class LinearFunction(Function):
         @staticmethod
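
The LinearFunction example referenced above is truncated in this hunk. As a stand-in, here is a hedged sketch (my own example, not the docs' LinearFunction) of the combined style, where forward takes ctx and can stash an intermediate without returning it as an output.

import torch
from torch.autograd import Function

class CubeThenSin(Function):
    @staticmethod
    def forward(ctx, x):
        cubed = x ** 3                   # intermediate needed by backward
        ctx.save_for_backward(x, cubed)  # saved without being returned
        return torch.sin(cubed)

    @staticmethod
    def backward(ctx, grad_output):
        x, cubed = ctx.saved_tensors
        # d/dx sin(x^3) = cos(x^3) * 3x^2
        return grad_output * torch.cos(cubed) * 3 * x ** 2

x = torch.randn(4, requires_grad=True)
CubeThenSin.apply(x).sum().backward()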

test/functorch/test_eager_transforms.py

+1 -1

@@ -3037,7 +3037,7 @@ def backward(ctx, gy):
 
         x = torch.randn(3, device=device)
         transform = getattr(functorch, transform)
-        with self.assertRaisesRegex(RuntimeError, 'must have a setup_context'):
+        with self.assertRaisesRegex(RuntimeError, 'must override the setup_context'):
            transform(MySin.apply)(x)
 
    @parametrize('transform', [
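
The updated assertion above checks the new error text. The following sketch (illustrative only, using torch.func.grad rather than the test's parametrized functorch transforms) shows the situation it guards: a combined-style Function used under a transform without overriding setup_context.

import torch
from torch.autograd import Function

class CombinedSin(Function):
    # Combined style: setup_context is not overridden.
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sin(x)

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        return grad_output * torch.cos(x)

x = torch.randn(3)
try:
    torch.func.grad(lambda t: CombinedSin.apply(t).sum())(x)
except RuntimeError as e:
    # Expected to mention "must override the setup_context" after this PR.
    print(e)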

torch/autograd/function.py

+25 -9

@@ -7,7 +7,7 @@
 import functools
 import warnings
 from collections import OrderedDict
-from typing import Any, List, Optional
+from typing import Any, List, Optional, Tuple
 from torch._functorch.autograd_function import custom_function_call
 
 __all__ = ["FunctionCtx", "BackwardCFunction", "FunctionMeta", "Function", "once_differentiable", "traceable",
@@ -323,8 +323,8 @@ def setup_context(ctx: Any, inputs: Tuple[Any, ...], output: Any) -> None:
                     pass
 
         - The forward no longer accepts a ctx argument.
-        - Instead, you must also define a setup_context staticmethod to handle setting up the
-          ``ctx`` object.
+        - Instead, you must also override the :meth:`torch.autograd.Function.setup_context`
+          staticmethod to handle setting up the ``ctx`` object.
           ``output`` is the output of the forward, ``inputs`` are a Tuple of inputs
           to the forward.
         - See :ref:`extending-autograd` for more details
@@ -340,6 +340,23 @@ def setup_context(ctx: Any, inputs: Tuple[Any, ...], output: Any) -> None:
         raise NotImplementedError("You must implement the forward function for custom"
                                   " autograd.Function.")
 
+    @staticmethod
+    def setup_context(ctx: Any, inputs: Tuple[Any], output: Any) -> Any:
+        r"""There are two ways to define the forward pass of an autograd.Function.
+
+        Either:
+
+        1. Override forward with the signature forward(ctx, *args, **kwargs).
+           ``setup_context`` is not overridden. Setting up the ctx for backward
+           happens inside the ``forward``.
+        2. Override forward with the signature forward(*args, **kwargs) and
+           override ``setup_context``. Setting up the ctx for backward happens
+           inside ``setup_context`` (as opposed to inside the ``forward``)
+
+        See :meth:`torch.autograd.Function.forward` and :ref:`extending-autograd` for more details.
+        """
+        raise NotImplementedError("setup_context is not implemented.")
+
     @staticmethod
     def backward(ctx: Any, *grad_outputs: Any) -> Any:
         r"""Defines a formula for differentiating the operation with backward mode
@@ -490,13 +507,12 @@ def apply(cls, *args, **kwargs):
             args = _functorch.utils.unwrap_dead_wrappers(args)
             return super().apply(*args, **kwargs)
 
-        if not hasattr(cls, 'setup_context'):
-            # TODO: link documentation in error message
-            # https://github.com/pytorch/pytorch/issues/90224
+        if cls.setup_context == _SingleLevelFunction.setup_context:
             raise RuntimeError(
-                'In order to use an autograd.Function with functorch transforms ',
-                '(vmap, grad, jvp, jacrev, ...), it must have a setup_context ',
-                'staticmethod.')
+                'In order to use an autograd.Function with functorch transforms '
+                '(vmap, grad, jvp, jacrev, ...), it must override the setup_context '
+                'staticmethod. For more details, please see '
+                'https://pytorch.org/docs/master/notes/extending.func.html')
 
         return custom_function_call(cls, *args, **kwargs)
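
Because the base class now defines setup_context, hasattr is always true, so the apply() check above compares the subclass attribute against the base class attribute instead. A generic Python sketch (illustrative class names, not PyTorch code) of that override-detection pattern:

class Base:
    @staticmethod
    def setup_context(ctx, inputs, output):
        raise NotImplementedError

class NotOverridden(Base):
    pass

class Overridden(Base):
    @staticmethod
    def setup_context(ctx, inputs, output):
        ctx.data = inputs

# hasattr no longer distinguishes the two cases...
assert hasattr(NotOverridden, "setup_context")
assert hasattr(Overridden, "setup_context")
# ...but comparing against the base attribute does, mirroring
# `cls.setup_context == _SingleLevelFunction.setup_context` above.
assert NotOverridden.setup_context == Base.setup_context
assert Overridden.setup_context != Base.setup_context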

torch/csrc/autograd/python_function.cpp

+38 -4

@@ -871,6 +871,31 @@ THPObjectPtr make_ctx_input_output_tuple(
 
 } // namespace
 
+static PyObject* THPFunction_setup_context = nullptr;
+
+static PyObject* get_base_setup_context() {
+  if (THPFunction_setup_context != nullptr) {
+    return THPFunction_setup_context;
+  }
+
+  auto module = THPObjectPtr(PyImport_ImportModule("torch.autograd.function"));
+  if (!module)
+    return nullptr;
+
+  auto function =
+      THPObjectPtr(PyObject_GetAttrString(module, "_SingleLevelFunction"));
+  if (!function)
+    return nullptr;
+
+  // setup_context gets "leaked" - we return a new reference and hold onto it
+  // forever.
+  auto setup_context = PyObject_GetAttrString(function, "setup_context");
+  if (!setup_context)
+    return nullptr;
+  THPFunction_setup_context = setup_context;
+  return THPFunction_setup_context;
+}
+
 PyObject* THPFunction_apply(PyObject* cls, PyObject* inputs) {
   HANDLE_TH_ERRORS
 
@@ -920,10 +945,19 @@ PyObject* THPFunction_apply(PyObject* cls, PyObject* inputs) {
   ctx->needs_input_grad = input_info.needs_input_grad.release();
   ctx->is_variable_input = std::move(input_info.is_variable_input);
 
-  // autograd.Function may optionally contain a setup_context staticmethod.
+  // autograd.Function may optionally override a setup_context staticmethod.
   // In this case, autograd.Function.forward does NOT accept a ctx object.
-  bool has_separate_setup_context_fn =
-      PyObject_HasAttrString(cls, "setup_context");
+  // Determine if this is the case.
+  auto cls_setup_context =
+      THPObjectPtr(PyObject_GetAttrString(cls, "setup_context"));
+  if (!cls_setup_context) {
+    return nullptr;
+  }
+  auto orig_setup_context = get_base_setup_context();
+  if (!orig_setup_context) {
+    return nullptr;
+  }
+  auto overridden_setup_context = cls_setup_context.get() != orig_setup_context;
 
   auto num_args = PyTuple_GET_SIZE(inputs);
 
@@ -935,7 +969,7 @@ PyObject* THPFunction_apply(PyObject* cls, PyObject* inputs) {
     THPObjectPtr forward_fn(PyObject_GetAttrString(cls, "forward"));
     if (!forward_fn)
       return nullptr;
-    if (has_separate_setup_context_fn) {
+    if (overridden_setup_context) {
      // call forward followed by setup_context
      output = PyObject_CallObject(forward_fn, unpacked_input.input_tuple);
      if (!output) {
