Description
As a brief reminder of how the maths works, suppose we have the following sequence:
Then we get a block on the tape corresponding to each
where primes indicate adjoint operations and variables. Let's observe a few patterns:
- If $u$ is an input to $f$ then $f'$ adds to $u'$. Multiple adjoint operations can contribute to the same adjoint variable.
- $u'$ is an input to $f'$ if and only if the corresponding primal variable $u$ was an output of the primal operation $f$. Consequently, each adjoint variable is only the input to one adjoint operation.
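
To make the two patterns concrete, here is a small illustrative case. The sequence $v = f(u)$, $w = g(u, v)$, $J = h(w)$ is an assumption chosen for this example and is not necessarily the sequence intended above. Evaluating the adjoint blocks in reverse order gives:

$$
\begin{aligned}
h' &: \quad w' \mathrel{+}= \left(\frac{\partial h}{\partial w}\right)^{T} J' \\
g' &: \quad u' \mathrel{+}= \left(\frac{\partial g}{\partial u}\right)^{T} w', \qquad v' \mathrel{+}= \left(\frac{\partial g}{\partial v}\right)^{T} w' \\
f' &: \quad u' \mathrel{+}= \left(\frac{\partial f}{\partial u}\right)^{T} v'
\end{aligned}
$$

Here $u'$ receives contributions from both $g'$ and $f'$ (first pattern), whereas $w'$ is read only by $g'$ and $v'$ only by $f'$ (second pattern). Once $g'$ has run, nothing later in the adjoint sweep reads $w'$ again.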
This means that when we are evaluating the adjoint, and assuming we don't then intend to evaluate the Hessian, we can discard the adjoint values of a block's outputs as soon as that adjoint block has been evaluated. Because we do need to keep the adjoint values around if we plan to evaluate the Hessian, this would need to be controlled by an option, e.g. "keep_adjoint_variables=True".
This would happen at the end of block.evaluate_adj and would do something like:

    if not keep_adjoint_variables:
        for output in outputs:
            output.reset_variables("adjoint")

Note that this should happen even if there are no relevant dependencies, so rather than returning early if there are no relevant dependencies, we should just not call the preparation routine:

    if relevant_dependencies:
        prepared = self.prepare_evaluate_adj(inputs, adj_inputs, relevant_dependencies)
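
The proposal can be tried out on a toy tape. The following is a minimal sketch only: `Variable`, `LinearBlock`, `marked` and `demo` are hypothetical stand-ins written for this illustration, not the real tape classes, and the sequence $v = 2u$, $w = 3u + 4v$, $J = 5w$ is an assumed example. It shows the reset running at the end of `evaluate_adj` whether or not there were relevant dependencies.

```python
class Variable:
    """Toy tape variable (hypothetical stand-in, not the real class)."""

    def __init__(self, name, marked=True):
        self.name = name
        self.marked = marked      # whether this variable is a relevant dependency
        self.adj_value = None

    def add_adj_output(self, contribution):
        # Several adjoint operations may accumulate into the same variable.
        if self.adj_value is None:
            self.adj_value = contribution
        else:
            self.adj_value += contribution

    def reset_variables(self, types):
        if "adjoint" in types:
            self.adj_value = None


class LinearBlock:
    """Toy block for output = sum(c_i * input_i)."""

    def __init__(self, inputs, coeffs, output):
        self.inputs = inputs
        self.coeffs = coeffs
        self.output = output

    def evaluate_adj(self, keep_adjoint_variables=True):
        adj_input = self.output.adj_value
        relevant = [(dep, c) for dep, c in zip(self.inputs, self.coeffs)
                    if dep.marked]
        # No early return: skip the propagation work, but still fall
        # through to the reset below.
        if relevant and adj_input is not None:
            for dep, c in relevant:
                dep.add_adj_output(c * adj_input)
        # The output's adjoint value is only ever read by this block,
        # so it can be dropped now unless a Hessian evaluation needs it.
        if not keep_adjoint_variables:
            self.output.reset_variables(("adjoint",))


def demo(keep_adjoint_variables):
    u, v, w, J = (Variable(n) for n in "uvwJ")
    tape = [
        LinearBlock([u], [2.0], v),            # v = 2u
        LinearBlock([u, v], [3.0, 4.0], w),    # w = 3u + 4v
        LinearBlock([w], [5.0], J),            # J = 5w
    ]
    J.adj_value = 1.0                          # seed the adjoint sweep
    for block in reversed(tape):
        block.evaluate_adj(keep_adjoint_variables)
    return {var.name: var.adj_value for var in (u, v, w, J)}


if __name__ == "__main__":
    print(demo(keep_adjoint_variables=True))
    print(demo(keep_adjoint_variables=False))
```

With `keep_adjoint_variables=False` only the adjoint value of `u`, which is not the output of any block, survives the sweep. The gradient is the same in both modes, since each discarded value had already been consumed by the one adjoint operation that reads it.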