Add support for loss functions with auxiliary data to linesearch #1177

Draft · wants to merge 2 commits into base: main

Conversation

ro0mquy commented Jan 16, 2025

Summary

This change adds support for loss functions that return auxiliary data alongside their primary value, like (loss_value, extra_data). This pattern is commonly used with jax.value_and_grad(fn, has_aux=True).

The approach:

  1. Added value_fn_has_aux flag to zoom_linesearch and scale_by_zoom_linesearch
  2. Modified value handling to properly unpack auxiliary data when needed using a new _unpack_value helper that extracts just the loss value
  3. Updated value storage in state to keep the full value+aux tuple when needed
  4. Added has_aux parameter to value_and_grad_from_state to properly handle auxiliary data when reusing cached values

This allows the linesearch algorithms to work with loss functions that return auxiliary data while maintaining the optimization over just the primary loss value.
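
As a rough sketch of how these pieces are meant to fit together (value_fn_has_aux and has_aux are the argument names proposed in this draft and may still change; the ((value, aux), grad) return convention and the exact way the value is passed to update are assumptions here, mirroring jax.value_and_grad):

import jax.numpy as jnp
import optax

def loss(params):
  # An aux-returning objective: (loss_value, extra_data).
  value = jnp.sum(params ** 2)
  return value, {'residual_norm': jnp.sqrt(value)}

# value_fn_has_aux / has_aux are the flags proposed in this draft, not released optax.
opt = optax.lbfgs(
    linesearch=optax.scale_by_zoom_linesearch(
        max_linesearch_steps=15, value_fn_has_aux=True
    )
)
params = jnp.ones((3,))
state = opt.init(params)  # plus the aux-structure initialization discussed below

value_and_grad = optax.value_and_grad_from_state(loss, has_aux=True)
(value, aux), grad = value_and_grad(params, state=state)
updates, state = opt.update(
    grad, state, params, value=value, grad=grad, value_fn=loss
)
params = optax.apply_updates(params, updates)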

Input needed: How to initialize opt_state?

The linesearch algorithm stores value and grad in the optimizer state to enable reuse of function evaluations. When using auxiliary data, JAX compilation needs to know the structure of this data upfront.

Currently, I'm initializing it like this:

opt_state = optimizer.init(params)
# Run loss function once to get auxiliary data structure
_, aux = loss(params)
# Set value to infinity (to force recalculation) but keep aux structure
value = (jnp.asarray(jnp.inf), aux)
opt_state = optax.tree_utils.tree_set(opt_state, value=value)

This feels a bit hacky since it requires an extra function evaluation just to get the structure. Is there a better way to handle this initialization?

The challenge is that the auxiliary data structure is determined by the loss function and could be arbitrary (e.g., dictionaries, nested structures, etc.).
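
A possible variant (just a sketch, not part of this change; it assumes the loss can be traced abstractly and reuses the names from the snippet above) would be to derive the aux structure from jax.eval_shape instead of a real evaluation:

import jax
import jax.numpy as jnp
import optax

# Abstract evaluation: gives the shapes/dtypes of (value, aux) without running the loss.
_, aux_struct = jax.eval_shape(loss, params)
aux = jax.tree_util.tree_map(lambda s: jnp.zeros(s.shape, s.dtype), aux_struct)
value = (jnp.asarray(jnp.inf), aux)
opt_state = optax.tree_utils.tree_set(opt_state, value=value)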

ToDos

  • Add support to backtracking linesearch
  • Add documentation and doc strings
  • Add tests
  • Improve handling of initial opt_state


google-cla bot commented Jan 16, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up-to-date status, view the checks section at the bottom of the pull request.

rdyro (Collaborator) commented Feb 3, 2025

Hey, thanks for continuing to work on this! Can you sign the CLA please?

@theo-brown

@ro0mquy thanks for this PR, I'm keen to see this feature added!

ro0mquy (Author) commented Feb 19, 2025

I'm still figuring out the CLA thing with my company, but it's just a matter of time. I also accidentally pushed another fix (the "fix slope calculation in zoom_linesearch" commit) to the same branch; I can revert it if needed.

ro0mquy (Author) commented Mar 9, 2025

I signed the CLA and updated my email. Is there a way to retrigger the check?

vroulet (Collaborator) commented Mar 11, 2025

Hello @ro0mquy,
Any commit will retrigger the tests, so you can merge with main, for example.

Out of curiosity:

  1. Why did you need the aux value returned? (Concrete use cases would help us understand.)

  2. Why not just wrap the function that returns an aux in a function that does not? The linesearch only needs the value. At the end of the linesearch, the value and grad computed at the accepted point may be recycled by the value_and_grad_from_state function, but they don't have to be: you can always recompute the value, grad (and potentially aux) yourself. True, that would not be optimal (you would compute the value and grad twice), but it would work.
    Namely, when wrapping LBFGS into a solver (see the notebook), consider simply defining

import jax
import optax
import optax.tree_utils as otu

def run_opt(init_params, fun_with_aux, opt, max_iter, tol):
  # The linesearch only needs the loss value, so strip the aux for value_fn.
  fun_without_aux = lambda *a, **kw: fun_with_aux(*a, **kw)[0]

  def step(carry):
    params, state, _ = carry
    (value, aux), grad = jax.value_and_grad(fun_with_aux, has_aux=True)(params)
    updates, state = opt.update(
        grad, state, params, value=value, grad=grad, value_fn=fun_without_aux
    )
    params = optax.apply_updates(params, updates)
    return params, state, aux

  def continuing_criterion(carry):
    _, state, _ = carry
    iter_num = otu.tree_get(state, 'count')
    grad = otu.tree_get(state, 'grad')
    err = otu.tree_l2_norm(grad)
    return (iter_num == 0) | ((iter_num < max_iter) & (err >= tol))

  # Evaluate once to obtain the aux structure for the while_loop carry.
  (_, init_aux), _ = jax.value_and_grad(fun_with_aux, has_aux=True)(init_params)
  init_carry = (init_params, opt.init(init_params), init_aux)
  final_params, final_state, final_aux = jax.lax.while_loop(
      continuing_criterion, step, init_carry
  )
  return final_params, final_state, final_aux

But your fix seems pretty good!

  1. I would like the "has_aux" argument to rather live in the update function too, if possible (it's not clear that it is). The reason is to keep all the logic pertaining to the actual function considered out of the method's signature (a rough sketch of the idea follows after this list).
  2. One needs to be careful at initialization: the shape of the aux may not be known by the optimizer's init function, as you noticed. Your hack is quite OK though, as long as it is documented (ideally we would change the base API so that the init function could accept additional arguments, but that's a much deeper revamp).
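
Purely for illustration of point 1, this is roughly the shape of the idea (not an existing or agreed-upon optax signature, just hypothetical):

# Hypothetical: has_aux passed at the update call instead of at construction time.
updates, state = opt.update(
    grad, state, params,
    value=value, grad=grad, value_fn=fun_with_aux, has_aux=True,
)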

Thanks for looking into this!

ro0mquy (Author) commented Mar 12, 2025

Hey, thank you for looking at this.

  1. In my use case, the aux data contains the individual loss terms that are added together to form the final loss value (a minimal sketch follows after this list).
  2. Your comment "that would not be optimal" is exactly why I didn't use the wrapper: I would like to reuse the function value and gradient because evaluating my loss function is quite expensive.
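
A minimal sketch of that kind of loss (the names are made up): the aux carries the individual terms while only the total is optimized.

import jax.numpy as jnp

def loss(params):
  data_term = jnp.sum((params['w'] - 1.0) ** 2)
  reg_term = 1e-2 * jnp.sum(params['w'] ** 2)
  total = data_term + reg_term
  # The individual terms go into aux so they can be logged without re-evaluating.
  return total, {'data_term': data_term, 'reg_term': reg_term}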

I don't have strong opinions about the exact API design. Feel free to make suggestions.
