softplus, mish, logsumexp, logcumsumexp overflow in fp16 on ANE — missing stable decompositions #21

Description

@Ashutosh0x

Problem

coreai-torch relies on PyTorch's run_decompositions() to break down operations like softplus, mish, logsumexp, and logcumsumexp into primitive ops before conversion. However, PyTorch's default decompositions produce naïve forms that overflow in fp16 on the Apple Neural Engine (ANE):

| Operation | Naïve decomposition | Failure threshold | Failure mode |
|---|---|---|---|
| softplus | log(1 + exp(x)) | x ≈ 10.4 | Output → 0 |
| mish | x * tanh(log(1 + exp(x))) | x ≈ 10.4 | Output → 0 |
| logsumexp | log(sum(exp(x_i))) | x ≈ 7.63 | Output → 0 |
| logcumsumexp | log(cumsum(exp(x_i))) | x ≈ 11.09 | Output → ∞/NaN |

These operations are not in the _COMPOSITE_OPS preserve list in _decomp.py, so they get decomposed into overflow-prone primitives.

Note: log_softmax is already correctly handled with a stable max-shift implementation (replace_log_softmax in _aten_to_core.py).

Root Cause

In _decomp.py, the decomposition table preserves only six ops (hardsigmoid, hardswish, instance_norm, pixel_shuffle, scaled_dot_product_attention, silu). Because softplus is not in this list, PyTorch decomposes it to log(1 + exp(x)), where exp(x) overflows fp16 (max 65,504) for x > ~11.09. On the ANE specifically, the overflow occurs even earlier, at x ≈ 10.4, due to an internal 2^15-bounded representation.
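
The two thresholds quoted above are ln(65504) and ln(2^15). The sketch below checks that arithmetic and reproduces the overflow of the naïve form on the CPU. It is an illustration only: the helpers `fp16`, `exp_fp16`, and `naive_softplus_fp16` are hypothetical names, and half precision is emulated by rounding each intermediate through Python's `struct` "e" format. This models IEEE fp16, not the ANE, so the failure shows up as inf rather than the "Output → 0" behaviour reported for the ANE.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def fp16(value: float) -> float:
    """Round a float to the nearest half-precision value (inf on overflow)."""
    try:
        return struct.unpack("<e", struct.pack("<e", value))[0]
    except OverflowError:  # magnitude too large for half precision
        return math.copysign(math.inf, value)


def exp_fp16(x: float) -> float:
    """exp() with the result rounded to half precision."""
    try:
        return fp16(math.exp(x))
    except OverflowError:  # exp() overflowed even in double precision
        return math.inf


def naive_softplus_fp16(x: float) -> float:
    """log(1 + exp(x)) with every intermediate rounded to half precision."""
    return fp16(math.log(fp16(1.0 + exp_fp16(fp16(x)))))


print(math.log(FP16_MAX))         # threshold for IEEE fp16, about 11.09
print(math.log(2 ** 15))          # threshold for a 2^15 bound, about 10.40
print(naive_softplus_fp16(11.0))  # still finite
print(naive_softplus_fp16(12.0))  # inf, because exp(12) exceeds FP16_MAX
```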

Proposed Fix

Apply algebraically equivalent, numerically stable decompositions at the converter layer:

Softplus:
    softplus(x) = max(x, 0) + log(1 + exp(-|x|))
Since -|x| <= 0, exp(-|x|) ∈ (0, 1] — no overflow possible.
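
The identity follows from factoring out exp(x) when x ≥ 0: log(1 + exp(x)) = x + log(1 + exp(-x)). For x < 0 the original form is already safe, and max(x, 0) contributes nothing. A minimal sketch of the stable form, again emulating fp16 by rounding through `struct` (the names `fp16` and `softplus_stable_fp16` are hypothetical and not part of coreai-torch):

```python
import math
import struct


def fp16(value: float) -> float:
    """Round a float to the nearest half-precision value (inf on overflow)."""
    try:
        return struct.unpack("<e", struct.pack("<e", value))[0]
    except OverflowError:  # magnitude too large for half precision
        return math.copysign(math.inf, value)


def softplus_stable_fp16(x: float) -> float:
    """max(x, 0) + log(1 + exp(-|x|)), rounding each intermediate to fp16."""
    x = fp16(x)
    # The exponent is never positive, so exp() stays in [0, 1]:
    # it may underflow to 0 for large |x| but cannot overflow.
    e = fp16(math.exp(-abs(x)))
    log_term = fp16(math.log(fp16(1.0 + e)))
    return fp16(max(x, 0.0) + log_term)


for value in (-20.0, 0.0, 12.0, 20.0):
    print(value, softplus_stable_fp16(value))
```

For large positive x the log term rounds to 0 and the result is x itself, which matches softplus to within fp16 resolution.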

Mish:
    mish(x) = x * tanh(softplus_stable(x))
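
Since tanh is bounded in [-1, 1], the only overflow risk in mish comes from the inner softplus, so swapping in the stable softplus is enough. A self-contained sketch under the same assumptions (fp16 emulated through `struct`; `fp16`, `softplus_stable_fp16`, and `mish_stable_fp16` are hypothetical helper names):

```python
import math
import struct


def fp16(value: float) -> float:
    """Round a float to the nearest half-precision value (inf on overflow)."""
    try:
        return struct.unpack("<e", struct.pack("<e", value))[0]
    except OverflowError:  # magnitude too large for half precision
        return math.copysign(math.inf, value)


def softplus_stable_fp16(x: float) -> float:
    """max(x, 0) + log(1 + exp(-|x|)), rounding each intermediate to fp16."""
    x = fp16(x)
    e = fp16(math.exp(-abs(x)))  # exponent <= 0, so no overflow
    return fp16(max(x, 0.0) + fp16(math.log(fp16(1.0 + e))))


def mish_stable_fp16(x: float) -> float:
    """x * tanh(softplus_stable(x)), rounding each intermediate to fp16."""
    x = fp16(x)
    return fp16(x * fp16(math.tanh(softplus_stable_fp16(x))))


for value in (-20.0, 0.0, 1.0, 20.0):
    print(value, mish_stable_fp16(value))
```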

Logsumexp (max-shift):
    logsumexp(x) = max(x) + log(sum(exp(x - max(x))))
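
After the shift, every exponent is ≤ 0 and the largest element contributes exactly exp(0) = 1, so the sum lies between 1 and the number of elements. The naïve form, by contrast, can overflow in the sum even when no single exp(x_i) does: N equal terms reach a bound B at x = ln(B/N), so the threshold depends on the reduction size, which the issue does not state. A sketch comparing both forms over a non-empty list of finite values, with fp16 emulated through `struct` and all helper names hypothetical:

```python
import math
import struct


def fp16(value: float) -> float:
    """Round a float to the nearest half-precision value (inf on overflow)."""
    try:
        return struct.unpack("<e", struct.pack("<e", value))[0]
    except OverflowError:  # magnitude too large for half precision
        return math.copysign(math.inf, value)


def exp_fp16(x: float) -> float:
    """exp() with the result rounded to half precision."""
    try:
        return fp16(math.exp(x))
    except OverflowError:  # exp() overflowed even in double precision
        return math.inf


def logsumexp_naive_fp16(xs):
    """log(sum(exp(x_i))) with an fp16 accumulator."""
    total = 0.0
    for x in xs:
        total = fp16(total + exp_fp16(fp16(x)))
    return fp16(math.log(total)) if total > 0.0 else -math.inf


def logsumexp_stable_fp16(xs):
    """max(x) + log(sum(exp(x_i - max(x)))) with an fp16 accumulator."""
    m = max(fp16(x) for x in xs)
    total = 0.0
    for x in xs:
        total = fp16(total + exp_fp16(fp16(fp16(x) - m)))  # exponent <= 0
    return fp16(m + fp16(math.log(total)))  # total >= 1, so log is defined


print(logsumexp_naive_fp16([10.5, 10.5]))   # inf: the sum overflows
print(logsumexp_stable_fp16([10.5, 10.5]))  # close to 10.5 + ln(2)
```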

Prior Art

These exact fixes have been implemented and validated in apple/coremltools (the predecessor framework):

  • PR #2725 — softplus, mish
  • PR #2726 — logsumexp
  • PR #2727 — log_softmax, logcumsumexp
  • Issue #2687 — original fp16 overflow report

Validated across M3 Max and M5 silicon, 128+ test configurations, zero regressions.

Environment

  • coreai-torch: latest (cloned June 21, 2026)
  • macOS 26 / Apple Silicon
  • PyTorch 2.7+
