softplus, mish, logsumexp, logcumsumexp overflow in fp16 on ANE — missing stable decompositions

## Problem

coreai-torch relies on PyTorch's `run_decompositions()` to break operations such as softplus, mish, logsumexp, and logcumsumexp into primitive ops before conversion. However, PyTorch's default decompositions produce **naïve forms** that overflow in fp16 on the Apple Neural Engine (ANE):

| Operation | Naïve Decomposition | Failure Threshold | Failure Mode |
|-----------|-------------------|-------------------|--------------|
| `softplus` | `log(1 + exp(x))` | `x ≈ 10.4` | Output → 0 |
| `mish` | `x * tanh(log(1 + exp(x)))` | `x ≈ 10.4` | Output → 0 |
| `logsumexp` | `log(sum(exp(x_i)))` | `x ≈ 7.63` | Output → 0 |
| `logcumsumexp` | `log(cumsum(exp(x_i)))` | `x ≈ 11.09` | Output → ∞/NaN |

These operations are **not** in the `_COMPOSITE_OPS` preserve list in `_decomp.py`, so they get decomposed into overflow-prone primitives.

**Note:** `log_softmax` is already correctly handled with a stable max-shift implementation (`replace_log_softmax` in `_aten_to_core.py`).

## Root Cause

In `_decomp.py`, the decomposition table preserves only 6 ops (`hardsigmoid`, `hardswish`, `instance_norm`, `pixel_shuffle`, `scaled_dot_product_attention`, `silu`). Because `softplus` is not in this list, PyTorch decomposes it to `log(1 + exp(x))`, where `exp(x)` overflows fp16 (max 65,504) for `x > ~11.09`. On the ANE specifically, the overflow occurs even earlier, at `x ≈ 10.4`, due to an internal 2^15-bounded representation.
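
The two thresholds quoted above follow from taking the natural log of each bound. A quick check, where the constant names are illustrative and the 2^15 bound is taken from this issue's description rather than from any ANE documentation:

```python
import math

FP16_MAX = 65504.0     # largest finite IEEE half-precision value
ANE_BOUND = 2.0 ** 15  # bound assumed from the issue text, not from ANE docs

# exp(x) exceeds a bound B exactly when x > ln(B)
fp16_threshold = math.log(FP16_MAX)
ane_threshold = math.log(ANE_BOUND)

print(f"{fp16_threshold:.2f}")  # 11.09
print(f"{ane_threshold:.2f}")   # 10.40
```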

## Proposed Fix

Apply algebraically equivalent, numerically stable decompositions at the converter layer:

**Softplus:**
```python
softplus(x) = max(x, 0) + log(1 + exp(-|x|))
```
Since `-|x| <= 0`, `exp(-|x|) ∈ (0, 1]` — no overflow possible.
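
A minimal sketch of the difference, emulating IEEE fp16 rounding in pure Python with `struct`'s half-precision format. The helper names (`fp16`, `softplus_naive`, `softplus_stable`) are hypothetical, and this models plain IEEE half precision only: the ANE's internal representation is not emulated, so the naïve failure shows up here as `inf` rather than the failure mode in the table above.

```python
import math
import struct

def fp16(v: float) -> float:
    """Round to the nearest IEEE half-precision value; overflow becomes +/-inf."""
    try:
        return struct.unpack("e", struct.pack("e", v))[0]
    except OverflowError:
        return math.copysign(math.inf, v)

def softplus_naive(x: float) -> float:
    e = fp16(math.exp(x))              # exceeds 65504 once x > ~11.09
    return fp16(math.log(fp16(1.0 + e)))

def softplus_stable(x: float) -> float:
    t = fp16(math.exp(-abs(x)))        # always in [0, 1], cannot overflow
    return fp16(max(x, 0.0) + fp16(math.log(fp16(1.0 + t))))

print(softplus_naive(12.0))   # inf
print(softplus_stable(12.0))  # 12.0
```

For large positive `x` the correction term rounds to zero and the result is `x` itself, which matches the true value to fp16 precision.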

**Mish:**
```python
mish(x) = x * tanh(softplus_stable(x))
```
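
As a sketch of how mish composes with the stable softplus, the snippet below keeps every intermediate finite under emulated IEEE fp16 rounding. The function names are hypothetical, and the emulation does not model ANE-specific behaviour.

```python
import math
import struct

def fp16(v: float) -> float:
    """Round to the nearest IEEE half-precision value; overflow becomes +/-inf."""
    try:
        return struct.unpack("e", struct.pack("e", v))[0]
    except OverflowError:
        return math.copysign(math.inf, v)

def softplus_stable(x: float) -> float:
    t = fp16(math.exp(-abs(x)))  # bounded by 1, so no overflow
    return fp16(max(x, 0.0) + fp16(math.log(fp16(1.0 + t))))

def mish_stable(x: float) -> float:
    sp = softplus_stable(x)      # finite for any finite fp16 input
    return fp16(x * fp16(math.tanh(sp)))

for x in (1.0, 12.0):
    print(x, mish_stable(x))
```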

**Logsumexp (max-shift):**
```python
logsumexp(x) = max(x) + log(sum(exp(x - max(x))))
```
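
The max-shift can be sketched the same way. With a reduction, the accumulated sum can overflow even when no single `exp(x_i)` does, which the example input below is chosen to show. The helper names are hypothetical, accumulation is done sequentially in emulated IEEE fp16, and real hardware may accumulate differently.

```python
import math
import struct

def fp16(v: float) -> float:
    """Round to the nearest IEEE half-precision value; overflow becomes +/-inf."""
    try:
        return struct.unpack("e", struct.pack("e", v))[0]
    except OverflowError:
        return math.copysign(math.inf, v)

def logsumexp_naive(xs):
    total = 0.0
    for x in xs:
        total = fp16(total + fp16(math.exp(x)))  # running sum may exceed 65504
    return fp16(math.log(total))

def logsumexp_stable(xs):
    m = max(xs)
    total = 0.0
    for x in xs:
        # x - m <= 0, so each term is at most 1 and the sum is at most len(xs)
        total = fp16(total + fp16(math.exp(fp16(x - m))))
    return fp16(m + fp16(math.log(total)))

xs = [9.0] * 16  # exp(9) ~ 8103 fits in fp16, but 16 of them do not
print(logsumexp_naive(xs))   # inf
print(logsumexp_stable(xs))  # close to 9 + ln(16) ~ 11.77
```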

## Prior Art

These exact fixes have been implemented and validated in `apple/coremltools` (the predecessor framework):
- PR [#2725](https://github.com/apple/coremltools/pull/2725) — softplus, mish
- PR [#2726](https://github.com/apple/coremltools/pull/2726) — logsumexp
- PR [#2727](https://github.com/apple/coremltools/pull/2727) — log_softmax, logcumsumexp
- Issue [#2687](https://github.com/apple/coremltools/issues/2687) — original fp16 overflow report

Validated across M3 Max and M5 silicon, 128+ test configurations, zero regressions.

## Environment

- coreai-torch: latest (cloned June 21, 2026)
- macOS 26 / Apple Silicon
- PyTorch 2.7+
