XLA does this rewrite (log(sqrt(x)) -> 0.5 * log(x)), and it seems faster for reasonably sized inputs:
import numpy as np
x = np.abs(np.random.normal(size=10_000))
np.testing.assert_allclose(np.log(np.sqrt(x)), np.multiply(0.5, np.log(x)))
%timeit np.log(np.sqrt(x))
%timeit np.multiply(0.5, np.log(x))
# 16.7 μs ± 2.24 μs per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
# 8.36 μs ± 8.94 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
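For context, a minimal sketch of what the rewrite could look like on the PyTensor side, assuming the node_rewriter API from pytensor.graph.rewriting.basic (local_log_sqrt is just an illustrative name, and registration into the rewrite databases is omitted):

import pytensor.tensor as pt
from pytensor.graph.rewriting.basic import node_rewriter

@node_rewriter([pt.log])
def local_log_sqrt(fgraph, node):
    # Rewrite log(sqrt(x)) -> 0.5 * log(x)
    [arg] = node.inputs
    if arg.owner is not None and arg.owner.op == pt.sqrt:
        [x] = arg.owner.inputs
        return [0.5 * pt.log(x)]
    return None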
This can be extended to a general power, although in cases where x may initially be negative we need an abs, and a naive numpy benchmark doesn't suggest a benefit:
x = np.abs(np.random.normal(size=10_000))
np.testing.assert_allclose(np.log(np.square(x)), np.multiply(2, np.log(np.abs(x))))
%timeit np.log(np.square(x))
%timeit np.multiply(2, np.log(np.abs(x)))
# 8.15 μs ± 19 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
# 9.77 μs ± 12.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
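To make the role of the abs explicit: for an even power the identity still holds with negative inputs, while the version without the abs produces nans (x_neg is just an illustrative strictly negative sample):

x_neg = -np.abs(np.random.normal(size=10_000))
np.testing.assert_allclose(np.log(np.square(x_neg)), np.multiply(2, np.log(np.abs(x_neg))))
with np.errstate(invalid="ignore"):
    assert np.isnan(np.multiply(2, np.log(x_neg))).all()  # dropping the abs gives nan for negative x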
The story may be different with a fused kernel (as PyTensor would generate), and/or without allocating the intermediate arrays, which can be done by passing out=... to the numpy functions.
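For example, a rough sketch of such an allocation-free comparison, reusing a single preallocated buffer (log_sqrt_inplace / half_log_inplace are just illustrative helpers):

out = np.empty_like(x)

def log_sqrt_inplace(x, out):
    # log(sqrt(x)) computed entirely in the preallocated buffer
    np.sqrt(x, out=out)
    return np.log(out, out=out)

def half_log_inplace(x, out):
    # 0.5 * log(x) computed entirely in the preallocated buffer
    np.log(x, out=out)
    return np.multiply(out, 0.5, out=out)

%timeit log_sqrt_inplace(x, out)
%timeit half_log_inplace(x, out)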