Numba transpose can be very slow due to `ascontiguousarray` in dispatch #1111

### Description

```python
import numpy as np
import pytensor
import pytensor.tensor as pt

x = pt.matrix("x")
y = x.T

# Avoid deepcopy
x_inp = pytensor.In(x, borrow=True)
y_out = pytensor.Out(y, borrow=True)
c_fn = pytensor.function([x_inp], y_out, mode="FAST_RUN")
numba_fn = pytensor.function([x_inp], y_out, mode="NUMBA")

x_test = np.random.normal(size=(9000, 9000)).T
np.testing.assert_allclose(c_fn(x_test), numba_fn(x_test))
%timeit c_fn(x_test)
%timeit numba_fn(x_test)
# 7.26 μs ± 39.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
# 12.8 μs ± 38.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

x_test = x_test[::2, ::2]
np.testing.assert_allclose(c_fn(x_test), numba_fn(x_test))
%timeit c_fn(x_test)
%timeit numba_fn(x_test)
# 7.28 μs ± 190 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
# 108 ms ± 4.87 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

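This happens because we (sometimes) introduce a copy here: https://github.com/pymc-devs/pytensor/blob/19dafe42e2b34037b0f906d61116fa0d1de5025c/pytensor/link/numba/dispatch/elemwise.py#L691-L692. `np.ascontiguousarray` copies whenever its input is not already C-contiguous, which is the case for the strided input in the second benchmark.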
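
To illustrate when that copy kicks in, here is a small NumPy-only sketch (no PyTensor or Numba involved). The array names and sizes are made up for the example; it only checks whether `np.ascontiguousarray` returns a view of its input or a fresh buffer.

```python
import numpy as np

base = np.random.normal(size=(400, 400))

f_order = base.T                # F-contiguous view of `base`
transposed = f_order.T          # transposing it gives a C-contiguous array again
strided = f_order[::2, ::2].T   # strided view: neither C- nor F-contiguous

same = np.ascontiguousarray(transposed)
copied = np.ascontiguousarray(strided)

# Already C-contiguous: no new buffer is allocated.
print(np.shares_memory(same, transposed))
# Not C-contiguous: the data is copied into a new buffer.
print(np.shares_memory(copied, strided))
```

The copy scales with the number of elements, which is consistent with the jump from microseconds to milliseconds in the second timing above.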

We could add a specialized dispatch for pure-transpose DimShuffle (one that neither augments nor drops dims). Or we could avoid `Reshape` altogether, similar to #847, although Numba doesn't yet have an overload for `np.squeeze`, and its `np.expand_dims` only accepts one axis at a time. Looking at the implementation, it doesn't seem like it would be too hard to extend?
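
As a rough sketch of the reshape-free idea, the following plain NumPy function composes a transpose with one `np.expand_dims` call per new axis, so the result stays a view. `dimshuffle_view` and its `new_order` convention (integers for input axes, `"x"` for new length-1 axes) are hypothetical names for this illustration, not PyTensor's dispatch code. It does not handle dropped dims, and whether it compiles as-is under Numba has not been checked here.

```python
import numpy as np


def dimshuffle_view(x, new_order):
    """Hypothetical helper: transpose and add length-1 axes without reshaping."""
    # Input axes in their output order; "x" marks a new broadcastable axis.
    kept = [axis for axis in new_order if axis != "x"]
    if sorted(kept) != list(range(x.ndim)):
        raise ValueError("every input axis must appear exactly once")

    out = np.transpose(x, kept)  # a view, never a copy
    # Insert new axes one at a time, in increasing output position.
    for position, axis in enumerate(new_order):
        if axis == "x":
            out = np.expand_dims(out, position)
    return out


x = np.arange(60.0).reshape(6, 10)[::2, ::2]  # non-contiguous (3, 5) input
y = dimshuffle_view(x, (1, "x", 0))
print(y.shape, np.shares_memory(x, y))
```

Because both `np.transpose` and `np.expand_dims` only adjust shape and strides, the strided input is never copied in this version.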



The linked lines:

```python
# FIXME: Numba's `array.reshape` only accepts C arrays.
res_reshape = np.reshape(np.ascontiguousarray(x), new_shape)
```
