
Numba transpose can be very slow due to ascontiguousarray in dispatch #1111

Open
@ricardoV94

Description


```python
import numpy as np
import pytensor
import pytensor.tensor as pt

x = pt.matrix("x")
y = x.T

# Borrow input and output to avoid deepcopy overhead in the timings
x_inp = pytensor.In(x, borrow=True)
y_out = pytensor.Out(y, borrow=True)
c_fn = pytensor.function([x_inp], y_out, mode="FAST_RUN")
numba_fn = pytensor.function([x_inp], y_out, mode="NUMBA")

# F-contiguous input, so x_test.T is C-contiguous
x_test = np.random.normal(size=(9000, 9000)).T
np.testing.assert_allclose(c_fn(x_test), numba_fn(x_test))
%timeit c_fn(x_test)  # IPython/Jupyter magic
%timeit numba_fn(x_test)
# c_fn:     7.26 μs ± 39.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
# numba_fn: 12.8 μs ± 38.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

# Strided view, so x_test.T is no longer contiguous
x_test = x_test[::2, ::2]
np.testing.assert_allclose(c_fn(x_test), numba_fn(x_test))
%timeit c_fn(x_test)
%timeit numba_fn(x_test)
# c_fn:     7.28 μs ± 190 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
# numba_fn: 108 ms ± 4.87 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

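This happens because the Numba dispatch for DimShuffle introduces a copy whenever the transposed array is not already C-contiguous, as with the strided input above: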

```python
# FIXME: Numba's `array.reshape` only accepts C arrays.
res_reshape = np.reshape(np.ascontiguousarray(x), new_shape)
```
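The same effect can be reproduced in plain NumPy, outside Numba. This minimal sketch mimics the transpose-then-`ascontiguousarray` pattern with a hypothetical helper, `transpose_then_contiguous`, and uses `np.shares_memory` to check whether a copy was made. It assumes Numba's `np.ascontiguousarray` likewise skips the copy for arrays that are already C-contiguous, which the fast first benchmark suggests but which is not verified here.

```python
import numpy as np


def transpose_then_contiguous(x):
    """Hypothetical helper mimicking the dispatch: transpose, then force C order."""
    res = np.transpose(x)
    contig = np.ascontiguousarray(res)
    # shares_memory is True only if ascontiguousarray reused x's buffer (no copy)
    return res.flags.c_contiguous, np.shares_memory(contig, x)


# F-contiguous input, like x_test above (smaller, to keep it quick)
x_f = np.random.normal(size=(900, 900)).T
# Strided view, like x_test[::2, ::2]
x_strided = x_f[::2, ::2]

print(transpose_then_contiguous(x_f))        # (True, True): no copy
print(transpose_then_contiguous(x_strided))  # (False, False): full copy
```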

We could add a specialized dispatch for a pure-transpose DimShuffle (one that neither augments nor drops dimensions). Alternatively, we could avoid Reshape altogether, similar to #847. However, Numba doesn't yet have an overload for `np.squeeze`, and its `np.expand_dims` only accepts one axis at a time. Looking at the implementation, it doesn't seem like it would be too hard to extend?
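As a rough illustration of both options, here is a plain-NumPy sketch of a reshape-free DimShuffle. `dimshuffle_no_reshape` is a hypothetical name; `new_order` follows DimShuffle's convention (input axes in their new order, `"x"` for a new broadcastable axis, unlisted length-1 axes dropped). It relies on `np.squeeze` and tuple-axis `np.expand_dims` (NumPy >= 1.18), i.e. the calls Numba would need to support. It is not Numba code and has not been tried under `@njit`.

```python
import numpy as np


def dimshuffle_no_reshape(x, new_order):
    """Hypothetical reshape-free DimShuffle (plain NumPy, not Numba).

    new_order lists input axes in their new order, with "x" marking a new
    length-1 axis; input axes not listed are dropped and must have length 1.
    """
    kept = [d for d in new_order if d != "x"]
    dropped = [d for d in range(x.ndim) if d not in kept]
    for d in dropped:
        if x.shape[d] != 1:
            raise ValueError(f"cannot drop axis {d} with length {x.shape[d]}")

    # Option 1: a pure transpose is just a strided view, no reshape needed
    if not dropped and len(kept) == len(new_order):
        return np.transpose(x, kept)

    # Option 2: transpose with the dropped axes first, then squeeze them away
    res = np.transpose(x, dropped + kept)
    if dropped:
        res = np.squeeze(res, axis=tuple(range(len(dropped))))
    # Insert all new length-1 axes in one call (tuple axis needs NumPy >= 1.18)
    new_axes = tuple(i for i, d in enumerate(new_order) if d == "x")
    if new_axes:
        res = np.expand_dims(res, axis=new_axes)
    return res


# Strided input, like x_test[::2, ::2]: the transpose stays a view of x
x = np.random.normal(size=(9, 9))[::2, ::2]
y = dimshuffle_no_reshape(x, [1, 0])
print(np.shares_memory(y, x), np.array_equal(y, x.T))  # True True

# Drop axis 1 (length 1) and add a new broadcastable axis
z = dimshuffle_no_reshape(np.ones((3, 1, 4)), [2, "x", 0])
print(z.shape)  # (4, 1, 3)
```

NumPy's own `reshape` can often return a view for non-contiguous inputs, so the copy in this issue is specific to Numba's C-only `array.reshape`; the sketch mainly shows which array calls a reshape-free Numba implementation would rely on.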
