I am not sure this is the right place to ask. If it isn't, please point me to the right one.
I have a Jetson Xavier NX running Yocto Linux built from the oe4t-patches-l4t-r32.5 branch of linux-tegra-4.9, with the RT patches applied.
The SPI master runs on the NX and has to read several 2-byte registers one by one. To reduce the total transmission time, I want to batch the reads into a single ioctl(fd, SPI_IOC_MESSAGE(len), buf) call.
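A minimal sketch of batching the register reads into one SPI_IOC_MESSAGE(n) call through spidev. It assumes each register read is a single 2-byte full-duplex transfer and that the slave needs CS deasserted between registers, hence cs_change = 1 on every transfer except the last. The names fill_reg_reads, read_regs and MAX_REGS are hypothetical, and the command bytes placed in tx depend on the device and are not shown.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/spi/spidev.h>

#define REG_XFER_LEN 2 /* each register read is one 2-byte transfer */
#define MAX_REGS 8     /* hypothetical upper bound for this sketch */

/* Fill one spi_ioc_transfer per register; tx[i]/rx[i] are 2-byte buffers.
 * cs_change = 1 asks spidev to deselect the device before the next
 * transfer (assumption: the slave needs a CS edge per register). */
void fill_reg_reads(struct spi_ioc_transfer *xfers, unsigned n,
                    uint8_t tx[][REG_XFER_LEN], uint8_t rx[][REG_XFER_LEN],
                    uint32_t speed_hz)
{
    memset(xfers, 0, n * sizeof(*xfers)); /* zero unused and padding fields */
    for (unsigned i = 0; i < n; i++) {
        xfers[i].tx_buf = (uintptr_t)tx[i];
        xfers[i].rx_buf = (uintptr_t)rx[i];
        xfers[i].len = REG_XFER_LEN;
        xfers[i].speed_hz = speed_hz;
        xfers[i].bits_per_word = 8;
        xfers[i].cs_change = (i + 1 < n) ? 1 : 0; /* keep last transfer normal */
    }
}

/* Send all n transfers as one SPI message with a single ioctl.
 * Returns the ioctl result, or -1 if n is out of range. */
int read_regs(int fd, unsigned n, uint8_t tx[][REG_XFER_LEN],
              uint8_t rx[][REG_XFER_LEN], uint32_t speed_hz)
{
    struct spi_ioc_transfer xfers[MAX_REGS];

    if (n == 0 || n > MAX_REGS)
        return -1;
    fill_reg_reads(xfers, n, tx, rx, speed_hz);
    return ioctl(fd, SPI_IOC_MESSAGE(n), xfers);
}
```

With this layout the system-call and message-setup cost is paid once per batch rather than once per register, although the controller driver may still add latency per transfer inside the message.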
My problem is that there is a large delay after each transfer: about 100 µs for every 2 bytes of data.

In the oscilloscope capture, yellow is SCK and cyan is CS. The first two transmissions each use SPI_IOC_MESSAGE(1) in the ioctl call, and the 3rd to 5th transmissions come from a single SPI_IOC_MESSAGE(3) call in which each transfer is 2 bytes. The gap between transfers is far too long.
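To put the 100 µs in perspective, the time SCK actually spends shifting one transfer is bytes × 8 / f_SCK. The SCK frequency is not stated here, so the 10 MHz below is a hypothetical value: at that rate a 2-byte transfer needs 1.6 µs on the wire, which would leave about 98.4 µs of each 100 µs slot as driver and scheduling overhead. The helper names are hypothetical.

```c
#include <stdint.h>

/* Nanoseconds SCK needs to shift `bytes` bytes at `sck_hz`
 * (8 bits per byte, ignoring any inter-word gaps). */
uint64_t spi_wire_time_ns(uint32_t bytes, uint32_t sck_hz)
{
    return (uint64_t)bytes * 8u * 1000000000u / sck_hz;
}

/* Overhead of one transfer: observed slot time minus pure wire time. */
int64_t spi_overhead_ns(uint64_t observed_ns, uint32_t bytes, uint32_t sck_hz)
{
    return (int64_t)observed_ns - (int64_t)spi_wire_time_ns(bytes, sck_hz);
}
```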
I found that this part of spi-tegra114.c is what takes a long time:
```c
if (tspi->polling_mode)
	timeleft = tegra_spi_status_poll(tspi);
else
	timeleft = wait_for_completion_timeout(
			&tspi->xfer_completion,
			SPI_DMA_TIMEOUT);
```
I tried both modes (polling and interrupt/completion) and the result is the same. Why does this step take more than 50 µs?
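As a complement to the in-kernel ktime_get_ns() measurement, the latency of each whole ioctl can be timed from user space with clock_gettime(CLOCK_MONOTONIC), for example to compare one SPI_IOC_MESSAGE(3) call against three SPI_IOC_MESSAGE(1) calls. This is a minimal sketch: timespec_diff_ns and time_call_ns are hypothetical helpers, and fn stands in for whatever wrapper issues the ioctl.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Difference b - a in nanoseconds (handles nanosecond borrow). */
int64_t timespec_diff_ns(const struct timespec *a, const struct timespec *b)
{
    return (int64_t)(b->tv_sec - a->tv_sec) * 1000000000LL
           + (int64_t)(b->tv_nsec - a->tv_nsec);
}

/* Time one call of fn(arg) on the monotonic clock; stores fn's return
 * value in *ret and returns the elapsed time in nanoseconds. */
int64_t time_call_ns(int (*fn)(void *), void *arg, int *ret)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    *ret = fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return timespec_diff_ns(&t0, &t1);
}
```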
I measured the time with ktime_get_ns(). Here is a sample:

```
[81056.799779] spi-tegra114 3210000.spi: 1686 41856
[81056.799783] spi-tegra114 3210000.spi: 1687 46176
[81056.799798] spi-tegra114 3210000.spi: 1689 60576
[81056.799853] spi-tegra114 3210000.spi: 1691 115648
```
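Reading the last column as nanoseconds elapsed since a common ktime_get_ns() reference point (an assumption; the meaning of the preceding column is not stated), the cost of each step is the difference between consecutive samples. For the four samples above this gives 4320 ns, 14400 ns and 55072 ns, so the last step is the roughly 55 µs in question. The helper name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Given cumulative timestamps t[0..n-1] in ns, write the step
 * t[i+1] - t[i] into d[0..n-2]. */
void step_deltas(const uint64_t *t, size_t n, uint64_t *d)
{
    for (size_t i = 0; i + 1 < n; i++)
        d[i] = t[i + 1] - t[i];
}
```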