mmc: sdhci: fix max req size based on spec #326

Merged (1 commit) on Feb 5, 2025

@nyanmisaka (Collaborator) commented Feb 5, 2025

For almost two decades, the maximum request size has been limited to 512KiB because of SDMA's 512KiB boundary limit.

ADMA2 and ADMA3 have no such limit; they were designed so that transfers of any block count would not burden the host with extra interrupts and management overhead.

Capping requests at 512KiB effectively downgrades these DMA modes to SDMA.

Fix that by actually following the spec:
When ADMA is selected, the tuning mode is consulted.
On lesser tuning modes, 4MiB is selected as the maximum transfer size, so that re-tuning, whether triggered by the timer or requested through a host interrupt, can be done in time. Otherwise, the only limit is the size of the variable types used. This implementation uses 16MiB as the maximum, since tests showed diminishing returns beyond that point.
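
The selection described above can be sketched roughly as follows. This is a standalone illustration, not the code from the patch: `enum dma_mode`, `struct host_caps`, `needs_periodic_retune` and `pick_max_req_size()` are hypothetical names made up for this example, and the real driver keeps this state in its own structures.

```c
#include <stdbool.h>
#include <stddef.h>

#define KIB ((size_t)1024)
#define MIB (1024 * KIB)

/* Hypothetical stand-ins for driver state; not the kernel's own types. */
enum dma_mode { DMA_SDMA, DMA_ADMA2, DMA_ADMA3 };

struct host_caps {
    enum dma_mode dma;
    /* Assumption: true for the "lesser" tuning modes, where a transfer
     * must stay short enough for re-tuning to happen in time. */
    bool needs_periodic_retune;
};

/* Return the largest request size, in bytes, for the given host. */
static size_t pick_max_req_size(const struct host_caps *caps)
{
    if (caps->dma == DMA_SDMA)
        return 512 * KIB;   /* SDMA boundary limit */
    if (caps->needs_periodic_retune)
        return 4 * MIB;     /* leave room for timely re-tuning */
    return 16 * MIB;        /* diminishing returns beyond this point */
}
```

With these assumptions, an ADMA2 host on a lesser tuning mode gets 4MiB, while an ADMA host without that constraint gets the full 16MiB.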

Even in the worst case, where the card is eMMC and its maximum speed is a generous 350MiB/s, a 16MiB limit generates an interrupt roughly every 45ms on huge data transfers.
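
The 45ms figure is the request size divided by the throughput: 16MiB / 350MiB/s ≈ 0.0457s. A minimal sketch of that arithmetic is below; `transfer_time_ms()` and `MIB_BYTES` are hypothetical names for illustration only, and the result is rounded down to whole milliseconds.

```c
#include <stdint.h>

#define MIB_BYTES ((uint64_t)1024 * 1024)

/* Milliseconds needed to move req_bytes at rate_bytes_per_s, rounded down.
 * One completion interrupt per request means this is also the approximate
 * interval between interrupts during a long sequential transfer. */
static uint64_t transfer_time_ms(uint64_t req_bytes, uint64_t rate_bytes_per_s)
{
    return req_bytes * 1000u / rate_bytes_per_s;
}
```

At the same 350MiB/s, a 512KiB request completes in about 1.4ms, so the old limit would produce roughly 32 times as many interrupts for the same amount of data.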

For example, in local tests with rigorous CPU/GPU burn-in and abrupt cut-offs to force large temperature swings (upwards and downwards) on the card, the tested host was fine at up to 128MB/s on slow cards using SDR104 bus timing without re-tuning.
In that case the 4MiB limit was overridden with a more-than-safe 8MiB value.

Across all test cases and boards, the change brought the following:

Depending on bus timing and eMMC/SD specs:
* Max read throughput increased by 2-20%
* Max write throughput increased by 50-200%

Depending on CPU frequency and transfer sizes:
* mmcqd CPU core usage reduced by 4-50%

Based on CTCaer/switch-l4t-kernel-4.9@fa86ebb

This has been shown to improve SD card read and write performance on the Nintendo Switch, Raspberry Pi 5B and StarFive VisionFive 2.

It also works on the RK3588 board, where I have been testing it for a year and it has worked well so far. Joshua-Riek's tree also contains it. The original author, CTCaer, is no longer active, so I thought it would be a good idea to maintain the patch ourselves.

Signed-off-by: CTCaer <[email protected]>
@amazingfate amazingfate merged commit faeb1e7 into armbian:rk-6.1-rkr5 Feb 5, 2025
1 check passed
@nyanmisaka nyanmisaka deleted the improve-mmc-speed branch February 5, 2025 13:05