@lzsaver lzsaver commented Nov 17, 2025

Motivation and Context

#17563 speeds up system boot by running only the 256K benchmark during boot and leaving the others to run on demand. However, for some reason, the 256K benchmark does not always seem to run during boot. This patch forces the 256K benchmark to be run on demand as well, which should resolve #17945.

Description

One of two things must be true: either the 256K benchmark is not being run when the system boots, or the results of that benchmark are not being taken into account. If you force the 256K benchmark to run on demand, the data is displayed correctly.

It may not quite fit the concept, but it seems to solve the problem. Probably @mcmilk has some more thoughts on the topic.

However, there is an objection to the current concept. If the 256K result obtained during boot is reused when the table is produced on demand, the values become inconsistent once the governor has changed in between.

# rmmod zfs spl; cpupower frequency-set -g performance; modprobe zfs && cat /proc/spl/kstat/zfs/chksum_bench
implementation               1k      4k     16k     64k    256k      1m      4m     16m
edonr-generic              1271    1546    1620    1574    1572    1607    1404    1553
skein-generic               537     597     612     607     564     611     595     602
sha256-generic              162     177     179     177     180     180     178     179
sha256-x64                  267     300     306     303     307     306     304     305
sha256-ssse3                326     368     378     383     383     382     381     378
sha512-generic              267     329     339     342     343     339     336     340
sha512-x64                  400     458     474     459     481     480     477     477
blake3-generic              347     379     382     379     377     366     315     357
blake3-sse2                 458    1298    1397    1414    1414    1374    1131    1231
blake3-sse41                459    1463    1590    1631    1612    1598    1544    1602
# rmmod zfs spl; cpupower frequency-set -g powersave; modprobe zfs && cat /proc/spl/kstat/zfs/chksum_bench
implementation               1k      4k     16k     64k    256k      1m      4m     16m
edonr-generic               274     339     363     315     360     354     356     349
skein-generic               119     122     130     115     104     134     135     133
sha256-generic               34      39      36      39      36      37      39      39
sha256-x64                   60      60      66      67      64      64      67      66
sha256-ssse3                 59      80      83      84      83      83      84      79
sha512-generic               59      63      70      73      71      73      55      57
sha512-x64                   57     100      99      99     104      90     102     103
blake3-generic               76      82      82      79      75      76      73      82
blake3-sse2                  93     280     308     311     310     306     308     308
blake3-sse41                106     321     360     372     373     365     366     362

Probably we do not want the 256K benchmark values from the upper table to end up in the lower one.

We could test this even more aggressively by pinning the CPU frequency to its maximum and then to its minimum.

# MAX=$(cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq)
# MIN=$(cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq)
# rmmod zfs spl; cpupower frequency-set -d "${MAX}" -u "${MAX}"; modprobe zfs && cat /proc/spl/kstat/zfs/chksum_bench
# rmmod zfs spl; cpupower frequency-set -d "${MIN}" -u "${MIN}"; modprobe zfs && cat /proc/spl/kstat/zfs/chksum_bench

However, this is only an intuitive line of reasoning. It has not been tested yet because the issue itself is probably more important.

How Has This Been Tested?

The patch was tested on top of three current branches. If the 256K benchmark results were all zeros, they display correctly after applying the patch. However, it should be noted that the scenario where the 256K benchmark still runs during system boot and then runs again on demand was not tested. It could be that this PR does not solve the root cause of the problem. In any case, there is hope that this can be fixed without a major rewrite of the code.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Quality assurance (non-breaking change which makes the code more robust against bugs)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

Contributor

@mcmilk mcmilk left a comment

This will fix the cosmetic issue.
But we need to dig a bit deeper into why the bs256k variable is zero ;-)

Member

@amotin amotin left a comment

Generally I find it OK. I don't think saving time here makes much sense if we may lose consistency. But before we merge this, I think it would be good to understand the originally reported problem, so that we do not hide it deeper and end up with a sub-optimal implementation selection.

@amotin amotin added the Status: Code Review Needed Ready for review and testing label Nov 18, 2025
Contributor

@mcmilk mcmilk left a comment

I think we may need to rework chksum_init() and maybe other functions.
I will try to fix this as well.

Contributor

adamdmoss commented Nov 20, 2025

I think zfs_chksum.c has gotten out of control, having spent too long even grokking it (including the new state machine, yay) let alone trying to find the underlying bug. 😁

So, I think the underlying bug is that the optimization in chksum_benchit() to skip the 256K test assumes that the caller is passing in the same chksum_stat_t* as before, with cs->bs256k already populated. But we can see from chksum_benchmark() that chksum_stat_data is reallocated and re-zeroed on every chksum_benchmark() invocation, i.e. both at boot and (the first time) on demand. Whew.

I think the real fix is to wrap this code:

        /* count implementations */
        chksum_stat_cnt = 1;  /* edonr */
        chksum_stat_cnt += 1; /* skein */
        chksum_stat_cnt += sha256->getcnt();
        chksum_stat_cnt += sha512->getcnt();
        chksum_stat_cnt += blake3->getcnt();
        chksum_stat_data = kmem_zalloc(
            sizeof (chksum_stat_t) * chksum_stat_cnt, KM_SLEEP);

... in if (chksum_stat_limit == AT_STARTUP) so the stat data is only allocated and cleared exactly once on first-run.

... but there's a lot of complication and smell here for something that optimizes a function that will normally only get called once ever. IMVHO. (If it were up to me I'd consider reverting the optimization.)
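
To make the suspected failure mode concrete, here is a minimal user-space sketch. It is not the OpenZFS code: every model_* name, the two-field struct, the fake measurement and the exact state transitions are assumptions made for illustration, and calloc() stands in for kmem_zalloc(). It only models the idea that the startup run fills bs256k, the on-demand run skips it, and a fresh zeroed allocation in between loses the value.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model only; names echo the discussion, not the real file. */
enum { MODEL_AT_STARTUP, MODEL_AT_BENCHMARK, MODEL_AT_DONE };

#define	MODEL_IMPL_CNT	3	/* made-up number of implementations */

typedef struct {
	uint64_t bs1k;		/* stands in for every size except 256K */
	uint64_t bs256k;	/* the only size measured at startup */
} model_stat_t;

static model_stat_t *model_data;
static int model_limit = MODEL_AT_STARTUP;

/* Deterministic fake "throughput" so the behaviour is easy to check. */
static uint64_t
model_measure(int impl, int size_kb)
{
	return ((uint64_t)(impl + 1) * 100 + (uint64_t)size_kb);
}

/* Startup fills only bs256k; the later run skips it as "already done". */
static void
model_benchit(model_stat_t *cs, int impl)
{
	if (model_limit == MODEL_AT_STARTUP)
		cs->bs256k = model_measure(impl, 256);
	else
		cs->bs1k = model_measure(impl, 1);
}

/*
 * preserve == 0: allocate and zero on every call (the suspected bug).
 * preserve != 0: allocate and zero only on the startup run (proposed fix).
 */
static void
model_benchmark(int preserve)
{
	if (model_limit == MODEL_AT_DONE)
		return;
	if (!preserve || model_limit == MODEL_AT_STARTUP) {
		free(model_data);
		model_data = calloc(MODEL_IMPL_CNT, sizeof (model_stat_t));
		assert(model_data != NULL);
	}
	for (int i = 0; i < MODEL_IMPL_CNT; i++)
		model_benchit(&model_data[i], i);
	model_limit = (model_limit == MODEL_AT_STARTUP) ?
	    MODEL_AT_BENCHMARK : MODEL_AT_DONE;
}

/* Return the model to its pre-boot state. */
static void
model_reset(void)
{
	free(model_data);
	model_data = NULL;
	model_limit = MODEL_AT_STARTUP;
}
```

In this model, calling model_benchmark(0) twice (boot, then on demand) leaves bs256k at zero, while calling model_benchmark(1) twice keeps the startup value next to the on-demand results.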

Author

lzsaver commented Nov 21, 2025

Let us first get the code into a working state, worthy of the 2.4 release. No surprises. No experiments.

@mcmilk, after that, you can redo it according to some new concept.

@adamdmoss, thanks. I will take a look.

@lzsaver lzsaver marked this pull request as draft November 21, 2025 21:00
@github-actions github-actions bot added Status: Work in Progress Not yet ready for general review and removed Status: Code Review Needed Ready for review and testing labels Nov 21, 2025
@lzsaver lzsaver force-pushed the patch-2 branch 3 times, most recently from ca2011b to 950eb73 Compare November 22, 2025 01:50
Author

lzsaver commented Nov 22, 2025

Well, now it works.

# MIN=$(cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_min_freq)
# cpupower frequency-set -d "${MIN}" -u "${MIN}"
# sleep 1m
# cat /proc/spl/kstat/zfs/chksum_bench

implementation               1k      4k     16k     64k    256k      1m      4m     16m
edonr-generic               157     245     255     359    1634     339     356     357
skein-generic               118     131     132     135     618     132     136     131
sha256-generic               36      39      35      38     180      39      37      37
sha256-x64                   35      36      40      62     307      51      52      58
sha256-ssse3                 69      79      81      78     384      74      79      76
sha512-generic               47      60      47      50     342      55      63      70
sha512-x64                   59      71      74      82     479      86      90     100
blake3-generic               57      58      66      82     380      77      82      81
blake3-sse2                  97     288     301     313    1417     307     305     307
blake3-sse41                107     329     360     371    1679     367     363     365

We may leave it as it is for now. That is, without taking the governor into account.
After the PR is accepted, we need to finalize the concept and simplify all this code.
At the moment, it looks like we even need to add locks here. Let us try to avoid this.


#define AT_STARTUP 0
#define AT_BENCHMARK 1
#define AT_DONE 2
Contributor

We can remove the AT_DONE state, so that each read of the benchmark file creates new statistics.

Author

@lzsaver lzsaver Nov 25, 2025

Right, but this is a noticeable change for the user. Let us not touch it for now. Thank you for the teamwork.

Without AT_DONE:

# time cat /proc/spl/kstat/zfs/chksum_bench > /dev/null
real 0m14.143s
user 0m0.000s
sys  0m13.771s

Contributor

Yes, but then you can test with different CPU governors.
I think this change isn't too complex.

Author

@lzsaver lzsaver Nov 26, 2025

Okay. Let us try to do this in this PR.

Update:
We need some kind of protection against regular requests. Perhaps it would be better to leave it for later?

@lzsaver lzsaver changed the title from "chksum: run 256K benchmark on demand" to "zfs_chksum: preserve chksum_stat_data between runs" Nov 25, 2025
@lzsaver lzsaver marked this pull request as ready for review November 25, 2025 16:10
@github-actions github-actions bot added Status: Code Review Needed Ready for review and testing and removed Status: Work in Progress Not yet ready for general review labels Nov 25, 2025
Author

lzsaver commented Nov 25, 2025

> This patch forces the 256K benchmark to be run on demand as well, which should resolve #17945.
> It could be that this PR does not solve the root cause of the problem.

Now, instead of this, it preserves chksum_stat_data between runs, so the early benchmark results are not lost.

But the results look weird, even if we just restart the module, because there is a delay between computations.

# rmmod zfs spl && modprobe zfs && cat /proc/spl/kstat/zfs/chksum_bench

implementation               1k      4k     16k     64k    256k      1m      4m     16m
edonr-generic              1180    1545    1623    1630     860    1604    1579    1575
skein-generic               540     593     613     589     366     611     608     570
sha256-generic              159     177     179     178     112     179     179     153
sha256-x64                  274     300     304     305     287     304     304     143
sha256-ssse3                143     173     165     166     383     167     167     170
sha512-generic              121     137     140     145     341     144     144     143
sha512-x64                  196     235     185     230     478     235     198     186
blake3-generic              164     153     154     159     353     153     155     156
blake3-sse2                 285     704     716     659    1406     668     631     626
blake3-sse41                260     636     715     739    1661     707     658     711

@lzsaver lzsaver force-pushed the patch-2 branch 3 times, most recently from 8382088 to e9f95d1 Compare November 25, 2025 17:20
@lzsaver lzsaver requested review from amotin and mcmilk November 25, 2025 17:25
@lzsaver lzsaver force-pushed the patch-2 branch 2 times, most recently from dc96c52 to b64efbd Compare November 25, 2025 21:40
Author

lzsaver commented Nov 25, 2025

Okay. We do both: force the 256K benchmark and preserve chksum_stat_data between runs (losing that data was the reason for the zeros).

> However, it should be noted that the scenario where the 256K benchmark still runs during system boot and then runs again on demand was not tested.

Now tested.

> It could be that this PR does not solve the root cause of the problem.

Now it does.

@lzsaver lzsaver changed the title from "zfs_chksum: preserve chksum_stat_data between runs" to "zfs_chksum: run 256K benchmark on demand" Nov 25, 2025
@lzsaver lzsaver changed the title from "zfs_chksum: run 256K benchmark on demand" to "chksum: run all benchmarks on demand" Nov 26, 2025
Author

lzsaver commented Nov 26, 2025

Let us try to run all benchmarks on demand.

This takes some time, but the user will always have complete information to make a decision.

Update:
However, any user can make a request (even "nobody"). This could be abused for a DoS attack.
It seems that conceptual changes are required, which we had hoped to avoid in this PR.

Update 2:
Rolled back for now.

ZFS-CI-Type: quick
Signed-off-by: Alexx Saver <[email protected]>
Reviewed-by: Tino Reichardt <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Co-authored-by: Adam Moss <[email protected]>
Closes openzfs#17945
@lzsaver lzsaver changed the title from "chksum: run all benchmarks on demand" to "chksum: run 256K benchmark on demand, preserve chksum_stat_data" Nov 26, 2025

Labels

Status: Code Review Needed Ready for review and testing

Development

Successfully merging this pull request may close these issues.

Faster checksum benchmark on system boot

4 participants