-
Notifications
You must be signed in to change notification settings - Fork 1.9k
chksum: run 256K benchmark on demand, preserve chksum_stat_data #17946
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
mcmilk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will fix the cosmetic issue.
But we will need to dig a bit deeper into the problem, why the bs256k variable is zero ;-)
amotin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally I find it OK. I don't think saving here makes much sense if we may lose consistency. But before we merge this I think it would be good to understand the original reported problem, so that we would not hide it deeper, ending up with sub-optimal implementation selection.
mcmilk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we may rework the chksum_init() and maybe other functions.
I would try to fix this also.
|
I think zfs_chksum.c has gotten out of control, having spent too long even grokking it (including the new state machine, yay) let alone trying to find the underlying bug. 😁 So, I think the underlying bug is that the optimization in I think the real fix is to wrap this code: ... in ... but there's a lot of complication and smell here for something that optimizes a function that will normally only get called once ever. IMVHO. (If it were up to me I'd consider reverting the optimization.) |
|
Let us first get the code into a working state, worthy of the 2.4 release. No surprises. No experiments. @mcmilk, after that, you can redo it according to some new concept. @adamdmoss, thanks. I will take a look. |
ca2011b to
950eb73
Compare
|
Well, now it works. We may leave it as it is for now. That is, without taking the governor into account. |
|
|
||
| #define AT_STARTUP 0 | ||
| #define AT_BENCHMARK 1 | ||
| #define AT_DONE 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove the AT_DONE state - when each reading of the benchmark file should create new statistics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but this is a noticeable change for the user. Let us not touch it for now. Thank you for the teamwork.
Without AT_DONE:
# time cat /proc/spl/kstat/zfs/chksum_bench > /dev/null
real 0m14.143s
user 0m0.000s
sys 0m13.771s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but you can test then with different cpu governors.
I think this change isn't to complex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. Let us try to do this in this PR.
Update:
We need some kind of protection against regular requests. Perhaps it would be better to leave it for later?
Now, instead of this, it preserves chksum_stat_data between runs, so the early benchmark results are not lost. But the results look weird, even if we just restart the module, because there is a delay between computations. |
8382088 to
e9f95d1
Compare
dc96c52 to
b64efbd
Compare
|
Okay. We do both: force the 256K benchmark and preserve chksum_stat_data between runs (the reason why there were zeros).
Now tested.
Now solves. |
|
Let us try to run all benchmarks on demand. This takes some time, but the user will always have complete information to make a decision. Update: Update 2: |
ZFS-CI-Type: quick Signed-off-by: Alexx Saver <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Co-authored-by: Adam Moss <[email protected]> Closes openzfs#17945
Motivation and Context
#17563 speeds up system boot by ensuring the 256K benchmark runs during boot, while others run on demand. However, for some reason, the 256K benchmark does not always run during boot. This patch forces the 256K benchmark to be run on demand as well, which should resolve #17945.
Description
One of two things must be true: either the 256K benchmark is not being run when the system boots, or the results of that benchmark are not being taken into account. If you force the 256K benchmark to run on demand, the data is displayed correctly.
It may not quite fit the concept, but it seems to solve the problem. Probably @mcmilk has some more thoughts on the topic.
However, there is an objection to the current concept. If it is true that the 256K benchmark result obtained during boot should be provided on demand, then values should become inconsistent when the governor is changed.
Probably we do not want the 256K benchmark values from the upper table to end up in the lower one.
We could try even more aggressively.
However, this is an intuitive line of reasoning. It has not been tested yet because of the issue, which is probably more important.
How Has This Been Tested?
The patch was tested on top of three current branches. If the 256K benchmark results were all zeros, they display correctly after applying the patch. However, it should be noted that the scenario where the 256K benchmark still runs during system boot and then runs again on demand was not tested. It could be that this PR does not solve the root cause of the problem. In any case, there is hope that this can be fixed without a major rewrite of the code.
Types of changes
Checklist:
Signed-off-by.