Fan cycling indefinitely #28
Which hwmon input is producing those numbers? There was some discussion about blacklisting in #25, but I'd be intrigued to find out which sensor is actually producing these values, especially since they're not static in this case.
I don't know which ones. How do I find out?
Try the (attached) script; maybe it can help. Output looks like this:
Directory: /sys/class/hwmon/hwmon1/
Directory: /sys/class/hwmon/hwmon2/
Directory: /sys/class/hwmon/hwmon3/
Directory: /sys/class/hwmon/hwmon4/
Directory: /sys/class/hwmon/hwmon5/
Directory: /sys/class/hwmon/hwmon6/
Directory: /sys/class/hwmon/hwmon7/
(Huh, it seems like /sys/class/hwmon/hwmon6/temp2_input is not readable on my system...)
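The attached script itself is not reproduced in this thread. As a rough stand-in, here is a minimal Python sketch that does the same job under stated assumptions:

- It walks `/sys/class/hwmon`.
- It prints each chip's `name` file.
- It reads every `temp*_input` file. hwmon reports these in millidegrees Celsius.
- Unreadable inputs, like the `temp2_input` mentioned above, are reported as `None` rather than aborting the scan.

The function names `scan_hwmon` and `read_text` are hypothetical. The `temp*_label` files are optional and exist only for some drivers.

```python
from pathlib import Path


def read_text(path):
    """Return the stripped contents of a sysfs file, or None if it can't be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def scan_hwmon(root="/sys/class/hwmon"):
    """List every temp*_input under root as
    (hwmon dir, chip name, input file, label, degrees C or None)."""
    results = []
    for hwmon in sorted(Path(root).glob("hwmon*")):
        name = read_text(hwmon / "name") or "?"
        for temp in sorted(hwmon.glob("temp*_input")):
            # temp1_input -> temp1_label; labels exist only for some drivers
            label = read_text(temp.with_name(temp.name.replace("_input", "_label")))
            raw = read_text(temp)
            try:
                # hwmon reports temperatures in millidegrees Celsius
                value = int(raw) / 1000
            except (TypeError, ValueError):
                value = None  # unreadable (raw is None) or non-numeric input
            results.append((hwmon.name, name, temp.name, label, value))
    return results


if __name__ == "__main__":
    for hwmon, name, sensor, label, value in scan_hwmon():
        reading = "unreadable" if value is None else f"{value:.1f}C"
        print(f"{hwmon} ({name}) {sensor} [{label or '-'}]: {reading}")
```

Running it a few times in a row should show which `hwmonN/tempM_input` is the one jumping around. The chip `name` is also more stable to report than the `hwmonN` number.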
Here is the output
I suspect I have the same issue. There seems to be a mysterious high temperature being detected that isn't shown anywhere else. For example:
Checking the sensors around the same time as the last entry shows:
or via
Nothing seems to be close to the ~70 degree temperature being reported. Do you have any idea where it might be coming from?
Edit: After watching the output of the script given above, I think what happens is that there are very short temperature spikes. I'm not sure whether they're real or errors in the sensor data. Observing just temp1 of the CPU:
while true; do date && cat /sys/class/hwmon/hwmon7/temp1_input; sleep 0.1; done;
For less than a second, the temperature seems to jump up and down by several degrees:
If zcfan notices this jump, it will often kick the fan into a higher mode for a while, even though that doesn't seem necessary. I wonder if it should keep a running 1-2 second average to avoid reacting to spikes?
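A running average over a fixed time window, as suggested above, could look roughly like the following sketch. It is not part of zcfan. `RollingAverage` and its `window_s` parameter are hypothetical names. The caller is assumed to pass a monotonic timestamp, such as `time.monotonic()`, with each reading.

```python
from collections import deque


class RollingAverage:
    """Mean of the samples received within the last `window_s` seconds."""

    def __init__(self, window_s=2.0):
        self.window_s = window_s
        self.samples = deque()  # (timestamp, degrees C), oldest first

    def add(self, timestamp, value):
        """Record a reading and return the current windowed average."""
        self.samples.append((timestamp, value))
        # Evict readings older than the window; the newest one always stays.
        while timestamp - self.samples[0][0] > self.window_s:
            self.samples.popleft()
        return sum(v for _, v in self.samples) / len(self.samples)
```

With readings every 0.1 s and a 1 s window, a single 81C sample among 67C readings only moves the average to about 68.3C. That stays well below a 75C "medium" threshold. A sustained rise still comes through once it fills most of the window.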
Hey @stefancircuit, wow, that is strange. Are you saying you measured a temperature difference from 67 to 81 within 100 ms? Were you running any specific load on the system at the time? What is your hwmon7's name? (Lots of docs suggest the numbering of the "hwmon" sensors can change between boots, although I don't think I've observed that.)
Anyway, if your hwmon7 is your Wi-Fi card, it might not be covered by the CPU cooling heat pipes, like on mine (T16 G1 AMD): https://laptopmedia.com/wp-content/uploads/2022/08/internals-1000x711.jpg
If that is the case, you may have a similar problem to the one I had: needing to exclude a sensor from zcfan's fan-management algorithm, which simply takes the highest temperature reported by any sensor and uses that to set the fan speed. See: #25
I've reached a dead end with my issue. In the end I modified zcfan to hardcode an exclusion for that one sensor on my laptop. That solved the algorithm problem, but it exposed a new one: when setting the desired fan level by writing to /proc/acpi/ibm/fan, the whole system may crash, at random. I first observed this while experimenting with zcfan and thought it was my own bad code (but how? zcfan runs in userspace). Later I wrote my own "zcfan" in Python to read the sensors, support sensor blacklisting, compute brackets, and set the fan level accordingly. I don't know how to troubleshoot this further. I'm guessing the issue might be unique to the combination of the IBM ACPI driver and my motherboard/BIOS; otherwise zcfan would not work for anyone.
Actually, the original problem that led me to try zcfan was an issue with the system's automatic fan speed control. Most of the time it was fine, but sometimes it would get into a loop of spinning up and down, over and over, quite fast: roughly one up-and-down cycle every 5 seconds, even with no load on the system.
I think, fundamentally, the IBM ACPI driver, which does run in kernel space, is not that good, and it leads to the issues we've seen.
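The selection rule described above could be sketched as follows: use the hottest sensor, but skip anything on an exclusion list. This is not zcfan's code, and several parts are assumptions:

- `pick_level`, the sensor-id strings and the level names are hypothetical.
- The default thresholds are simply the `max_temp`/`med_temp`/`low_temp` values from the zcfan.conf posted later in this thread.
- Whether zcfan compares with `>=` or `>` is not confirmed here.

```python
def pick_level(readings, excluded=frozenset(), max_temp=85, med_temp=75, low_temp=60):
    """Return (level name, temperature that decided it).

    readings maps a sensor id to degrees C, or None if the sensor was unreadable.
    """
    usable = [t for sensor, t in readings.items()
              if sensor not in excluded and t is not None]
    if not usable:
        return "max", None  # fail safe: no trustworthy reading at all
    hottest = max(usable)  # the hottest remaining sensor drives the fan
    if hottest >= max_temp:
        return "max", hottest
    if hottest >= med_temp:
        return "medium", hottest
    if hottest >= low_temp:
        return "low", hottest
    return "off", hottest


if __name__ == "__main__":
    readings = {"hwmon5/temp1_input": 52.0, "hwmon7/temp1_input": 81.0}
    print(pick_level(readings))                                   # ('medium', 81.0)
    print(pick_level(readings, excluded={"hwmon7/temp1_input"}))  # ('off', 52.0)
```

Returning the highest level when no sensor is usable is a fail-safe design choice. A real daemon might instead hand control back to automatic fan mode.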
Hmm, come to think of it, the fluctuating fan speed issue I had is kind of similar to what you and @B0ndo2 reported, but maybe at a faster pace? Maybe it has to do with the polling frequency set in zcfan... or maybe it's fundamentally driven by the same issue. I'm not sure how to troubleshoot this further, or where to go for help. @stefancircuit, have you had any random system crashes while experimenting with zcfan? If not, you can try the sensor blacklisting route. Or, if your hwmon7 temp1 is part of your CPU/GPU and you do want it to drive your fan speed, maybe you can pre-process those readings with a moving average or a low-pass filter to smooth out the bumps. Last question: which version of the IBM ACPI driver are you running? I'm on 0.26. Thanks!
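For the low-pass filter option, a first-order exponential moving average is probably the simplest choice. Here is a minimal sketch; `LowPassFilter` and `alpha` are hypothetical names, not zcfan settings. A smaller `alpha` smooths more but reacts more slowly to genuine heat-up.

```python
class LowPassFilter:
    """First-order IIR low-pass filter (exponential moving average)."""

    def __init__(self, alpha=0.2):
        if not 0 < alpha <= 1:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self.value = None

    def update(self, sample):
        """Blend a new reading into the filtered value and return it."""
        if self.value is None:
            self.value = sample  # seed with the first reading
        else:
            self.value += self.alpha * (sample - self.value)
        return self.value
```

Unlike a windowed average, this keeps only one value of state. At a 0.1 s polling interval, alpha = 0.2 corresponds to a time constant of roughly half a second (τ = -Δt / ln(1 - α) ≈ 0.45 s).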
Hi, thanks for your response @rudolf81. I was not running any particular load, just idling.
As far as I can tell, I don't get any crashes, just these (possibly spurious) sub-second temperature spikes that randomly push the fan speed up. The ACPI version is as follows:
But yeah, it seems like a rolling average would help smooth out anomalies. That being said, I'm currently running Ubuntu 22, and I tried 24 over the weekend, which seems to just fix the problems. I'm not sure why or what changed, but the temperatures are down and the fan stays mostly off without extra tools. This is a very new laptop model (P1 Gen7), so maybe it just needs whatever mysterious packages exist in the newer OS. 🤷‍♂️ I'll keep using zcfan for a bit since I can't fully upgrade yet, but hopefully that will be the longer-term solution.
Interesting. You would think that the temperatures you get by querying sysfs or procfs come more or less directly from the components' actual sensors. Updating some OS packages is not likely to alter those readings?? (...unless something already applies a smoothing algorithm? But you'd think that would be a concern one layer above, not in the sensors themselves...) Anyway, when you get back into Ubuntu 24, it would be awesome if you could share the version of the ThinkPad ACPI Extras driver. Thanks.
From the discussion, it feels like this is in the vein of #25, so I'm closing this so we can discuss it there. Thanks!
I have a ThinkPad T14s Gen 3 where I installed zcfan. The fan is cycling like crazy. I am monitoring the CPU and GPU temperatures, and they never reached 70 or 61. I also feel that the fan always runs at high speed.
Mar 13 16:21:58 XX zcfan[11590]: [FAN] Temperature now 63C, fan set to low
Mar 13 16:22:26 XX zcfan[11590]: [FAN] Temperature now 50C, fan set to off
Mar 13 16:24:52 XX zcfan[11590]: [FAN] Temperature now 76C, fan set to medium
Mar 13 16:24:56 XX zcfan[11590]: [FAN] Temperature now 47C, fan set to off
Mar 13 16:25:45 XX zcfan[11590]: [FAN] Temperature now 70C, fan set to low
Mar 13 16:25:49 XX zcfan[11590]: [FAN] Temperature now 48C, fan set to off
Mar 13 16:25:59 XX zcfan[11590]: [FAN] Temperature now 61C, fan set to low
Mar 13 16:26:02 XX zcfan[11590]: [FAN] Temperature now 48C, fan set to off
Mar 13 16:26:22 XX zcfan[11590]: [FAN] Temperature now 61C, fan set to low
Mar 13 16:26:25 XX zcfan[11590]: [FAN] Temperature now 47C, fan set to off
zcfan.conf
max_temp 85
med_temp 75
low_temp 60
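The log above shows the fan dropping back to off within 3 to 4 seconds of most step-ups. One common mitigation is a step-down hold time: step up immediately, but only step down after the temperature has stayed in a lower bracket for `hold_s` seconds. The sketch below makes several assumptions:

- `StepDownHold`, `bracket` and `hold_s` are hypothetical names, not zcfan options.
- The thresholds are the values from the zcfan.conf above.
- `>=` comparisons are assumed.

```python
LEVELS = ["off", "low", "medium", "max"]


def bracket(temp, max_temp=85, med_temp=75, low_temp=60):
    """Map a temperature to an index into LEVELS (thresholds from zcfan.conf)."""
    if temp >= max_temp:
        return 3
    if temp >= med_temp:
        return 2
    if temp >= low_temp:
        return 1
    return 0


class StepDownHold:
    """Step up immediately; step down only after `hold_s` seconds below."""

    def __init__(self, hold_s=30.0, **thresholds):
        self.hold_s = hold_s
        self.thresholds = thresholds
        self.level = 0
        self.below_since = None  # when the temperature first fell below the current level

    def update(self, now, temp):
        target = bracket(temp, **self.thresholds)
        if target >= self.level:
            # Hotter (or same bracket): apply at once and cancel any pending step-down.
            self.level = target
            self.below_since = None
        elif self.below_since is None:
            self.below_since = now  # start the step-down timer
        elif now - self.below_since >= self.hold_s:
            self.level = target  # cooler for long enough: step down
            self.below_since = None
        return LEVELS[self.level]
```

The journal lists level changes, not every poll, so replaying only the logged samples is illustrative. With a 30 s hold, the replay yields low, low, medium, medium, low, low, low, low, low, low. That is two level changes after the initial one, instead of nine. The trade-off is that the fan keeps running for up to `hold_s` seconds longer after a spike.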