Nut shutdown assuming dead ups #2794
Comments
Interesting situation. My guess would be that it works as designed, at least. There are things known to you and unknown to the servers, such as that the service restart was "intentional". I have little idea how to even propagate the concept (is it known to services that they are being restarted and will return in a second or a minute?)

Perhaps upsmon could wait a cycle or two on connection loss (maybe tied to knowledge that the driver or its data server program is gracefully going down and it was not power yanked from their machine or network gear - something that could be developed e.g. around

Note that knowledge of being calibrated does not necessarily allow us to discount the on-battery situation as safe. I did see UPSes losing power for the load because their earlier guess was too optimistic and calibration thought the battery was still 20-30% full. (Batteries do degrade over time; the last calibration could have been too long ago, and your battery capacity and/or the fed load changed considerably in between.) Regular calibration with the same load probably reduces this possibility, but still...
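To make the "wait a cycle or two on connection loss" idea concrete, here is a minimal sketch. It is not upsmon's actual logic: the class name `CommLossTracker`, the `grace_polls` parameter, and the choice to treat a last-known `OB` or `CAL` status as risky are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumption: status tokens after which a lost connection is treated as dangerous.
RISKY_FLAGS = frozenset({"OB", "CAL"})


@dataclass
class CommLossTracker:
    """Hypothetical helper: tolerate a few failed polls before acting."""

    grace_polls: int = 2                       # failed polls to tolerate (assumed value)
    failed_polls: int = 0                      # consecutive failures seen so far
    last_status: frozenset = frozenset({"OL"})  # tokens from the last good poll

    def poll_ok(self, status: str) -> None:
        # A successful poll resets the failure counter and refreshes the
        # remembered status, so stale flags do not outlive real data.
        self.failed_polls = 0
        self.last_status = frozenset(status.split())

    def poll_failed(self) -> str:
        # Count the failure; stay quiet while still inside the grace period.
        self.failed_polls += 1
        if self.failed_polls <= self.grace_polls:
            return "wait"
        # Grace period is over: escalate only if the UPS was last seen in a
        # risky state, otherwise just report the lost communication.
        if self.last_status & RISKY_FLAGS:
            return "shutdown"
        return "notify"
```

With `grace_polls=2`, a service restart that completes within two polling cycles would never get past `"wait"`, while a server that stays unreachable still triggers the shutdown path on the third failed poll.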
I think the main issue here is that the netclient thought that the UPS was still in the calibrating state, as stated by this log line:

But it was not. The last calibration was on 2025.01.29 and the service restart happened on 2025.01.30, so one day later. The UPS was in a simple OL state when the service restart happened on the server (I know because I was physically looking at it, but as I said, I had not checked the NUT server to be sure it also thought the UPS was in the OL state). But since the client thought that the UPS was in ST_CAL, it shut down the machine.

So either the server thought that the UPS was still in ST_CAL and that state propagated to the netclient, or the server correctly thought it was in the OL state but that state change somehow did not get to the client for more than a full day.
Well... either a new calibration kicked in right during those seconds, or...

From the messages above, I assume you are running a NUT v2.8.1 release build? Looking at https://github.com/networkupstools/nut/blob/v2.8.1/clients/upsmon.c#L2111-L2170 it might be that

Can you try a newer package, or ideally a custom build per https://github.com/networkupstools/nut/wiki/Building-NUT-for-in%E2%80%90place-upgrades-or-non%E2%80%90disruptive-tests, to check that the current codebase actually does not have this practical buggy use-case, please?
It was not calibrating at that moment, that is for sure. I'll wait for the next calibration first and check the statuses on both server and client to see what they are reporting. Then I'll maybe recompile Ubuntu's version with the added https://github.com/networkupstools/nut/blob/v2.8.2/clients/upsmon.c#L1234-L1242 function and the https://github.com/networkupstools/nut/blob/v2.8.2/clients/upsmon.c#L2181-L2182 call and see if that helps.
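For the "check the statuses on both server and client" step, a small sketch like the following could help compare what each machine sees. It assumes the `upsc` client is installed on both machines and that the UPS is named `myups` (a hypothetical name, since the config is not shown here). Note that `upsc` shows what the data server reports, not upsmon's internal flags, so a stale flag inside upsmon would not necessarily show up this way.

```python
import subprocess


def parse_status(value: str) -> set[str]:
    """Split an ups.status value such as 'OL CAL' into its tokens."""
    return set(value.split())


def read_status(ups: str) -> set[str]:
    """Run `upsc <ups> ups.status` and return the reported status tokens.

    `ups` has the form 'upsname@hostname'; the name used in the usage note
    below is a placeholder, not taken from the reporter's setup.
    """
    result = subprocess.run(
        ["upsc", ups, "ups.status"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_status(result.stdout)


def status_difference(view_a: set[str], view_b: set[str]) -> set[str]:
    """Return the tokens present in only one of the two views."""
    return view_a ^ view_b
```

Running `read_status("myups@netserver")` on each machine during and after the next calibration, and comparing the two results with `status_difference`, would show whether a CAL token lingers on one side only.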
Well, IMHO monkey-patching sources like that may be a bit risky - just too easy to miss something. If you do go that route, use git blame (or the GitHub UI) to track down the commits and whole PRs that delivered the change, to reduce that particular risk. It may be that the changes relied on some other work, possibly in other source files, that would not be in your patched history and codebase though.

It may be more fruitful to use a source tarball from 2.8.2 (or generate one from the current master branch with

The "in-place" builds as detailed on the Wiki should overlay much of the packaged installation, especially of a recent one that reports its
Hi!
I don't know if this is a bug or a misconfiguration but here is what happened.
I have 2 machines: one is netserver, the other is netclient.

The netserver is connected through USB to a device.model: Back-UPS RS 550G with the following config

Both of the machines are running Ubuntu 24.04.1 LTS with nut version 2.8.1-3.1ubuntu2.

The UPS seems to calibrate itself every two weeks and the last one was on:

Ubuntu has this function that when a lib used by a process is upgraded, it restarts the corresponding systemd unit so that it uses the updated lib.
Recently a bunch of libs were updated, and when nut restarted on the netserver, the netclient machine shut itself down.

At first I did not understand what happened, so after manually starting netclient I checked the logs and could see this:

So it seems that the netclient was thinking that the UPS was still in calibrating mode when the netserver restarted its nut systemd service, so it assumed the UPS was dead and immediately shut down.

Of course I did not check what nut thought about the state of the UPS before the libs were upgraded, but after that I could see that both netclient and netserver report ups.status: OL correctly.

Also, yesterday and today again a bunch of libs were upgraded and nut was restarted on netserver, and this time it did not trigger the shutdown (and I checked this time and the status was still OL).

Now... is this a bug, in that somehow the netclient did not get the memo that the UPS is not in the calibrating state, or is something misconfigured?