
zsysd crashes if user dataset mountpoint is changed, canmount set to off #212

Open
AlexeyGusev opened this issue Aug 11, 2021 · 1 comment

Comments

@AlexeyGusev

The problem

zsysd fails to start and crashes with a segmentation fault if one of the datasets it manages has snapshots and the following steps have been performed (in the example below, consider the dataset rpool/USERDATA/ag_zt20jc, which was originally created with mountpoint /home/ag):

  1. Change the dataset's mountpoint. The new mountpoint is set to /home/ag.legacy.
  2. Unmount the dataset and set its canmount property to off.
  3. Create a new dataset. In this example, it is rpool/USERDATA/ag_xyz, with mountpoint /home/ag, encryption enabled and canmount set to noauto. The properties com.ubuntu.zsys:bootfs-datasets, com.ubuntu.zsys:last-used and my.pam.automount:user (for PAM-assisted mounting of the encrypted dataset) are set accordingly.

After a reboot, the new dataset is mounted properly and the old dataset is not mounted, as expected:

NAME                                              CANMOUNT  MOUNTPOINT                ENCRYPTION   MOUNTED
rpool/USERDATA/ag_xyz                             noauto    /home/ag                  aes-256-gcm  yes
rpool/USERDATA/ag_zt20jc                          off       /home/ag.legacy           off          no

However, zsysd crashes with a segfault, which is not expected:

Aug 11 09:12:14 cirrus systemd[1]: Starting ZSYS daemon service...
Aug 11 09:12:14 cirrus zsysd[9407]: level=warning msg="Didn't find origin \"rpool/USERDATA/ag_zt20jc\" for \"rpool/USERDATA/ag_zt20jc@autozsys_e4ja6i\" matching any dataset"
…
<all snapshots from rpool/USERDATA/ag_zt20jc are listed here>
…
Aug 11 09:12:14 cirrus zsysd[9407]: panic: runtime error: invalid memory address or nil pointer dereference
Aug 11 09:12:14 cirrus zsysd[9407]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x92192f]
Aug 11 09:12:14 cirrus zsysd[9407]: goroutine 1 [running]:
Aug 11 09:12:14 cirrus zsysd[9407]: github.com/ubuntu/zsys/internal/machines.(*Machines).refresh(0xc0000ddc20, 0xb4e170, 0xc000024070)
Aug 11 09:12:14 cirrus zsysd[9407]: 	github.com/ubuntu/zsys/internal/machines/machines.go:227 +0xf2f
Aug 11 09:12:14 cirrus zsysd[9407]: github.com/ubuntu/zsys/internal/machines.New(0xb4e170, 0xc000024070, 0xc00011a600, 0x79, 0xc00014da48, 0x1, 0x1, 0x0, 0x0, 0x0, ...)
Aug 11 09:12:14 cirrus zsysd[9407]: 	github.com/ubuntu/zsys/internal/machines/machines.go:145 +0x83d
Aug 11 09:12:14 cirrus zsysd[9407]: github.com/ubuntu/zsys/internal/daemon.New(0x0, 0x0, 0x0, 0x0, 0x0, 0x671cce, 0xc0001327e0, 0xb42840)
Aug 11 09:12:14 cirrus zsysd[9407]: 	github.com/ubuntu/zsys/internal/daemon/daemon.go:115 +0x469
Aug 11 09:12:14 cirrus zsysd[9407]: github.com/ubuntu/zsys/cmd/zsysd/daemon.glob..func2(0xe7e720, 0xef6818, 0x0, 0x0)
Aug 11 09:12:14 cirrus zsysd[9407]: 	github.com/ubuntu/zsys/cmd/zsysd/daemon/zsysd.go:30 +0x88
Aug 11 09:12:14 cirrus zsysd[9407]: github.com/ubuntu/zsys/vendor/github.com/spf13/cobra.(*Command).execute(0xe7e720, 0xc000020240, 0x0, 0x0, 0xe7e720, 0xc000020240)
Aug 11 09:12:14 cirrus zsysd[9407]: 	github.com/ubuntu/zsys/vendor/github.com/spf13/cobra/command.go:830 +0x2c2
Aug 11 09:12:14 cirrus zsysd[9407]: github.com/ubuntu/zsys/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xe7e720, 0xb, 0x7fff6a5a4e8b, 0x5)
Aug 11 09:12:14 cirrus zsysd[9407]: 	github.com/ubuntu/zsys/vendor/github.com/spf13/cobra/command.go:914 +0x30b
Aug 11 09:12:14 cirrus zsysd[9407]: github.com/ubuntu/zsys/vendor/github.com/spf13/cobra.(*Command).Execute(...)
Aug 11 09:12:14 cirrus zsysd[9407]: 	github.com/ubuntu/zsys/vendor/github.com/spf13/cobra/command.go:864
Aug 11 09:12:14 cirrus zsysd[9407]: main.main()
Aug 11 09:12:14 cirrus zsysd[9407]: 	github.com/ubuntu/zsys/cmd/zsysd/main.go:36 +0xdf
Aug 11 09:12:14 cirrus systemd[1]: zsysd.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Aug 11 09:12:14 cirrus systemd[1]: zsysd.service: Failed with result 'exit-code'.
Aug 11 09:12:14 cirrus systemd[1]: Failed to start ZSYS daemon service.

Since zsysd is not running, zsysctl reports in syslog that it cannot reach the zsys service:

connection error: desc = \"transport: Error while dialing dial unix /run/zsysd.sock: connect: connection refused\""

zsys appears to associate snapshots with their parent dataset incorrectly (possibly based on com.ubuntu.zsys:mountpoint?). This causes problems once a dataset's mountpoint is changed and another dataset is created with the original mountpoint.
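For comparison, a ZFS snapshot's full name has the form dataset@snapshot, so its parent dataset can be recovered from the name alone, without consulting any mountpoint property. The Go sketch below contrasts that with a lookup keyed on mountpoint, using the dataset names from this report. The helpers parentByName and parentByMountpoint are hypothetical, and the idea that zsys resolves a snapshot's owner through a recorded mountpoint is only the reporter's suspicion, not something confirmed here; this is not zsys's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// parentByName returns the dataset part of a ZFS snapshot name
// ("pool/fs@snap" -> "pool/fs"). Hypothetical helper, not from zsys.
func parentByName(snapshot string) (string, bool) {
	i := strings.Index(snapshot, "@")
	if i <= 0 {
		return "", false // not a snapshot name
	}
	return snapshot[:i], true
}

// parentByMountpoint resolves an owner through a mountpoint -> dataset
// index. Hypothetical; shown only to illustrate how such a lookup can
// land on the wrong dataset once a mountpoint is reassigned.
func parentByMountpoint(mountpoint string, byMountpoint map[string]string) (string, bool) {
	ds, ok := byMountpoint[mountpoint]
	return ds, ok
}

func main() {
	snap := "rpool/USERDATA/ag_zt20jc@autozsys_e4ja6i"

	// State after step 3: /home/ag now belongs to the new dataset.
	byMountpoint := map[string]string{
		"/home/ag":        "rpool/USERDATA/ag_xyz",
		"/home/ag.legacy": "rpool/USERDATA/ag_zt20jc",
	}

	byName, _ := parentByName(snap)

	// Assumption: the snapshot still carries the mountpoint its dataset
	// had when the snapshot was taken (/home/ag).
	byMP, _ := parentByMountpoint("/home/ag", byMountpoint)

	fmt.Printf("by name:       %s\n", byName)
	fmt.Printf("by mountpoint: %s\n", byMP)
}
```

Run as-is, the name-based lookup yields rpool/USERDATA/ag_zt20jc, while the mountpoint-based lookup yields rpool/USERDATA/ag_xyz, the dataset created in step 3.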

Workaround

As a workaround, destroy all relevant snapshots:

    zfs list -H -o name -t snapshot -r rpool/USERDATA/ag_zt20jc | sudo xargs -n1 zfs destroy

After the snapshots are gone, sudo systemctl start zsysd completes successfully.

Expected behavior:

  1. The zsysd service shall not crash, even if it believes that some of the snapshots it manages are orphaned (in fact, they are not).
  2. zsys shall link snapshots to their parent dataset correctly, even if the mountpoint is changed or reassigned to a different dataset.
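Expected behavior 1 amounts to checking a looked-up dataset for nil before using it. The Go sketch below illustrates that pattern under stated assumptions: the Dataset type and attachSnapshots function are hypothetical simplifications, not zsys's types or code, and the stack trace above only establishes that (*Machines).refresh dereferenced a nil pointer, not which lookup produced it. In the demo, the old dataset is deliberately left out of the index to mimic the "Didn't find origin" warning.

```go
package main

import (
	"log"
	"strings"
)

// Dataset is a hypothetical, heavily simplified stand-in for a managed dataset.
type Dataset struct {
	Name      string
	Snapshots []string
}

// attachSnapshots links each snapshot to its parent dataset, derived from
// the snapshot name. Snapshots whose parent is not in the index are logged
// and returned as orphans instead of being used through a nil pointer.
// Hypothetical sketch, not zsys's implementation.
func attachSnapshots(datasets map[string]*Dataset, snapshots []string) (orphans []string) {
	for _, snap := range snapshots {
		i := strings.Index(snap, "@")
		if i <= 0 {
			log.Printf("skipping %q: not a snapshot name", snap)
			orphans = append(orphans, snap)
			continue
		}
		parent := datasets[snap[:i]] // nil if the parent is not indexed
		if parent == nil {
			log.Printf("no managed dataset found for snapshot %q; skipping", snap)
			orphans = append(orphans, snap)
			continue
		}
		parent.Snapshots = append(parent.Snapshots, snap)
	}
	return orphans
}

func main() {
	// Only the new dataset is indexed; the old one is intentionally missing.
	datasets := map[string]*Dataset{
		"rpool/USERDATA/ag_xyz": {Name: "rpool/USERDATA/ag_xyz"},
	}
	snaps := []string{"rpool/USERDATA/ag_zt20jc@autozsys_e4ja6i"}

	orphans := attachSnapshots(datasets, snaps)
	log.Printf("attached to ag_xyz: %d, orphans: %d",
		len(datasets["rpool/USERDATA/ag_xyz"].Snapshots), len(orphans))
}
```

Skipping an unmatched snapshot keeps the daemon running, which is what expected behavior 1 asks for; whether such snapshots should also be surfaced to the user is a separate design decision.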

System configuration:

$ uname -a
Linux cirrus 5.11.0-25-generic #27-Ubuntu SMP Fri Jul 9 23:06:29 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=21.04
DISTRIB_CODENAME=hirsute
DISTRIB_DESCRIPTION="Ubuntu 21.04"
$ zsysctl version
zsysctl	0.5.8
zsysd	0.5.8
@rhymeswithmogul

Thank you for the one-liner! My bpool filled up and screwed things up beyond belief (despite being on v0.5.9, which supposedly fixed this).

$ uname -a
Linux MY-MACHINE-NAME-HERE 5.15.0-37-generic #39-Ubuntu SMP Wed Jun 1 19:16:45 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04 LTS"
$ zsysctl version
zsysctl	0.5.9
ERROR couldn't connect to zsys daemon: connection error: desc = "transport: Error while dialing dial unix /run/zsysd.sock: connect: connection refused"

(zsys became unresponsive as soon as bpool filled up. I see there is a separate ticket open for that.)
