Skip to content

Conversation

@marmarek
Copy link
Member

@marmarek marmarek commented Dec 2, 2024

Better handle stopping the service while it's still starting up. And also improve handling Xorg startup errors (much simpler alternative to #176).

In case the service is stopped while Xorg is still starting up (and
gui-agent still waits for the Xorg connectin in mkghandles), gui-agent
would exit before killing Xorg and Xorg would try connecting back to the
gui-agent forever, delaying the shutdown.

Fix this by moving signal registration earlier, before Xorg startup.
Since ghandles_for_vchan_reinitialize is now set before its fully
initialized, initialize x_pid field explicitly and leave all the other
fields zeroed (instead of random stack rubble).
Register proper signal handler for SIGCHLD, and collect the Xorg's
zombie in it.

This has two effects:
1. The main loop can explicitly exit on Xorg termination, not only via
   receiving EOF on the socket.
2. Due to not ignoring SIGCHLD anymore, accept() in mkghandles will also
   notice Xorg early exit and not wait indefinitely (it will fail with
   EINTR). For this case, improve error message.

There is still a small race on startup, if Xorg exits before reaching
accept() (or listen()) call. Handle this by checking just before
accept() call. It isn't perfect (there is still a few instructions
window where it wouldn't notice it in time), but it's good enough for
practical purposes.

QubesOS/qubes-issues#8060
Use X's logging function instead of plain perror, to ensure the message
is written in appropriate Xorg's log.
@qubesos-bot
Copy link

qubesos-bot commented Dec 3, 2024

OpenQA test summary

Complete test suite and dependencies: https://openqa.qubes-os.org/tests/overview?distri=qubesos&version=4.3&build=2024121004-4.3&flavor=pull-requests

Test run included the following:

New failures, excluding unstable

Compared to: https://openqa.qubes-os.org/tests/overview?distri=qubesos&version=4.3&build=2024111705-4.3&flavor=update

  • system_tests_extra

    • TC_00_QVCTest_whonix-workstation-17: test_010_screenshare (failure)
      ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^... AssertionError: 0 == 0
  • system_tests_kde_gui_interactive

    • gui_keyboard_layout: wait_serial (wait serial expected)
      # wait_serial expected: "echo -e '[Layout]\nLayoutList=us,de' | sud...

Failed tests

4 failures
  • system_tests_extra

    • TC_00_QVCTest_whonix-gateway-17: test_010_screenshare (failure)
      ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^... AssertionError: 0 == 0

    • TC_00_QVCTest_whonix-workstation-17: test_010_screenshare (failure)
      ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^... AssertionError: 0 == 0

  • system_tests_kde_gui_interactive

    • gui_keyboard_layout: wait_serial (wait serial expected)
      # wait_serial expected: "echo -e '[Layout]\nLayoutList=us,de' | sud...

    • gui_keyboard_layout: Failed (test died)
      # Test died: command 'test "$(cd ~user;ls e1*)" = "$(qvm-run -p wor...

Fixed failures

Compared to: https://openqa.qubes-os.org/tests/119126#dependencies

2 fixed
  • system_tests_audio@hw1

  • system_tests_basic_vm_qrexec_gui_zfs

    • switch_pool: Failed (test died)
      # Test died: command 'dnf install -y ./zfs-release.rpm' failed at /...

Unstable tests

Details
  • system_tests_audio

    TC_20_AudioVM_PipeWire_fedora-40-xfce/test_260_audio_mic_enabled_switch_audiovm (1/5 times with errors)
    • job 117586 AssertionError: too short audio, expected 10s, got 0.00013605442176...
  • system_tests_audio@hw1

    TC_20_AudioVM_PipeWire_fedora-40-xfce/test_260_audio_mic_enabled_switch_audiovm (1/5 times with errors)
    • job 117586 AssertionError: too short audio, expected 10s, got 0.00013605442176...

If Xorg is going to be terminated, do not try to connect to gui-agent
anymore. This avoids infinite loop when handling SIGTERM, and properly
shutdown instead.
@marmarek marmarek merged commit 7b37186 into QubesOS:main Dec 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants