
Long roundtrip times (which makes wgpu/Vulkan surface creation particularly slow) #827

Open
ids1024 opened this issue Sep 6, 2024 · 5 comments

Comments

@ids1024
Member

ids1024 commented Sep 6, 2024

When creating a new window wgpu::Surface::configure spends an excessive amount of time in vkGetPhysicalDeviceSurfacePresentModesKHR, vkGetPhysicalDeviceSurfaceFormatsKHR, vkCreateSwapchainKHR. As far as I can tell https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30014 doesn't help much here.

The time this takes is quite variable and depends on overall lag in the compositor; for example, creating a stack slows it down.

This is partly an issue in Mesa (and the design of VK_KHR_wayland_surface, which doesn't share a "display" for multiple surfaces), but if we could improve the compositor's response to get_registry, binding the dmabuf global, etc., we could improve this. Though that may not really be possible while handlers depend on a global &mut State (not sure if Smithay/smithay#1384 could help with that).
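
To isolate the compositor's per-roundtrip latency from connection setup, a minimal sketch could look like the following (assuming wayland-client 0.31, where Connection::roundtrip performs a blocking wl_display.sync):

use std::time::Instant;
use wayland_client::Connection;

fn main() {
    let conn = Connection::connect_to_env().unwrap();
    // The first roundtrip also absorbs any remaining connection setup.
    conn.roundtrip().unwrap();
    for i in 0..5 {
        let t = Instant::now();
        // One blocking wl_display.sync roundtrip; this is the latency that gets
        // paid again for every extra sync Mesa/wgpu perform during surface setup.
        conn.roundtrip().unwrap();
        println!("roundtrip {i}: {:?}", t.elapsed());
    }
}

If this steady-state latency is already high on a loaded compositor, every additional roundtrip during surface and swapchain creation multiplies it.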

@ids1024
Member Author

ids1024 commented Jan 21, 2025

time cosmic-term --no-daemon -e exit seems like a good benchmark for this.

Testing in cosmic-comp, I see 0m0.315s without wgpu, 0m1.578s with it.

On Sway I see 0m0.108s without wgpu, 0m0.246s with it.

So it's noticeably worse with wgpu, which is consistent with this being related to compositor roundtrips. The time without wgpu may also be dominated by roundtrips, just fewer of them (wgpu needs to get a registry and wait for globals, softbuffer needs to do the same, etc.).

@ids1024
Member Author

ids1024 commented Jan 21, 2025

More minimal test:

use std::time::Instant;
use wayland_client::{
    globals::{registry_queue_init, GlobalListContents},
    protocol::wl_registry,
    Connection, Dispatch, QueueHandle,
};

struct State;

fn main() {
    let time = Instant::now();
    // Connect to the compositor and perform the initial registry roundtrip.
    let conn = Connection::connect_to_env().unwrap();
    registry_queue_init::<State>(&conn).unwrap();
    println!("{:?}", time.elapsed());
}

// registry_queue_init requires this Dispatch impl; registry events can be ignored here.
impl Dispatch<wl_registry::WlRegistry, GlobalListContents> for State {
    fn event(
        _: &mut State,
        _: &wl_registry::WlRegistry,
        _: wl_registry::Event,
        _: &GlobalListContents,
        _: &Connection,
        _: &QueueHandle<State>,
    ) {
    }
}

sway: 374.368µs
anvil: 394.504µs
cosmic: 21.591699ms

So cosmic-comp takes about 50x as long as it should, presumably on every roundtrip. And Vulkan/wgpu/iced exacerbates this by requiring more roundtrips than other clients do.
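
A possible follow-up measurement (a hypothetical sketch, replacing the body of main above): time a second roundtrip on the same queue after registry_queue_init, to check whether the ~21ms is paid on every roundtrip or is mostly one-time setup:

    let time = Instant::now();
    let conn = Connection::connect_to_env().unwrap();
    let (_globals, mut queue) = registry_queue_init::<State>(&conn).unwrap();
    println!("connect + registry roundtrip: {:?}", time.elapsed());

    // A second roundtrip on the already-initialized connection; if this is also
    // slow, the cost is per roundtrip rather than one-time setup.
    let time = Instant::now();
    queue.roundtrip(&mut State).unwrap();
    println!("extra roundtrip: {:?}", time.elapsed());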

@ids1024 ids1024 changed the title Excessive time spent creating new Vulkan surfaces Long roundtrip times (which makes wgpu/Vulkan surface creation particularly slow) Jan 21, 2025
@ids1024
Member Author

ids1024 commented Jan 21, 2025

In a simple Vulkan test that creates a surface and swapchain from a Wayland surface, I see 10 get_registry calls (one of which is the one I made myself to create the Wayland surface). That's without even trying to actually present anything. vkgears shows me 12, perhaps from resizing the surface.

@ids1024
Member Author

ids1024 commented Jan 21, 2025

In a freshly started cosmic-comp, it takes 359.231µs, similar to other compositors. So it's something like the number of clients, what it needs to render, or how long the compositor has been running that slows it down.

@Drakulix
Member

One educated guess would be our idle callback, which is probably triggered after every forced sync, since a sync clears the Wayland queue at least temporarily and leaves the loop idle. It re-computes a bunch of state, which we could either:

  • Try to eliminate by running these code paths only when we know something has changed. (Computing that in the respective protocol/input handlers isn't always straightforward and can also lead to excessive re-computation while further state-altering events are still inbound.)
  • Try to move onto a fixed timer with a dirty flag, to get rid of this relatively expensive computation on every roundtrip (roughly as sketched below).

Note: None of this is confirmed to be the issue, but given that the slowdown depends on the clients, it seems likely to me.
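
A rough sketch of the second option (hypothetical; it assumes a calloop event loop as Smithay-based compositors use, and State::dirty / State::recompute are made-up stand-ins for whatever the idle callback currently recomputes):

use std::time::Duration;
use calloop::{
    timer::{TimeoutAction, Timer},
    EventLoop,
};

struct State {
    dirty: bool,
}

impl State {
    fn recompute(&mut self) {
        // Stand-in for the expensive recomputation the idle callback does today.
    }
}

fn main() {
    let mut event_loop: EventLoop<State> = EventLoop::try_new().unwrap();

    // Protocol/input handlers would only set `state.dirty = true`; the expensive
    // work then runs at most once per timer tick instead of after every dispatch.
    event_loop
        .handle()
        .insert_source(
            Timer::from_duration(Duration::from_millis(10)),
            |_deadline, _, state: &mut State| {
                if state.dirty {
                    state.recompute();
                    state.dirty = false;
                }
                // Re-arm the timer for the next tick.
                TimeoutAction::ToDuration(Duration::from_millis(10))
            },
        )
        .unwrap();

    let mut state = State { dirty: false };
    // Run forever (a real compositor would integrate this into its main loop).
    event_loop
        .run(Duration::from_millis(10), &mut state, |_state| {})
        .unwrap();
}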
