Use Wayland instead of X11 to increase performance and improve security #3366

Open
Tracked by #8552
artemist opened this issue Dec 5, 2017 · 27 comments
Labels
bounty This issue has a public bounty associated with it. C: gui-virtualization P: major Priority: major. Between "default" and "critical" in severity. release notes This issue should be mentioned in the release notes. S: partial Status: partial. Work on this issue is partially complete, but it is not actively being worked on.

Comments

@artemist

artemist commented Dec 5, 2017

Although this is not a security issue due to the guid security model, there are several advantages to using Wayland instead of X11:

Advantages

Higher performance

If allocations are on page boundaries, then we can use xc_map_foreign_range (or the equivalent in the HAL) to map framebuffer pages directly from the client in the VM to the compositor in the guivm.
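A minimal sketch of the page-boundary condition, assuming 4 KiB pages. It does not call into Xen at all; the helper names and the fixed page size are illustrative assumptions, and real code would query the page size at runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for illustration; real code should query it
 * (for example with sysconf(_SC_PAGESIZE)). */
#define SKETCH_PAGE_SIZE 4096u

/* Returns 1 if addr lies on a page boundary, 0 otherwise. */
int is_page_aligned(uintptr_t addr)
{
    return (addr % SKETCH_PAGE_SIZE) == 0;
}

/* Number of whole pages needed to hold a width x height framebuffer.
 * Only whole pages can be mapped into another domain, so the size is
 * rounded up to the next page. */
size_t framebuffer_pages(size_t width, size_t height, size_t bytes_per_pixel)
{
    assert(bytes_per_pixel > 0);
    size_t bytes = width * height * bytes_per_pixel;
    return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

For example, a 1920x1080 buffer at 4 bytes per pixel occupies exactly 2025 pages, so it could be shared without exposing unrelated data in a trailing partial page.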

Lower memory usage

Since framebuffers are mapped instead of copied, the proxy wayland compositor should use less memory than xorg (On a VM which currently has 800M of RAM and two windows, Xorg is using 1/6th of the physical memory)

Easier GPU acceleration support

AFAIR, a lot of OpenGL operations are performed within the X server through the X OpenGL extensions. Simply forwarding these commands to the guivm would be dangerous, so we would need to process within the Xorg server then send the displaylist sometime before the end of processing and rendering. With Wayland, graphics processing happens within the context of the application, and only a framebuffer is shared to the compositor. This means that we can simply attach GVT-g or comparable hardware graphics virtualization to VMs without complex modifications to guid.

Multiple dpi support

Wayland allows one to attach multiple displays with different densities, which is important for people with HiDPI laptops who want to use external displays. We can simply forward events for screen update to the client, although we have to deal with anonymity for anon-whonix, where position of multiple displays could be very revealing.

Method

Wayland has two communication methods: commands over a Unix socket, and shared memory buffers through a file descriptor with mmap. Commands, including shared memory setup and keyboard input, should be proxied through a client in the guivm and a stub compositor in the appvm. However, wl_shm::create_pool and wl_shm events should be intercepted so that the stub compositor and guivm wayland client both create file descriptors in their VMs, and the guivm maps a foreign range (or asks dom0 to do so, I'm not sure quite how that would work) to link together the contents of those two memory ranges.
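The file-descriptor half of this can be sketched within a single machine as follows. This is only an illustration of "one fd, two mappings, same contents"; it does not involve Wayland, Xen, or any cross-VM mapping. The function names are hypothetical, and an unlinked temporary file stands in for the memfd or POSIX shared memory object a real client would more likely use.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create an unlinked temporary file of `size` bytes.
 * Returns the file descriptor, or -1 on error. */
int create_shm_fd(size_t size)
{
    char path[] = "/tmp/shm-sketch-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the open descriptor keeps the file alive */
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Map the same descriptor twice, write through one mapping and read
 * through the other. Returns 1 if both views show the same bytes,
 * 0 if they differ, -1 on error. */
int mappings_share_contents(size_t size)
{
    assert(size > 0);
    int fd = create_shm_fd(size);
    if (fd < 0)
        return -1;

    unsigned char *producer =
        mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    const unsigned char *consumer =
        mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mappings stay valid after the fd is closed */

    if (producer == MAP_FAILED || consumer == MAP_FAILED) {
        if (producer != MAP_FAILED)
            munmap(producer, size);
        if (consumer != MAP_FAILED)
            munmap((void *)consumer, size);
        return -1;
    }

    memset(producer, 0xAB, size); /* the "client" draws */
    int same = (consumer[0] == 0xAB && consumer[size - 1] == 0xAB);

    munmap(producer, size);
    munmap((void *)consumer, size);
    return same;
}
```

In the proposal, the two mappings would live in different VMs and the link between them would come from a foreign-page mapping rather than from a shared descriptor.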

Doing this

I am starting work on forwarding Wayland between VMs. I would be interested in working on this for Google Summer of Code if the Qubes project decides to join.

@andrewdavidwong andrewdavidwong added this to the Far in the future milestone Dec 6, 2017
@jpouellet
Contributor

Not to rain on the wayland parade, but I'm not convinced the potential benefit over the current system is as large as you portray.

If allocations are on page boundaries, then we can use xc_map_foreign_range (or the equivalent in the HAL) to map framebuffer pages directly from the client in the VM to the compositor in the guivm

The current gui protocol/implementation already does have guests blit directly to a shared-memory framebuffer not requiring any copying between VMs. What exactly would Wayland improve about this?

This means that we can simply attach GVT-g or comparable hardware graphics virtualization to VMs without complex modifications to guid.

I believe this is highly unlikely to happen. The security risk is just too high IMO.

All rendering in the guests happens in software, and IMO that's very unlikely to change unless GPUs get proper memory protection so e.g. shaders can be mutually isolated in different address spaces, enforced in hardware.

  • The GVT-g approach of "just try to arbitrate everything in software" strongly reminds one of Xen paravirtualization, which we've moved away from in R4 because it's proven too hard to get right and became a liability.
  • Other approaches which somehow result in at least some kind of indirect hw acceleration, like Virgil 3d (translate/emulate shader IL), are a graphics analog of QEMU (in full instruction emulation mode no less!), which Qubes has explicitly architected around not trusting.

IMO it's way too complex to be even worth considering from a security standpoint.

Even just yesterday's OS X security advisory had 3 new CVEs for their intel graphics driver interface, allowing sandbox escapes & privilege escalation. I haven't seen any technical write-ups yet, but I'm willing to bet there are still plenty more holes in that interface.

I would be interested in working on this for Google Summer of Code if the Qubes project decides to join.

And I am interested in being a GSoC mentor for Qubes again. I'm definitely in no position to make any promises about this project, but I look forward to seeing a proposal and your patches in general :)

@marmarek
Member

As @jpouellet said, benefits may not be that large. But this could still be a useful thing to do. Xorg and the X11 protocol in general are quite complex and from time to time we hit some strange interactions between different toolkits and our GUI. Wayland could make things easier here. So, 👍 from me, including GSoC 2018 (we will apply this year too).

@artemist
Author

Thanks! Even with the problems @jpouellet mentioned, I think that there still could be some advantages.

A few thoughts I wanted to write down so I don't forget:

The main reason I wanted to start this in the first place was multiple DPI support, and that could be useful, although we have to deal with privacy concerns.

I think we could still reduce RAM usage by sharing the same memory for the framebuffer in the client in the AppVM, the stub compositor in the AppVM, the stub client in the GuiVM, and the real compositor in the GuiVM. It may also be possible to do this in X11 with proper proxying of MIT-SHM, but I can't find any code doing it, and doing so may increase complexity significantly. (I may also just be misunderstanding X Display Lists though). Shared memory does open us up to easy cache attacks, but I can't think of any that one could mount based on a framebuffer, especially since one does not generally draw directly onto it because of double buffering, IIRC. Nevertheless, I will have to look into how much the GuiVM is trusted, and if cache attacks originating from it would be a concern.

We can remove GVT-g from the picture: I thought it used newer isolation features since my laptop didn't support it, but I guess not. Further research does show it is basically PV. However, it still may make graphics acceleration with GPU passthrough easier, as there is no need to mess with X11 graphics extensions, only OpenGL/CL libraries. It looks like NVIDIA and AMD also have some interesting (SR-IOV for AMD) isolation features for fancier GPUs, although those seem really really expensive and only easily available on certain servers.

@jpouellet
Contributor

It may also be possible to do this in X11 with proper proxying of MIT-SHM

It is my understanding that that is already how things are done. I refer you to https://www.qubes-os.org/doc/gui/#window-content-updates-implementation

but I can't find any code doing it

Some pointers:

Nevertheless, I will have to look into how much the GuiVM is trusted

IIUC it is ultimately trusted by necessity

@jpouellet
Contributor

Nevertheless, I will have to look into how much the GuiVM is trusted

IIUC it is ultimately trusted by necessity

That is to say, the GuiVM is obviously necessarily in the TCB of any VM which it controls input to / sees output from. Currently we only have one GuiVM (dom0) which must already be ultimately trusted and already has full access to everything anyway. However, down the road it is desirable to move the window manager out of dom0 and remove its ability to control dom0 (and in certain use cases perhaps also remove its ability to control some other VMs managed by an external admin).

@ghost

ghost commented Jan 12, 2018

Wouldn't using wayland increase the security of xscreensaver too?

@artemist
Author

@blacklight447 Yes, screen lockers are harder to crash in Wayland.

However, that reminds me of another problem: Screen lockers, like the rest of the compositor, are all part of the same window manager process. This means that we may have to make significant changes to each desktop environment. At minimum, it would just be to have coloured decorations. I think KDE, GNOME, and Sway (i3 clone) support server-side decorations, so it shouldn't be too bad.

@marmarek
Member

marmarek commented Jan 12, 2018

I think KDE, GNOME, and Sway (i3 clone) support server-side decorations, so it shouldn't be too bad.

I hope it is true. But at least for GNOME, there is a big push to client-side decorations, so I'm not so sure about it.

That is to say, the GuiVM is obviously necessarily in the TCB of any VM which it controls input to / sees output from.

Clarification: theoretically GuiVM may not have full control over input. It may be reduced to only controlling input focus. But in the first version it probably will have full control.

@DemiMarie

DemiMarie commented Apr 8, 2018

As far as graphics acceleration, modern GPUs do have an MMU that can enforce page protection. The problem is arbitrating access to it between VMs. I can think of a few solutions:

  1. Do not expose the MMU to VMs — attempts to modify the MMU from a VM are trapped and ignored.

  2. Trap-and-emulate (shadow page tables). Too complex? Seems to me to be similar to virtualizing a CPU without SLAT.

  3. Paravirtualization. We only need to handle rendering commands (nothing else makes sense for a VM to do). My understanding is that that is just buffer management — everything else is handled in hardware.

    This seems simple — not more complicated than Xen’s own management of CPU memory, or a kernel’s management of mmap’d buffers. Linux has had many vulnerabilities, but none in the mmap code, if I understand correctly.

  4. On twin-GPU systems, where one GPU is not connected to any display, we can give that GPU to a VM entirely, relying on the IOMMU to prevent access to GPU-internal registers and firmware. This presumes that those are not in the GPU’s address space.

    While obviously suboptimal, this approach works fantastically in one (very important, IMO) use case: gaming.

Of these, 3 and 4 seem the most promising to me. The API for 3 sounds (deceptively?) small:

#include <stdint.h>

// A handle to a GPU buffer
typedef int gpu_buffer_t;

// Get a buffer, or -1 on error
gpu_buffer_t gpu_mmap(uint64_t size);

// The mapping mode: read-only, read-write, or write-only
typedef enum {
    RO, RW, WO,
} gpu_mode_t;

// Map the buffer, returning its GPU address in *addr
int gpu_map(gpu_mode_t mode, gpu_buffer_t handle, uint64_t *addr);

// Unmap the buffer
int gpu_unmap(gpu_buffer_t handle);

// Destroy the buffer
int gpu_free(gpu_buffer_t handle);

Of course, these are just ideas, and I could be completely and utterly wrong.
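To make the intended lifecycle concrete, here is a toy in-memory model of such a buffer API. Everything in it is hypothetical: the `toy_` names, the fixed-size handle table, and the fake GPU addresses are assumptions for illustration, and no real GPU or driver interface is involved. The point is only that the arbiter has very little state to check per call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_BUFFERS 8

typedef enum { TOY_RO, TOY_RW, TOY_WO } toy_mode_t;

struct toy_buffer {
    int in_use;      /* handle has been allocated */
    int mapped;      /* buffer is currently mapped */
    uint64_t size;   /* requested size in bytes */
    toy_mode_t mode; /* mode of the current mapping */
};

static struct toy_buffer toy_table[TOY_MAX_BUFFERS];

static int toy_valid(int handle)
{
    return handle >= 0 && handle < TOY_MAX_BUFFERS && toy_table[handle].in_use;
}

/* Allocate a buffer handle; -1 if size is zero or the table is full. */
int toy_gpu_alloc(uint64_t size)
{
    if (size == 0)
        return -1;
    for (int h = 0; h < TOY_MAX_BUFFERS; h++) {
        if (!toy_table[h].in_use) {
            toy_table[h] = (struct toy_buffer){ .in_use = 1, .size = size };
            return h;
        }
    }
    return -1;
}

/* Map a buffer and report a fake address; rejects double mapping. */
int toy_gpu_map(toy_mode_t mode, int handle, uint64_t *addr)
{
    if (!toy_valid(handle) || toy_table[handle].mapped || addr == NULL)
        return -1;
    toy_table[handle].mapped = 1;
    toy_table[handle].mode = mode;
    *addr = (uint64_t)(handle + 1) << 32; /* fake address, illustrative only */
    return 0;
}

/* Unmap a buffer; fails if it was not mapped. */
int toy_gpu_unmap(int handle)
{
    if (!toy_valid(handle) || !toy_table[handle].mapped)
        return -1;
    toy_table[handle].mapped = 0;
    return 0;
}

/* Destroy a buffer; refused while mapped so that a mapping can never
 * outlive the buffer behind it. */
int toy_gpu_free(int handle)
{
    if (!toy_valid(handle) || toy_table[handle].mapped)
        return -1;
    toy_table[handle].in_use = 0;
    return 0;
}

/* Usage: allocate, map, unmap, free. */
void toy_demo(void)
{
    uint64_t addr = 0;
    int h = toy_gpu_alloc(4096);
    assert(h >= 0);
    assert(toy_gpu_map(TOY_RW, h, &addr) == 0);
    assert(toy_gpu_unmap(h) == 0);
    assert(toy_gpu_free(h) == 0);
}
```

A real arbiter would also have to validate sizes against a per-VM quota and program the GPU's page tables, which is where the actual difficulty lies.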

@Hello71

Hello71 commented Aug 19, 2018

Screen lockers, like the rest of the compositor, are all part of the same window manager process.

From what I understand, this is true in "standard" Wayland, but there is a wlroots protocol extension, "input inhibitor", that allows the screen locker to operate as a separate process. On sway, swaylock is a completely separate program from the main compositor.

The API for 3 sounds (deceptively?) small:

I believe this API already exists, it is called "DMA-BUF".

http://phd.mupuf.org/files/fosdem2013_drinext_drm2.pdf specifically references Qubes, so I would hope that security has been a legitimate consideration in the new API development.

@DemiMarie

Also, it seems that modern drivers already virtualize the GPU, with isolation enforced either in hardware or software. Modern GPUs support both, so one could use hardware isolation between VMs, and software isolation within a VM.

@thearthur

I'm waiting for this one to try out Qubes OS. I understand this will be a long wait. Just wanted to say hi. 👋

@edrex

edrex commented Mar 13, 2020

https://spectrum-os.org/ is a project to build a compartmentalized OS on crosvm, nixos, and wayland, still early days but really exciting.

@DemiMarie DemiMarie modified the milestones: TBD, Release 4.2 Nov 26, 2020
@DemiMarie DemiMarie added the P: major Priority: major. Between "default" and "critical" in severity. label Nov 26, 2020
@marmarek marmarek added the release notes This issue should be mentioned in the release notes. label Nov 26, 2020
@DemiMarie DemiMarie self-assigned this Jan 23, 2021
@DemiMarie

@marmarek: How much will the GUI protocol need to change? Can XWayland be used as a transitional option, if shmoverride is applied to the Wayland compositor too?

@DemiMarie

One major advantage of Wayland is that Wayland subsurfaces can be mapped by the GUIVM and composited on the GPU. This should be much more efficient (both in CPU usage and power consumption) than CPU-side compositing by the X server, but requires caution to ensure that a client cannot draw outside of what Qubes OS considers the borders of its window.
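The bounds check mentioned here amounts to intersecting each subsurface rectangle with the window rectangle before compositing. A minimal sketch, assuming simple integer rectangles in a shared coordinate space; the struct and function names are hypothetical and not taken from any Qubes or Wayland code.

```c
#include <assert.h>
#include <stdint.h>

struct rect {
    int x, y;          /* top-left corner */
    int width, height; /* extent; zero means empty */
};

static int64_t max64(int64_t a, int64_t b) { return a > b ? a : b; }
static int64_t min64(int64_t a, int64_t b) { return a < b ? a : b; }

/* Intersect an untrusted subsurface rectangle with the trusted window
 * rectangle. Edges are computed in 64 bits so that hostile coordinates
 * cannot overflow. Returns an all-zero rectangle if nothing is visible. */
struct rect clip_to_window(struct rect sub, struct rect window)
{
    assert(window.width >= 0 && window.height >= 0);

    int64_t left   = max64(sub.x, window.x);
    int64_t top    = max64(sub.y, window.y);
    int64_t right  = min64((int64_t)sub.x + sub.width,
                           (int64_t)window.x + window.width);
    int64_t bottom = min64((int64_t)sub.y + sub.height,
                           (int64_t)window.y + window.height);

    struct rect out = { 0, 0, 0, 0 };
    if (right > left && bottom > top) {
        out.x = (int)left;
        out.y = (int)top;
        out.width = (int)(right - left);
        out.height = (int)(bottom - top);
    }
    return out;
}
```

A subsurface that sticks out past the window's top-left corner is trimmed to the overlapping part, and one that lies entirely outside comes back empty and can be skipped.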

@marmarek
Member

marmarek commented Jun 16, 2021 via email

@Geblaat

Geblaat commented Apr 19, 2022

Wayland functionality for Spectrum OS will be integrated into upstream Wayland, which might be interesting for Qubes OS:
https://spectrum-os.org/lists/hyperkitty/list/[email protected]/thread/3VYGG3QLV37IJDQL3SZZMTOTJ5ZZKZFL/

@hexagonrecursion

There is now a bounty for this issue https://app.bountysource.com/issues/52352776-use-wayland-instead-of-x11-to-increase-performance

@andrewdavidwong andrewdavidwong added the bounty This issue has a public bounty associated with it. label May 5, 2022
@iacore

iacore commented Jul 7, 2022

I found this Wayland/X11 nested compositor from ChromiumOS:
https://chromium.googlesource.com/chromiumos/platform2/+/HEAD/vm_tools/sommelier/

X11 Sommelier
An X11 sommelier instance provides X11 forwarding. Xwayland is used to accomplish this. A single X11 sommelier instance is typically shared across all X11 clients as they often expect that they can use a shared X server for communication. If the X11 sommelier instance crashes in this setup, it takes all running X11 programs down with it. Multiple X11 sommelier instances can be used for improved isolation or when per-client configuration is needed, but it will be at the cost of losing the ability for programs to use the X server for communication between each other.

Seems like it can be used as an X11 compositor as well, and can replace the current qubes-gui and qubes-guid. It also seems to support different seats (for gaming / game controllers).

@DemiMarie

I found this Wayland/X11 nested compositor from ChromiumOS: https://chromium.googlesource.com/chromiumos/platform2/+/HEAD/vm_tools/sommelier/

X11 Sommelier
An X11 sommelier instance provides X11 forwarding. Xwayland is used to accomplish this. A single X11 sommelier instance is typically shared across all X11 clients as they often expect that they can use a shared X server for communication. If the X11 sommelier instance crashes in this setup, it takes all running X11 programs down with it. Multiple X11 sommelier instances can be used for improved isolation or when per-client configuration is needed, but it will be at the cost of losing the ability for programs to use the X server for communication between each other.

Seems like it can be used as an X11 compositor as well, and can replace the current qubes-gui and qubes-guid. It also seems to support different seats (for gaming / game controllers).

I recommend against Sommelier. It is written in C++ and Thomas Leonard found that it kept crashing for him. His own proxy (written in OCaml) is probably a better choice.

@DemiMarie DemiMarie added S: partial Status: partial. Work on this issue is partially complete, but it is not actively being worked on. S: in progress Status: in progress. The assignee is currently working on this issue. and removed S: partial Status: partial. Work on this issue is partially complete, but it is not actively being worked on. labels Feb 7, 2023
@andrewdavidwong andrewdavidwong removed this from the Release 4.2 milestone Aug 13, 2023
@bi0shacker001

Do we have any updates on the status of this? It looks like Hardware Acceleration is blocked by it, and it will also enable autorotate on convertibles, which would be very useful for my use-case.

@DemiMarie

The current plan is to replace the GUI agent with wayland-proxy-virtwl, which will be connected via a Rust program to an instance of crosvm running on the host. crosvm will then proxy this via another instance of wayland-proxy-virtwl to the host compositor.

@kravemir

kravemir commented Jun 8, 2024

The move to Wayland would be great, because Wayland has an official fractional scale protocol.

Nowadays, monitors come with quite odd PPIs, and integer HiDPI scaling isn't particularly usable anymore. With 200% scaling everything is too big, but at 100% everything is too small.

Besides the odd PPI of modern displays, fractional scaling helps with accessibility: some people scale normal-DPI displays to 125%.

I want to try Qubes OS, as I want to separate personal stuff, hobby work and client work, and Qubes OS is a perfect fit. However, both my laptop's screen and my desktop monitor need 150~160% scaling, and 1x or 2x scaling makes it practically unusable (ergonomically, long-term). So, this is a blocker for me at the moment.

@runephilosof

@kravemir I have been using custom scaling to 0.9 on a standard install of Qubes OS for a while, without any problems. I am unsure what problems I should be expecting.

@DemiMarie DemiMarie changed the title from "Use Wayland instead of X11 to increase performance" to "Use Wayland instead of X11 to increase performance and improve security" Sep 20, 2024
@kravemir

I am unsure what problems I should be expecting.

Noticeable blurriness at scaling of 130% to 170%. Because then it's rendered at 100% and scaled up, or at 200% and scaled down. Or, rendered at 100% and displayed at 100% without scaling.

Well, unless it's E2E integration, and applications know, that they should render at 130% scaling, meaning 1950x1300 frame-buffer for window to render on the display, that is representing 1500x1000 of virtual working screen real estate.
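The arithmetic behind those numbers can be sketched as follows. The sketch assumes the convention of Wayland's fractional-scale protocol, where the scale is carried as a numerator over a fixed denominator of 120; the function names are hypothetical and the rounding rule (round half up) is chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed denominator of the scale fraction. */
#define SCALE_DENOMINATOR 120

/* Convert a percentage such as 130 into a numerator over 120. */
int32_t percent_to_scale_120(int32_t percent)
{
    assert(percent > 0);
    return (int32_t)(((int64_t)percent * SCALE_DENOMINATOR) / 100);
}

/* Buffer length in physical pixels for a logical length at the given
 * scale numerator, rounding half up. */
int32_t scaled_length(int32_t logical, int32_t scale_120)
{
    assert(logical >= 0 && scale_120 > 0);
    int64_t scaled = (int64_t)logical * scale_120 + SCALE_DENOMINATOR / 2;
    return (int32_t)(scaled / SCALE_DENOMINATOR);
}
```

With 130% scaling the numerator is 156, so a 1500x1000 logical window needs a 1950x1300 buffer, which an application can render directly instead of having a 100% or 200% image resampled.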

I have been using custom scaling to 0.9 on a standard install of Qubes OS for a while, without any problems.

The downscaling might not be noticeable at 90% scaling.

Sounds too good to be true. I have to give it a try, whether it works also with scaling I need for my displays with odd PPI(s).

@bruceloco

There is also https://github.com/labwc/labwc which is a great project that even the RPI is using as a compositor and XFCE has some support for it.
https://www.xfce.org/about/tour420
Or use something like the Pixel desktop environment which is a modified LXDE to support wayland.

@DemiMarie DemiMarie removed their assignment Dec 22, 2024
@DemiMarie

XFCE support for Wayland is still experimental, and XFWM does not support Wayland at all. Therefore, Qubes OS should not ship XFCE+Wayland as a default configuration.

@andrewdavidwong andrewdavidwong added S: partial Status: partial. Work on this issue is partially complete, but it is not actively being worked on. and removed S: in progress Status: in progress. The assignee is currently working on this issue. labels Dec 22, 2024