
Document requirements for secure GPU-accelerated rendering #4234

Closed
DemiMarie opened this issue Aug 21, 2018 · 38 comments
Labels
C: doc · P: default (default priority for new issues, to be replaced given sufficient information) · R: not applicable (e.g., help/support requests, questions, discussions, "not a bug," not enough info, not actionable)

Comments

@DemiMarie

Qubes OS version:

R4.0

Affected component(s):

Documentation


Steps to reproduce the behavior:

Search for documentation on what exactly would be necessary for secure, hardware-accelerated rendering in untrusted AppVMs

Expected behavior:

Some documentation as to why it is not possible currently, and what would be required to change that.

Actual behavior:

No such documentation.

General notes:

One obvious use of a system like Qubes is running untrusted Steam games. Because Qubes doesn’t support hardware-accelerated rendering, this doesn’t work. It would be nice to have documentation as to why this is not possible, and what would be necessary to change this.


Related issues:

@andrewdavidwong andrewdavidwong added the help wanted, C: doc, and task labels Aug 22, 2018
@andrewdavidwong andrewdavidwong added this to the Ongoing milestone Aug 22, 2018
@andrewdavidwong
Member

The documentation is a community effort. Please help us improve it by submitting a pull request:

https://www.qubes-os.org/doc/doc-guidelines/

@DemiMarie
Author

@andrewdavidwong I would be happy to, but to do so I need to know what the QubesOS maintainers would require of such a solution.

@andrewdavidwong
Member

@DemiMarie:

I would be happy to, but to do so I need to know what the QubesOS maintainers would require of such a solution.

I suggest searching qubes-devel to see whether the information you need is already there and, if not, starting a thread asking specific questions.

@teoman002

teoman002 commented Oct 29, 2018

A known problem that makes GPU acceleration in VMs a security risk:
It is not possible due to VRAM leaks in graphics cards; read this report, with many pictures as proof:
https://hsmr.cc/palinopsia/ (cited from the Whonix footnotes)
https://www.whonix.org/wiki/Virtualization_Platform_Security#cite_note-3

Description of the problem:
VRAM leakage enables guest operating systems to access VRAM content left behind by the Qubes dom0 area, which can be a security risk if dom0 is spied on at the right time, or all the time.
This can be done by allocating VRAM without initializing its content.
The reason GPU manufacturers don't initialize VRAM content might be that doing so would either shorten the memory's lifespan or slow down performance, although I can't cite any sources on this. It's also possible that they simply had no reason to do so until now.

Possible approach:
To change the situation, one has to tell the graphics card to initialize memory after allocation. AMD has open-source Linux drivers; maybe a driver developer can investigate how behavior changes when such a function is implemented. But changing such fundamental behavior could have devastating effects on the API, because everyone expects memory to behave the old way.
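
(For readers unfamiliar with the Palinopsia class of bugs, a minimal sketch of the idea follows. It is hypothetical and not taken from the report above: it allocates a texture without supplying any data and reads the allocation back. Whether stale VRAM actually appears depends on the driver, the hardware, and the virtualization stack; GLFW and desktop OpenGL are assumed only for brevity.)

```c
/* Hypothetical Palinopsia-style probe: allocate an uninitialized texture
 * and read its contents back. Build (assumption): cc probe.c -lglfw -lGL */
#include <GLFW/glfw3.h>   /* pulls in GL/gl.h by default */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    if (!glfwInit()) return 1;
    glfwWindowHint(GLFW_VISIBLE, GLFW_FALSE);      /* hidden window: we only need a GL context */
    GLFWwindow *win = glfwCreateWindow(64, 64, "probe", NULL, NULL);
    if (!win) { glfwTerminate(); return 1; }
    glfwMakeContextCurrent(win);

    const int W = 2048, H = 2048;
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    /* data == NULL: the driver allocates VRAM for the texture but is not
     * required to clear it, so it may still hold another context's data. */
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, W, H, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, NULL);

    unsigned char *buf = calloc((size_t)W * H, 4);
    glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_BYTE, buf);

    /* A zero-initialized allocation would report 0 here; anything else is
     * leftover VRAM content that leaked into this process. */
    size_t nonzero = 0;
    for (size_t i = 0; i < (size_t)W * H * 4; i++)
        if (buf[i]) nonzero++;
    printf("non-zero bytes in freshly allocated texture: %zu\n", nonzero);

    free(buf);
    glfwDestroyWindow(win);
    glfwTerminate();
    return 0;
}
```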

@sajkbflksadbfkasdjf

A known problem that makes GPU acceleration in VMs a security risk: It is not possible due to VRAM leaks in graphics cards; read this report, with many pictures as proof: https://hsmr.cc/palinopsia/ (cited from the Whonix footnotes) https://www.whonix.org/wiki/Virtualization_Platform_Security#cite_note-3

Description of the problem: VRAM leakage enables guest operating systems to access VRAM content left behind by the Qubes dom0 area, which can be a security risk if dom0 is spied on at the right time, or all the time. This can be done by allocating VRAM without initializing its content. The reason GPU manufacturers don't initialize VRAM content might be that doing so would either shorten the memory's lifespan or slow down performance, although I can't cite any sources on this. It's also possible that they simply had no reason to do so until now.

Possible approach: To change the situation, one has to tell the graphics card to initialize memory after allocation. AMD has open-source Linux drivers; maybe a driver developer can investigate how behavior changes when such a function is implemented. But changing such fundamental behavior could have devastating effects on the API, because everyone expects memory to behave the old way.

Why would this be a blocking issue? Couldn't we construct an additional virtualisation layer (maybe just draw a white screen to the buffer) so that no additional information is exposed? It should definitely be doable, and wasting one of the two most powerful chips in your computer because the virtualisation is too hard to get right seems wasteful to me.

@andrewdavidwong andrewdavidwong added the P: default label Jan 8, 2022
@thw0rted

thw0rted commented Jun 5, 2022

If more use cases help at all: I'm a web developer writing a WebGL-based application. Some of my application's users are on a Qubes-based desktop environment. Their performance is currently terrible, and I think this issue is the reason.

Users on a physical desktop with a mid-grade dedicated GPU can get a smooth 60FPS performance with minimal load on the GPU. Even those with Intel integrated graphics from several years ago can manage a pretty stable 30FPS, as long as they're using the right drivers. Users on the Qubes platform are stuck with Chrome's "SwiftShader" pure-software renderer, which takes more than one second per frame (!).

I think this problem is only likely to become more prevalent in the future, as more applications move to the browser, and browsers become more reliant on hardware-accelerated compositing and rendering.

@DemiMarie
Author

DemiMarie commented Jun 5, 2022

If more use cases help at all: I'm a web developer writing a WebGL-based application. Some of my application's users are on a Qubes-based desktop environment. Their performance is currently terrible, and I think this issue is the reason.

First, thanks for letting us know. The current situation sucks, but it is the best we can do right now. Sadly, supporting it securely on cards that people can actually afford (as opposed to super-expensive enterprise cards) is still an open problem. The good news is that people are working on it; the bad news is that these efforts could take quite a while to come to fruition. If Qubes can ever ship hardware-accelerated rendering enabled by default without violating our users' security expectations, we will.

Are you able to link to the web application, by any chance?

Users on the Qubes platform are stuck with Chrome's "SwiftShader" pure-software renderer, which takes more than one second per frame (!).

Yeah, this sucks. Is your application particularly heavy on textures? Texture sampling is notoriously slow in software renderers; there is a reason GPUs have dedicated hardware for it. 3D transformations are also quite expensive. Simple pixel shaders should (at least in theory) be handled pretty efficiently using SIMD instructions on the CPU, though Qubes disabling SMT (simultaneous multithreading, AKA hyper-threading) probably doesn't help there.
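
(As a hypothetical illustration of that last point, not anything SwiftShader actually does: a "simple pixel shader" such as scaling every pixel's color reduces, on the CPU, to a vectorizable loop like the AVX sketch below, processing eight floats per instruction. Texture sampling and perspective-correct 3D transforms do not collapse this neatly, which is part of why they hurt so much in a software renderer.)

```c
/* Hypothetical sketch: a trivial "pixel shader" (multiply RGBA by a factor)
 * written with AVX intrinsics. Build (assumption): cc -mavx -c shade.c */
#include <immintrin.h>
#include <stddef.h>

void scale_pixels(float *rgba, size_t pixel_count, float factor) {
    const __m256 f = _mm256_set1_ps(factor);
    size_t n = pixel_count * 4;                 /* total floats in the buffer */
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {                /* two RGBA pixels per iteration */
        __m256 px = _mm256_loadu_ps(rgba + i);
        _mm256_storeu_ps(rgba + i, _mm256_mul_ps(px, f));
    }
    for (; i < n; i++)                          /* scalar tail */
        rgba[i] *= factor;
}
```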

I also suggest filing a ticket with SwiftShader; you could be hitting a pathological case there. At the very least, they might have suggestions as to what part of the code is particularly expensive.

@thw0rted

thw0rted commented Jun 6, 2022

I don't have the Qubes-based desktop environment available right now, but I think you could get a good idea of the problem using this demo page. (Our application is proprietary, but built on CesiumJS.)

If you paste in viewer.scene.debugShowFramesPerSecond = true; just after the first line of code, then click Run (F8), it will reload the demo with a FPS counter in the corner. With an Intel HD620 (very basic iGPU, ~5-6 years old) I get reasonable framerates (15-30 FPS), but as soon as I switch to SwiftShader it's down around 1FPS.

I will look at filing a ticket with Chromium and/or Cesium, but I kind of expect each to point the finger at the other...

@marmarek
Member

marmarek commented Jun 6, 2022

With an Intel HD620 (very basic iGPU, ~5-6 years old) I get reasonable framerates (15-30 FPS), but as soon as I switch to SwiftShader it's down around 1FPS.

FWIW I get ~9FPS in Firefox and ~4FPS in Chrome. Both are bad, but one is clearly worse...

@thw0rted

thw0rted commented Jun 6, 2022

I'm curious, if you visit the regular public-facing Google Maps from your Qubes environment, do you feel like you have a subjectively bad experience? I think they're using similar technologies. When I force software rendering on the same laptop ("hardware acceleration" option off in Chrome settings), zooming and panning "feel" bad. Maybe that could be a more broadly applicable use case for you?

@marmarek
Member

marmarek commented Jun 6, 2022

Google Maps works reasonably fine. It isn't super smooth, I see some jumps when panning too fast, but nothing major.

@tzwcfq

tzwcfq commented Jun 6, 2022

I've tested as well:
Intel i9-12900K - 8 P-cores and 8 E-cores
Firefox 91.10.0esr on debian-11 with 16 vcpu.
Open earth.google.com, press F12 -> open the Performance tab -> Start Recording Performance.
When I don't move the screen it stays at 60 fps; when I move the screen around and zoom in/out, the average is 45 fps and the minimum is 18 fps.
When I try the Cesium example, I get 8.5 fps on average, 13 fps maximum, and 4.5 fps minimum.

@thw0rted

thw0rted commented Jun 6, 2022

Just as a data point, I'm interested to see how that compares with Chrome / Chromium. It sounds like several others have answered "worse" but it also sounds like they may have been using older (or at least less powerful) hardware.

@tzwcfq

tzwcfq commented Jun 6, 2022

Google Chrome Version 102.0.5005.61 (Official Build) (64-bit) on debian-11 with 16 vcpu.
To enable the FPS meter, open the DevTools console with Ctrl+Shift+I; with focus on DevTools, press Ctrl+Shift+P, type "FPS", and press Enter. It shows only the average fps (or something like that; I'm not sure how it's counted).
With earth.google.com I'm getting 45 fps average if I don't move the screen, and around 1 fps less if I move the screen around and zoom.
With Cesium I'm getting 2.8 fps if I don't move the screen and 2.3 fps if I zoom in/out.

@DemiMarie
Author

With an Intel HD620 (very basic iGPU, ~5-6 years old) I get reasonable framerates (15-30 FPS), but as soon as I switch to SwiftShader it's down around 1FPS.

FWIW I get ~9FPS in Firefox and ~4FPS in Chrome. Both are bad, but one is clearly worse...

I can reproduce with a 1-vCPU qube. Chrome is at less than 0.5FPS while using 90+% CPU. Definitely time for a bug report.

@thw0rted

So, just to be clear, @DemiMarie , where do you think the report should go, and who do you think should file it?

@DemiMarie
Author

@thw0rted I think you should file the report against Chromium and SwiftShader. I know basically zilch about SwiftShader internals, and it has been years since I did anything interesting with web APIs.

@thw0rted

thw0rted commented Aug 1, 2022

I just filed https://bugs.chromium.org/p/chromium/issues/detail?id=1348913 , in case anybody here would like to follow along. Thanks for the feedback!

@marmarek
Member

marmarek commented Aug 1, 2022

@thw0rted

thw0rted commented Aug 1, 2022

Hah! Oops, sorry. It auto-populated the form from the computer I was using and I didn't think to update. It does actually apply on Windows as well, inasmuch as forcing Firefox to disable hardware rendering here still outperforms Chrome when forced to use SwiftShader. Really, the OS tag should say "any". I'll leave a comment to that effect.

@JonasVautherin

Just discovered this issue, and I am curious about this comment that seems to have been ignored:

Why would this be a blocking issue? Couldn't we construct an additional virtualisation layer (maybe just draw a white screen to the buffer) so that no additional information is exposed?

Is that obviously not an option?

@DemiMarie
Author

Just discovered this issue, and I am curious about this comment that seems to have been ignored:

Why would this be a blocking issue? Couldn't we construct an additional virtualisation layer (maybe just draw a white screen to the buffer) so that no additional information is exposed?

Is that obviously not an option?

That would require an extremely complex translation layer.

@rwiesbach

There was a talk at Qubes OS Summit 2022: https://www.youtube.com/watch?v=YllX-ud70Nk - what does it mean for secure GPU-accelerated rendering? Does it sacrifice security? And if so, to what extent?

@DemiMarie
Author

It doesn’t intend to, but it isn’t ready for use in Qubes yet, in part because of Xen limitations.

@AlxHnr

AlxHnr commented Apr 15, 2023

How feasible would it be to add an unfiltered, less secure way to expose the GPU to selected VMs? Similar to how KVM/virt-manager can do it.

Why?

I trust certain VMs enough to take a risk here. Having a dedicated YouTube qube or video-editing qube could save a lot of power and time. It would still provide much better isolation and sandboxing than raw Linux - like a form of Qubes Lite that sits between the current Qubes OS and conventional desktop systems. The only alternative here is building my own "Qubes" based on Linux + KVM, with a lot of dev effort for proper clipboard integration, auto-updates, microphone support, and more.

@DemiMarie
Author

@marmarek thoughts?

@covert8

covert8 commented Aug 3, 2023

Has anyone considered using virtio-gpu or virgl from a security standpoint? It could also allow using the GPU across multiple qubes.

@DemiMarie DemiMarie modified the milestones: Non-release, Release 4.3, Release TBD Aug 3, 2023
@DemiMarie
Author

@covert8 It's funny you ask, because I was about to file an issue for this!

VirGL and Venus run the userspace driver (OpenGL and Vulkan, respectively) on the host. This means that they provide a hardware-independent API to the guest, but it also means that the entire userspace stack becomes guest-accessible attack surface. This attack surface is very large, and Linux graphics developers have stated on IRC that it is not a security boundary. Therefore, @marmarek has decided that Qubes OS will never use VirGL or Venus, and I agree with his decision.

virtGPU native contexts, on the other hand, expose the kernel ioctl API to the guest. This API is accessible to unprivileged userspace processes, which means it is a supported security boundary. It is also much smaller than the OpenGL or Vulkan APIs, which means that its attack surface is vastly smaller. As a bonus, native contexts offer near-native performance, which should be enough even for games and other demanding tasks.

The kernel ioctl API (also known as the userspace API or uAPI) is hardware-dependent, so virtGPU native contexts are only supported on a subset of hardware. Currently, Intel, AMD, and Adreno GPUs are supported using the upstream i915, amdgpu, and freedreno drivers.
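
(To make "exposes the kernel ioctl API to the guest" concrete, here is a hypothetical minimal client in C; libdrm and the render-node path are assumptions, not something from this thread. This narrow, unprivileged device interface is what a native context forwards to the guest, as opposed to the whole OpenGL/Vulkan userspace stack that VirGL or Venus would expose.)

```c
/* Hypothetical sketch: touch the DRM uAPI from unprivileged userspace.
 * Build (assumption): cc drm_probe.c $(pkg-config --cflags --libs libdrm) */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <xf86drm.h>

int main(void) {
    /* Render nodes are the unprivileged entry point to the kernel GPU driver;
     * a virtio-GPU native context forwards roughly this interface, including
     * the hardware-specific ioctls, into the guest. */
    int fd = open("/dev/dri/renderD128", O_RDWR);   /* typical first render node */
    if (fd < 0) { perror("open"); return 1; }

    drmVersionPtr v = drmGetVersion(fd);            /* wraps DRM_IOCTL_VERSION */
    if (v) {
        printf("kernel driver: %s, uAPI %d.%d.%d\n", v->name,
               v->version_major, v->version_minor, v->version_patchlevel);
        drmFreeVersion(v);
    }
    close(fd);
    return 0;
}
```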

Xen supports grant-based virtio, so virtio-GPU should not be incompatible with running QEMU in a stubdomain. The virtio-GPU emulator will need to run in dom0, but its job is much simpler than that of QEMU (it is only emulating a single device) and so the attack surface should (hopefully!) be acceptable.

@andrewdavidwong
Member

@DemiMarie, documentation issues like this go on the Non-release milestone, since they are independent of the Qubes OS release cycle.

@andrewdavidwong andrewdavidwong modified the milestones: Release TBD, Non-release Aug 5, 2023
@andrewdavidwong andrewdavidwong changed the title from "Documentation of what would be needed for secure GPU-accelerated rendering." to "Document requirements for secure GPU-accelerated rendering" Aug 5, 2023
@andrewdavidwong andrewdavidwong removed this from the Non-release milestone Aug 13, 2023
@ddevz

ddevz commented Jan 11, 2024

I wanted to make people aware of this related thread:

https://forum.qubes-os.org/t/seamless-gpu-passthrough-on-qubes-os-with-virtualgl/20265/18

which talks about someone using VirtualGL (which I believe is different from VirGL).

(I'm assuming the intention was to use VirtualGL to communicate from a "sys-gpu"-type qube to other qubes that need access to GL calls; however, the initial post is incomplete and no one has been able to replicate it, so the full intention is fuzzy.)

@DemiMarie
Author

DemiMarie commented Feb 3, 2024

Current dependency tree:

  1. Intel virtio-GPU native contexts (under development at Intel)
  2. AMD virtio-GPU native contexts (under development at AMD)
  3. virtio-GPU with Xen on not-QEMU
    1. virtio-GPU on Xen + QEMU (under development at AMD)
    2. virtio-GPU on KVM + not-QEMU (shipping in Chromebooks)
    3. Wayland everywhere.
      1. Port of virtio-GPU to not-QEMU + Xen (protocol only, no hardware acceleration required)
      2. Something to draw the borders
      3. Port various GUI stuff to use wlr-layer-shell instead of X11 override-redirect windows (@marmarta maintains this code, but I’ve agreed to provide any help needed).
      4. Central notification daemon, since VMs no longer have override-redirect windows (I’m working on this).
      5. StatusNotifierItem instead of XEmbed (I was working on this somewhat, now stalled temporarily due to Rust dependency hell).
      6. D-Bus menu implementation (probably rendered in the VM and drawn on the host via another layer-shell surface).

@DemiMarie
Author

@thw0rted One idea I just now had was to see about optimizations on the CesiumJS side. Even when a GPU is available and in use, forcing it to 100% usage cannot be good for battery life on mobile.

@thw0rted

thw0rted commented Feb 5, 2024

I was just a Cesium user, never on their dev team, and I've since moved on to another project.

That said, I think they already did what they could for optimization. It's a graphically-intensive 3D application, so battery drain should be treated more like running a mobile game than a regular web page. They did include an option for the application developer to trigger rendering manually, so that the render process would idle otherwise, which I think would help a lot in a mobile context.

The problem on Qubes was that even in short bursts (like scrolling a map), the software WebGL implementation was so slow that you could easily drop down to seconds-per-frame rather than frames-per-second. With the manual rendering option, you might only kick up to high power drain for 5 seconds of pan-and-zoom, then go back to idle, but with hardware acceleration those 5 seconds felt nice and smooth.

@DemiMarie
Author

Thanks for the explanation @thw0rted!

@DemiMarie
Author

Should this be closed? I think it is redundant, now that work on GPU acceleration has actually started.

@rwiesbach

Should this be closed? I think it is redundant, now that work on GPU acceleration has actually started.

Has it? Awesome. Is there another GitHub issue for that? (Which one?!)

@DemiMarie
Author

Should this be closed? I think it is redundant, now that work on GPU acceleration has actually started.

Has it? Awesome. Is there another GitHub issue for that? (Which one?!)

#8552 and https://github.com/orgs/QubesOS/projects/17

@andrewdavidwong andrewdavidwong added the R: not applicable label and removed the help wanted label Feb 28, 2024

This issue has been closed as "not applicable."

We respect the time and effort you have taken to file this issue, and we understand that this outcome may be unsatisfying. Please accept our sincere apologies and know that we greatly value your participation and membership in the Qubes community.

Regarding help and support requests, please note that this issue tracker (qubes-issues) is not intended to serve as a help desk or tech support center. Instead, we've set up other venues where you can ask for help and support, ask questions, and have discussions. By contrast, the issue tracker is more of a technical tool intended to support our developers in their work. We thank you for your understanding.

If anyone reading this believes that this issue was closed in error or that the resolution of "not applicable" is not accurate, please leave a comment below saying so, and we will review this issue again. For more information, see How issues get closed.

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Feb 28, 2024