
Document requirements for secure GPU-accelerated rendering #4234

Closed
DemiMarie opened this issue Aug 21, 2018 · 38 comments
Labels
C: doc · P: default (default priority for new issues, to be replaced given sufficient information) · R: not applicable (e.g., help/support requests, questions, discussions, "not a bug," not enough info, not actionable)

Comments

@DemiMarie

Qubes OS version:

R4.0

Affected component(s):

Documentation


Steps to reproduce the behavior:

Search for documentation on what exactly would be necessary for secure, hardware-accelerated rendering in untrusted AppVMs

Expected behavior:

Some documentation as to why it is not possible currently, and what would be required to change that.

Actual behavior:

No such documentation.

General notes:

One obvious use of a system like Qubes is running untrusted Steam games. Because Qubes doesn’t support hardware-accelerated rendering, this doesn’t work. It would be nice to have documentation as to why this is not possible, and what would be necessary to change this.


Related issues:

@andrewdavidwong andrewdavidwong added the help wanted, C: doc, and task labels Aug 22, 2018
@andrewdavidwong andrewdavidwong added this to the Ongoing milestone Aug 22, 2018
@andrewdavidwong
Member

The documentation is a community effort. Please help us improve it by submitting a pull request:

https://www.qubes-os.org/doc/doc-guidelines/

@DemiMarie
Author

@andrewdavidwong I would be happy to, but to do so I need to know what the QubesOS maintainers would require of such a solution.

@andrewdavidwong
Member

@DemiMarie:

I would be happy to, but to do so I need to know what the QubesOS maintainers would require of such a solution.

I suggest searching qubes-devel to see whether the information you need is already there and, if not, starting a thread asking specific questions.

@teoman002

teoman002 commented Oct 29, 2018

A known problem that makes GPU acceleration in VMs a security risk:
It is not possible due to VRAM leaks in graphics cards; read this report, with many pictures as proof:
https://hsmr.cc/palinopsia/ (cited from the Whonix footnotes)
https://www.whonix.org/wiki/Virtualization_Platform_Security#cite_note-3

Description of the problem:
VRAM leakage enables guest operating systems to access VRAM content left behind by the Qubes dom0 area, which can be a security risk if dom0 is spied on at the right time, or all the time.
This can be done by allocating VRAM without initializing its content.
The reason GPU manufacturers don't initialize VRAM content might be that doing so would either shorten the memory's lifespan or slow down performance, although I can't cite any sources on this. It's also possible that they simply had no reason to do so until now.

Possible approach:
To change the situation, one has to tell the graphics card to initialize memory after allocation. AMD has open-source Linux drivers; maybe a driver developer can investigate how behavior changes when such a function is implemented. But changing such fundamental behavior could have devastating effects on the API, because everyone expects memory to behave the old way.
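
(For readers unfamiliar with the Palinopsia class of bugs, a minimal sketch of the idea follows. It is hypothetical and not taken from the report above: it allocates a texture without supplying any data and reads the allocation back. Whether stale VRAM actually appears depends on the driver, the hardware, and the virtualization stack; GLFW and desktop OpenGL are assumed only for brevity.)

```c
/* Hypothetical Palinopsia-style probe: allocate an uninitialized texture
 * and read its contents back. Build (assumption): cc probe.c -lglfw -lGL */
#include <GLFW/glfw3.h>   /* pulls in GL/gl.h by default */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    if (!glfwInit()) return 1;
    glfwWindowHint(GLFW_VISIBLE, GLFW_FALSE);      /* hidden window: we only need a GL context */
    GLFWwindow *win = glfwCreateWindow(64, 64, "probe", NULL, NULL);
    if (!win) { glfwTerminate(); return 1; }
    glfwMakeContextCurrent(win);

    const int W = 2048, H = 2048;
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    /* data == NULL: the driver allocates VRAM for the texture but is not
     * required to clear it, so it may still hold another context's data. */
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, W, H, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, NULL);

    unsigned char *buf = calloc((size_t)W * H, 4);
    glGetTexImage(GL_TEXTURE_2D, 0, GL_RGBA, GL_UNSIGNED_BYTE, buf);

    /* A zero-initialized allocation would report 0 here; anything else is
     * leftover VRAM content that leaked into this process. */
    size_t nonzero = 0;
    for (size_t i = 0; i < (size_t)W * H * 4; i++)
        if (buf[i]) nonzero++;
    printf("non-zero bytes in freshly allocated texture: %zu\n", nonzero);

    free(buf);
    glfwDestroyWindow(win);
    glfwTerminate();
    return 0;
}
```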

@sajkbflksadbfkasdjf

A known problem that makes GPU acceleration in VMs a security risk: It is not possible due to VRAM leaks in graphics cards; read this report, with many pictures as proof: https://hsmr.cc/palinopsia/ (cited from the Whonix footnotes) https://www.whonix.org/wiki/Virtualization_Platform_Security#cite_note-3

Description of the problem: VRAM leakage enables guest operating systems to access VRAM content left behind by the Qubes dom0 area, which can be a security risk if dom0 is spied on at the right time, or all the time. This can be done by allocating VRAM without initializing its content. The reason GPU manufacturers don't initialize VRAM content might be that doing so would either shorten the memory's lifespan or slow down performance, although I can't cite any sources on this. It's also possible that they simply had no reason to do so until now.

Possible approach: To change the situation, one has to tell the graphics card to initialize memory after allocation. AMD has open-source Linux drivers; maybe a driver developer can investigate how behavior changes when such a function is implemented. But changing such fundamental behavior could have devastating effects on the API, because everyone expects memory to behave the old way.

Why would this be a blocking issue? Couldn't we construct an additional virtualisation layer (maybe just draw a white screen to the buffer) so that no additional information is exposed? It should definitely be doable, and wasting one of the two most powerful chips in your computer because the virtualisation is too hard to get right seems wasteful to me.

@andrewdavidwong andrewdavidwong added the P: default label Jan 8, 2022
@thw0rted

thw0rted commented Jun 5, 2022

If more use cases help at all: I'm a web developer writing a WebGL-based application. Some of my application's users are on a Qubes-based desktop environment. Their performance is currently terrible, and I think this issue is the reason.

Users on a physical desktop with a mid-grade dedicated GPU can get a smooth 60FPS performance with minimal load on the GPU. Even those with Intel integrated graphics from several years ago can manage a pretty stable 30FPS, as long as they're using the right drivers. Users on the Qubes platform are stuck with Chrome's "SwiftShader" pure-software renderer, which takes more than one second per frame (!).

I think this problem is only likely to become more prevalent in the future, as more applications move to the browser, and browsers become more reliant on hardware-accelerated compositing and rendering.

@DemiMarie
Author

DemiMarie commented Jun 5, 2022

If more use cases help at all: I'm a web developer writing a WebGL-based application. Some of my application's users are on a Qubes-based desktop environment. Their performance is currently terrible, and I think this issue is the reason.

First, thanks for letting us know. The current situation sucks, but it is the best we can do right now. Sadly, supporting it securely on cards that people can actually afford (as opposed to super-expensive enterprise cards) is still an open problem. The good news is that people are working on it; the bad news is that these efforts could take quite a while to come to fruition. If Qubes can ever ship hardware-accelerated rendering enabled by default without violating our users' security expectations, we will.

Are you able to link to the web application, by any chance?

Users on the Qubes platform are stuck with Chrome's "SwiftShader" pure-software renderer, which takes more than one second per frame (!).

Yeah, this sucks. Is your application particularly heavy on textures? Texture sampling is notoriously slow in software renderers; there is a reason GPUs have dedicated hardware for it. 3D transformations are also quite expensive. Simple pixel shaders should (at least in theory) be handled pretty efficiently using SIMD instructions on the CPU, though Qubes disabling SMT (simultaneous multithreading, AKA hyper-threading) probably doesn't help there.
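
(As a hypothetical illustration of that last point, not anything SwiftShader actually does: a "simple pixel shader" such as scaling every pixel's color reduces, on the CPU, to a vectorizable loop like the AVX sketch below, processing eight floats per instruction. Texture sampling and perspective-correct 3D transforms do not collapse this neatly, which is part of why they hurt so much in a software renderer.)

```c
/* Hypothetical sketch: a trivial "pixel shader" (multiply RGBA by a factor)
 * written with AVX intrinsics. Build (assumption): cc -mavx -c shade.c */
#include <immintrin.h>
#include <stddef.h>

void scale_pixels(float *rgba, size_t pixel_count, float factor) {
    const __m256 f = _mm256_set1_ps(factor);
    size_t n = pixel_count * 4;                 /* total floats in the buffer */
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {                /* two RGBA pixels per iteration */
        __m256 px = _mm256_loadu_ps(rgba + i);
        _mm256_storeu_ps(rgba + i, _mm256_mul_ps(px, f));
    }
    for (; i < n; i++)                          /* scalar tail */
        rgba[i] *= factor;
}
```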

I also suggest filing a ticket with SwiftShader; you could be hitting a pathological case there. At the very least, they might have suggestions as to what part of the code is particularly expensive.

@thw0rted

thw0rted commented Jun 6, 2022

I don't have the Qubes-based desktop environment available right now, but I think you could get a good idea of the problem using this demo page. (Our application is proprietary, but built on CesiumJS.)

If you paste in viewer.scene.debugShowFramesPerSecond = true; just after the first line of code, then click Run (F8), it will reload the demo with a FPS counter in the corner. With an Intel HD620 (very basic iGPU, ~5-6 years old) I get reasonable framerates (15-30 FPS), but as soon as I switch to SwiftShader it's down around 1FPS.

I will look at filing a ticket with Chromium and/or Cesium, but I kind of expect each to point the finger at the other...

@marmarek
Member

marmarek commented Jun 6, 2022

With an Intel HD620 (very basic iGPU, ~5-6 years old) I get reasonable framerates (15-30 FPS), but as soon as I switch to SwiftShader it's down around 1FPS.

FWIW I get ~9FPS in Firefox and ~4FPS in Chrome. Both are bad, but one is clearly worse...

@thw0rted

thw0rted commented Jun 6, 2022

I'm curious, if you visit the regular public-facing Google Maps from your Qubes environment, do you feel like you have a subjectively bad experience? I think they're using similar technologies. When I force software rendering on the same laptop ("hardware acceleration" option off in Chrome settings), zooming and panning "feel" bad. Maybe that could be a more broadly applicable use case for you?

@marmarek
Member

marmarek commented Jun 6, 2022

Google Maps works reasonably fine. It isn't super smooth, I see some jumps when panning too fast, but nothing major.

@tzwcfq

tzwcfq commented Jun 6, 2022

I've tested as well:
Intel i9-12900K - 8 P-cores and 8 E-cores
Firefox 91.10.0esr on debian-11 with 16 vcpu.
Open earth.google.com, press F12 -> open the Performance tab -> Start Recording Performance.
When I don't move the screen it stays at 60 fps; when I move the screen around and zoom in/out, the average is 45 fps and the minimum is 18 fps.
When I try the Cesium example, I get 8.5 fps on average, 13 fps maximum, and 4.5 fps minimum.

@thw0rted

thw0rted commented Jun 6, 2022

Just as a data point, I'm interested to see how that compares with Chrome / Chromium. It sounds like several others have answered "worse" but it also sounds like they may have been using older (or at least less powerful) hardware.

@tzwcfq

tzwcfq commented Jun 6, 2022

Google Chrome Version 102.0.5005.61 (Official Build) (64-bit) on debian-11 with 16 vcpu.
To enable the FPS meter, open the DevTools console with Ctrl+Shift+I; with focus on DevTools, press Ctrl+Shift+P, type "FPS", and press Enter. It shows only the average fps (or something like that; I'm not sure how it's counted).
With earth.google.com I'm getting 45 fps average if I don't move the screen, and around 1 fps less if I move the screen around and zoom.
With Cesium I'm getting 2.8 fps if I don't move the screen and 2.3 fps if I zoom in/out.

@DemiMarie
Author

With an Intel HD620 (very basic iGPU, ~5-6 years old) I get reasonable framerates (15-30 FPS), but as soon as I switch to SwiftShader it's down around 1FPS.

FWIW I get ~9FPS in Firefox and ~4FPS in Chrome. Both are bad, but one is clearly worse...

I can reproduce with a 1-vCPU qube. Chrome is at less than 0.5FPS while using 90+% CPU. Definitely time for a bug report.

@thw0rted

So, just to be clear, @DemiMarie , where do you think the report should go, and who do you think should file it?

@DemiMarie
Author

@thw0rted I think you should file the report against Chromium and SwiftShader. I know basically zilch about SwiftShader internals, and it has been years since I did anything interesting with web APIs.

@thw0rted

thw0rted commented Aug 1, 2022

I just filed https://bugs.chromium.org/p/chromium/issues/detail?id=1348913 , in case anybody here would like to follow along. Thanks for the feedback!

@marmarek
Member

marmarek commented Aug 1, 2022

@thw0rted

thw0rted commented Aug 1, 2022

Hah! Oops, sorry. It auto-populated the form from the computer I was using and I didn't think to update. It does actually apply on Windows as well, inasmuch as forcing Firefox to disable hardware rendering here still outperforms Chrome when forced to use SwiftShader. Really, the OS tag should say "any". I'll leave a comment to that effect.

@JonasVautherin

Just discovered this issue, and I am curious about this comment that seems to have been ignored:

Why would this be a blocking issue? Couldn't we construct an additional virtualisation layer (maybe just draw a white screen to the buffer) so that no additional information is exposed?

Is that obviously not an option?

@DemiMarie
Author

Just discovered this issue, and I am curious about this comment that seems to have been ignored:

Why would this be a blocking issue? Couldn't we construct an additional virtualisation layer (maybe just draw a white screen to the buffer) so that no additional information is exposed?

Is that obviously not an option?

That would require an extremely complex translation layer.

@rwiesbach

There was a talk at Qubes OS Summit 2022: https://www.youtube.com/watch?v=YllX-ud70Nk - what does it mean for secure GPU-accelerated rendering? Does it sacrifice security? And if so, to what extent?

@DemiMarie
Author

It doesn’t intend to, but it isn’t ready for use in Qubes yet, in part because of Xen limitations.

@AlxHnr

AlxHnr commented Apr 15, 2023

How feasible would it be to add an unfiltered, less secure way to expose the GPU to selected VMs? Similar to how KVM/virt-manager can do it.

Why?

I trust certain VMs enough to take a risk here. Having a dedicated YouTube qube or video-editing qube could save a lot of power and time. It would still provide much better isolation and sandboxing than raw Linux - like a form of Qubes Lite that sits between the current Qubes OS and conventional desktop systems. The only alternative here is building my own "Qubes" based on Linux + KVM, with a lot of dev effort for proper clipboard integration, auto-updates, microphone support, and more.

@DemiMarie
Author

@marmarek thoughts?

@covert8

covert8 commented Aug 3, 2023

Has anyone considered using virtio-gpu or virgl from a security standpoint? It could also allow using the GPU across multiple qubes.

@DemiMarie DemiMarie modified the milestones: Non-release, Release 4.3, Release TBD Aug 3, 2023
@DemiMarie
Author

@covert8 It's funny you ask, because I was about to file an issue for this!

VirGL and Venus run the userspace driver (OpenGL and Vulkan, respectively) on the host. This means that they provide a hardware-independent API to the guest, but it also means that the entire userspace stack becomes guest-accessible attack surface. This attack surface is very large, and Linux graphics developers have stated on IRC that it is not a security boundary. Therefore, @marmarek has decided that Qubes OS will never use VirGL or Venus, and I agree with his decision.

virtGPU native contexts, on the other hand, expose the kernel ioctl API to the guest. This API is accessible to unprivileged userspace processes, which means it is a supported security boundary. It is also much smaller than the OpenGL or Vulkan APIs, which means that its attack surface is vastly smaller. As a bonus, native contexts offer near-native performance, which should be enough even for games and other demanding tasks.

The kernel ioctl API (also known as the userspace API or uAPI) is hardware-dependent, so virtGPU native contexts are only supported on a subset of hardware. Currently, Intel, AMD, and Adreno GPUs are supported using the upstream i915, amdgpu, and freedreno drivers.
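
(To make "exposes the kernel ioctl API to the guest" concrete, here is a hypothetical minimal client in C; libdrm and the render-node path are assumptions, not something from this thread. This narrow, unprivileged device interface is what a native context forwards to the guest, as opposed to the whole OpenGL/Vulkan userspace stack that VirGL or Venus would expose.)

```c
/* Hypothetical sketch: touch the DRM uAPI from unprivileged userspace.
 * Build (assumption): cc drm_probe.c $(pkg-config --cflags --libs libdrm) */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <xf86drm.h>

int main(void) {
    /* Render nodes are the unprivileged entry point to the kernel GPU driver;
     * a virtio-GPU native context forwards roughly this interface, including
     * the hardware-specific ioctls, into the guest. */
    int fd = open("/dev/dri/renderD128", O_RDWR);   /* typical first render node */
    if (fd < 0) { perror("open"); return 1; }

    drmVersionPtr v = drmGetVersion(fd);            /* wraps DRM_IOCTL_VERSION */
    if (v) {
        printf("kernel driver: %s, uAPI %d.%d.%d\n", v->name,
               v->version_major, v->version_minor, v->version_patchlevel);
        drmFreeVersion(v);
    }
    close(fd);
    return 0;
}
```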

Xen supports grant-based virtio, so virtio-GPU should not be incompatible with running QEMU in a stubdomain. The virtio-GPU emulator will need to run in dom0, but its job is much simpler than that of QEMU (it is only emulating a single device) and so the attack surface should (hopefully!) be acceptable.

@andrewdavidwong
Member

@DemiMarie, documentation issues like this go on the Non-release milestone, since they are independent of the Qubes OS release cycle.

@andrewdavidwong andrewdavidwong modified the milestones: Release TBD, Non-release Aug 5, 2023
@andrewdavidwong andrewdavidwong changed the title from "Documentation of what would be needed for secure GPU-accelerated rendering." to "Document requirements for secure GPU-accelerated rendering" Aug 5, 2023
@andrewdavidwong andrewdavidwong removed this from the Non-release milestone Aug 13, 2023
@ddevz

ddevz commented Jan 11, 2024

I wanted to make people aware of this related thread:

https://forum.qubes-os.org/t/seamless-gpu-passthrough-on-qubes-os-with-virtualgl/20265/18

which talks about someone using VirtualGL (which I believe is different from VirGL).

(I'm assuming the intention was to use VirtualGL to communicate from a "sys-gpu"-type qube to other qubes that need access to GL calls; however, the initial post is incomplete and no one has been able to replicate it, so the full intention is fuzzy.)

@DemiMarie
Author

DemiMarie commented Feb 3, 2024

Current dependency tree:

  1. Intel virtio-GPU native contexts (under development at Intel)
  2. AMD virtio-GPU native contexts (under development at AMD)
  3. virtio-GPU with Xen on not-QEMU
    1. virtio-GPU on Xen + QEMU (under development at AMD)
    2. virtio-GPU on KVM + not-QEMU (shipping in Chromebooks)
    3. Wayland everywhere.
      1. Port of virtio-GPU to not-QEMU + Xen (protocol only, no hardware acceleration required)
      2. Something to draw the borders
      3. Port various GUI stuff to use wlr-layer-shell instead of X11 override-redirect windows (@marmarta maintains this code, but I’ve agreed to provide any help needed).
      4. Central notification daemon, since VMs no longer have override-redirect windows (I’m working on this).
      5. StatusNotifierItem instead of XEmbed (I was working on this somewhat, now stalled temporarily due to Rust dependency hell).
      6. D-Bus menu implementation (probably rendered in the VM and drawn on the host via another layer-shell surface).

@DemiMarie
Author

@thw0rted One idea I just now had was to see about optimizations on the CesiumJS side. Even when a GPU is available and in use, forcing it to 100% usage cannot be good for battery life on mobile.

@thw0rted

thw0rted commented Feb 5, 2024

I was just a Cesium user, never on their dev team, and I've since moved on to another project.

That said, I think they already did what they could for optimization. It's a graphically-intensive 3D application, so battery drain should be treated more like running a mobile game than a regular web page. They did include an option for the application developer to trigger rendering manually, so that the render process would idle otherwise, which I think would help a lot in a mobile context.

The problem on Qubes was that even in short bursts (like scrolling a map), the software WebGL implementation was so slow that you could easily drop down to seconds-per-frame rather than frames-per-second. With the manual rendering option, you might only kick up to high power drain for 5 seconds of pan-and-zoom, then go back to idle, but with hardware acceleration those 5 seconds felt nice and smooth.

@DemiMarie
Author

Thanks for the explanation @thw0rted!

@DemiMarie
Author

Should this be closed? I think it is redundant, now that work on GPU acceleration has actually started.

@rwiesbach

Should this be closed? I think it is redundant, now that work on GPU acceleration has actually started.

Has it? Awesome. Is there another GitHub issue for that? (Which one?!)

@DemiMarie
Author

Should this be closed? I think it is redundant, now that work on GPU acceleration has actually started.

Has it? Awesome. Is there another GitHub issue for that? (Which one?!)

#8552 and https://github.com/orgs/QubesOS/projects/17

@andrewdavidwong andrewdavidwong added the R: not applicable label and removed the help wanted label Feb 28, 2024

This issue has been closed as "not applicable."

We respect the time and effort you have taken to file this issue, and we understand that this outcome may be unsatisfying. Please accept our sincere apologies and know that we greatly value your participation and membership in the Qubes community.

Regarding help and support requests, please note that this issue tracker (qubes-issues) is not intended to serve as a help desk or tech support center. Instead, we've set up other venues where you can ask for help and support, ask questions, and have discussions. By contrast, the issue tracker is more of a technical tool intended to support our developers in their work. We thank you for your understanding.

If anyone reading this believes that this issue was closed in error or that the resolution of "not applicable" is not accurate, please leave a comment below saying so, and we will review this issue again. For more information, see How issues get closed.

@github-actions github-actions bot closed this as not planned (won't fix, can't repro, duplicate, stale) Feb 28, 2024