Re: [RFC PATCH v2] x86/arch_prctl: Add ARCH_SET_XCR0 to set XCR0 per-thread

From: Keno Fischer
Date: Tue Apr 14 2020 - 16:15:06 EST


Hi everyone,

I'd like to continue this discussion along two directions:

1) In this patch, what should happen to signal frames?

I continue to think that it would be good for these to observe
the process' XCR0, but I understand the argument that we
should not let the XCR0 setting modify any kernel behavior
whatsoever. Andy, I would particularly appreciate your views
on this, since I believe you favored the latter.

2) What would a solution based on the raw KVM API look like?

I'm still afraid that going down the KVM route would just end up
back in the same situation as we're in right now, but I'd like to
explore this further, so here's my current thinking: Particularly for
recording, the process does need to look very much like a regular
Linux process, so we can get recording of syscalls and signal state right.
I don't have enough of an intuition for the performance implications
of this. For example, suppose we added a way for the kernel to
directly take syscalls from guest CPL3 - what would the cost
of incurring a vmexit for every syscall be? I suppose another
idea would be to build a minimal Linux kernel that sits in guest
CPL0 and emulates at least the process state and other high
frequency syscalls, but forwards the rest to the host kernel.
Seems potentially doable, but a bit brittle - is there prior art
here I should be aware of, e.g. from people looking at securing
containers? As I mentioned, I had looked at Project Dune
before (http://dune.scs.stanford.edu/), which does seem to
do a lot of the things I would need, though it doesn't appear
to currently be handling signals at all, and of course it's also
not really KVM based, but rather
KVM-but-copy-pasted-and-manually-hacked-up-in-a-separate.ko
based.

I may also be missing a completely obvious way to do this -
my apologies if so. I would certainly appreciate any insight on
how to achieve the set of requirements here (multiple tracees
with potentially differing XCR0 values, faithful and performant
provision of syscalls/signals to the tracees) on top of KVM.
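
To make the "differing XCR0 values" requirement a bit more concrete,
here is a rough sketch of the consistency rules that any per-tracee
XCR0 value would have to satisfy, whether it is set through arch_prctl
or through a KVM guest. The bit positions and dependency rules are the
architectural ones as I understand them from the Intel SDM. The helper
names xcr0_is_consistent() and xcr0_allowed_for_tracee() are made up
for illustration and are not taken from the patch or from KVM.

```c
#include <stdbool.h>
#include <stdint.h>

/* XCR0 state-component bits (architectural, per the Intel SDM). */
#define XCR0_X87        (1ULL << 0)
#define XCR0_SSE        (1ULL << 1)
#define XCR0_AVX        (1ULL << 2)
#define XCR0_BNDREGS    (1ULL << 3)
#define XCR0_BNDCSR     (1ULL << 4)
#define XCR0_OPMASK     (1ULL << 5)
#define XCR0_ZMM_HI256  (1ULL << 6)
#define XCR0_HI16_ZMM   (1ULL << 7)

#define XCR0_MPX        (XCR0_BNDREGS | XCR0_BNDCSR)
#define XCR0_AVX512     (XCR0_OPMASK | XCR0_ZMM_HI256 | XCR0_HI16_ZMM)

/*
 * Hypothetical helper: checks only the architectural dependency rules
 * that XSETBV enforces. It does not consult CPUID.
 */
static bool xcr0_is_consistent(uint64_t xcr0)
{
	uint64_t mpx = xcr0 & XCR0_MPX;
	uint64_t avx512 = xcr0 & XCR0_AVX512;

	/* x87 state can never be disabled. */
	if (!(xcr0 & XCR0_X87))
		return false;
	/* AVX state requires SSE state. */
	if ((xcr0 & XCR0_AVX) && !(xcr0 & XCR0_SSE))
		return false;
	/* The two MPX components must be enabled or disabled together. */
	if (mpx && mpx != XCR0_MPX)
		return false;
	/* The three AVX-512 components go together and require AVX. */
	if (avx512 && (avx512 != XCR0_AVX512 || !(xcr0 & XCR0_AVX)))
		return false;
	return true;
}

/*
 * Hypothetical helper: a tracee may only see a consistent subset of
 * the components that the host has enabled.
 */
static bool xcr0_allowed_for_tracee(uint64_t host_xcr0, uint64_t tracee_xcr0)
{
	return xcr0_is_consistent(tracee_xcr0) &&
	       (tracee_xcr0 & ~host_xcr0) == 0;
}
```

For example, on a host with XCR0 = 0xe7 (x87, SSE, AVX, AVX-512), a
tracee recorded on an AVX-only machine could be given 0x7, while a
request for 0x27 (opmask state without the ZMM components) would be
rejected.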

If we can figure out a good way forward with KVM, I'd be quite
interested in it, since I think there may be additional performance
games that could be played by having part of rr be in guest CPL0.
I'm just unsure that KVM is really the right abstraction here, so
I'd like to think through it a bit.

Keno