Re: [RFC PATCH 00/13] XOM for KVM guest userspace

From: Edgecombe, Rick P
Date: Fri Oct 04 2019 - 16:10:02 EST


On Fri, 2019-10-04 at 07:56 -0700, Andy Lutomirski wrote:
> On Thu, Oct 3, 2019 at 2:38 PM Rick Edgecombe
> <rick.p.edgecombe@xxxxxxxxx> wrote:
> >
> > This patchset enables the ability for KVM guests to create execute-only (XO)
> > memory by utilizing EPT based XO permissions. XO memory is currently
> > supported on Intel hardware natively for CPUs with PKU, but this enables it
> > on older platforms, and can support XO for kernel memory as well.
>
> The patchset seems to sometimes call this feature "XO" and sometimes
> call it "NR". To me, XO implies no-read and no-write, whereas NR
> implies just no-read. Can you please clarify *exactly* what the new
> bit does and be consistent?
>
> I suggest that you make it NR, which allows for PROT_EXEC and
> PROT_EXEC|PROT_WRITE and plain PROT_WRITE. WX is of dubious value,
> but I can imagine plain W being genuinely useful for logging and for
> JITs that could maintain a W and a separate X mapping of some code.
> In other words, with an NR bit, all 8 logical access modes are
> possible. Also, keeping the paging bits more orthogonal seems nice --
> we already have a bit that controls write access.

Sorry, yes, the behavior of this bit needs to be documented a lot better. I
will definitely do this for the next version.

To clarify: since the EPT permissions in the XO/NR range are executable but not
readable or writable, the new bit really means XO, but only when NX is 0, since
the guest page tables are being checked as well. When NR=1, W=1, and NX=0, the
memory is still XO.

NR was picked over XO for the reasons you give. The idea is that, in the case of
KVM XO, NR plus writable can be defined as an invalid combination, just as
writable but not readable is defined as invalid for the EPT.

I *think* that whenever NX=1 and NR=1, the mapping should behave like a
not-present one, in that it can't be used for anything or have its translation
cached. I am not 100% sure on the cached part and was thinking of just making
the "spec" say that the translation caching behavior is undefined. I can look
into this if anyone thinks we need to know. In the current patchset it shouldn't
be possible to create this combination.
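
To make the combinations concrete, here is a rough sketch of the effective
access for a present guest page as described above. The struct, constants, and
function name are made up for illustration and are not from the patchset; it
also ignores U/S, PKU, and everything else that can restrict access.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; not taken from the patchset. */
struct guest_pte_bits {
	bool nr;	/* proposed no-read bit */
	bool w;		/* existing write bit */
	bool nx;	/* existing no-execute bit */
};

enum { PERM_R = 1, PERM_W = 2, PERM_X = 4 };

/* Effective access for a present page, per the semantics described above. */
unsigned int effective_perms(struct guest_pte_bits pte)
{
	unsigned int perms;

	if (pte.nr) {
		/*
		 * The EPT mapping in the NR range allows execute only, so W
		 * is ignored. The guest page tables are still checked, so
		 * NX=1 leaves no usable access at all.
		 */
		return pte.nx ? 0 : PERM_X;
	}

	/* NR=0: ordinary paging permissions. */
	perms = PERM_R;
	if (pte.w)
		perms |= PERM_W;
	if (!pte.nx)
		perms |= PERM_X;
	return perms;
}
```

With this, NR=1, W=1, NX=0 comes out as PERM_X only, and NR=1, NX=1 comes out
as 0, which is the "similar to not present" case.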

Since write-only memory isn't supported in EPT, we can't do the same trick to
create a new HW permission. But I guess if we emulate it, we could make the new
bit mean just NR, and support write-only by allowing emulation when KVM gets a
write EPT violation on NR memory. It might still be useful for the JIT case you
mentioned, or a shared memory mailbox. On the other hand, userspace might be
surprised to find that memory access speed differs depending on the
permission. I also wonder if any userspace apps are asking for just PROT_WRITE
and expecting readable memory.
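
As a rough sketch of what that emulation decision could look like (this is not
in the patchset, and the types and function name below are hypothetical, not
KVM's actual EPT violation path):

```c
#include <stdbool.h>

/* Hypothetical types for illustration; KVM's real handler looks different. */
enum access_type { ACCESS_READ, ACCESS_WRITE };
enum nr_action { NR_INJECT_FAULT, NR_EMULATE_WRITE };

/*
 * Decide how to handle an EPT violation on a page whose guest PTE has NR=1,
 * assuming the variant where the bit means only "no read".
 */
enum nr_action nr_violation_action(enum access_type access,
				   bool guest_pte_writable)
{
	/* The guest asked for write access: emulate the store (slow path). */
	if (access == ACCESS_WRITE && guest_pte_writable)
		return NR_EMULATE_WRITE;

	/* Reads, and writes the guest PTE does not allow, fault in the guest. */
	return NR_INJECT_FAULT;
}
```

Every write to such a page would take a VM exit and go through the emulator,
which is where the speed difference would come from.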

Thanks,

Rick