Re: Redoing eXclusive Page Frame Ownership (XPFO) with isolated CPUs in mind (for KVM to isolate its guests per CPU)

From: Linus Torvalds
Date: Mon Aug 20 2018 - 17:49:05 EST


On Mon, Aug 20, 2018 at 2:26 PM Konrad Rzeszutek Wilk
<konrad.wilk@xxxxxxxxxx> wrote:
>
> See eXclusive Page Frame Ownership (https://lwn.net/Articles/700606/) which was posted
> way back in 2016.

Ok, so my gut feel is that the above was reasonable within the context
of 2016, but that the XPFO model is completely pointless and wrong in
the post-Meltdown world we now live in.

Why?

Because with the Meltdown patches, we ALREADY HAVE the isolated page
tables that XPFO tries to do.

They are just the normal user page tables.

So don't go around doing other crazy things.

All you need to do is to literally:

- before you enter VMX mode, switch to the user page tables

- when you exit, switch back to the kernel page tables

Don't do anything else. You're done.
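
A minimal sketch of those two steps, written as a toy user-space model
in C rather than as kernel code. Every name in it (cpu_model,
switch_pgtable, model_vmenter, run_guest_once) is hypothetical and
exists only for illustration; none of them is a real KVM or x86
interface. It shows only the ordering: the guest runs with the user
page tables loaded, and the kernel page tables come back on exit.

```c
#include <assert.h>

/* Hypothetical model of which page tables the CPU has loaded. */
enum pgtable { PGT_KERNEL, PGT_USER };

struct cpu_model {
    enum pgtable active;  /* page tables currently in use */
    int switches;         /* how many real switches were performed */
    int guest_runs;       /* how many times the guest was entered */
};

/* Switch page tables; a switch to the already-active set is a no-op. */
static void switch_pgtable(struct cpu_model *cpu, enum pgtable to)
{
    if (cpu->active != to) {
        cpu->active = to;
        cpu->switches++;
    }
}

/* Stand-in for vmenter: the guest must only run on the user tables. */
static void model_vmenter(struct cpu_model *cpu)
{
    assert(cpu->active == PGT_USER);
    cpu->guest_runs++;
}

/* The two steps: user tables before entry, kernel tables after exit. */
static void run_guest_once(struct cpu_model *cpu)
{
    switch_pgtable(cpu, PGT_USER);
    model_vmenter(cpu);
    switch_pgtable(cpu, PGT_KERNEL);
}
```

Calling run_guest_once() on a model that starts out on the kernel page
tables performs exactly two switches and ends back on the kernel page
tables.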

Now, this is complicated a bit by the fact that in order to enter VMX
mode with the user page tables, you do need to add the VMX state
itself to those user page tables (and add the actual trampoline code
to the vmenter too).

So it does imply we need to slightly extend the user mapping with a
few new patches, but that doesn't sound bad.

In fact, it sounds absolutely trivial to me.

The other thing you want to do is the trivial optimization of "hey,
we exited VMX mode due to a host interrupt", which would look like
this:

 * switch to user page tables in order to do vmenter
 * vmenter
 * host interrupt happens
    - switch to kernel page tables to handle irq
    - do_IRQ etc
    - switch back to user page tables
    - iret
 * switch to kernel page tables because the vmenter returned

so you want to have some trivial short-circuiting of that last "switch
to user page tables and back" dance. It may actually be that we don't
even need it, because the irq code may just be looking at what *mode*
we were in, not what page tables we were in. I looked at that code
back in the Meltdown days, but that's already so last-year now that we
have all these _other_ CPU bugs we handled.
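
To make the cost of that dance concrete, here is a rough sketch that
counts page table switches for the sequence above, with and without
the short-circuit. It is a toy user-space model in C, not kernel code,
and every name in it (cpu_model, switch_pgtable,
vmenter_with_host_irq) is made up for illustration. It also simply
assumes the irq return path would switch back to the user page tables,
which, as noted, may not be what the real irq code does.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of which page tables the CPU has loaded. */
enum pgtable { PGT_KERNEL, PGT_USER };

struct cpu_model {
    enum pgtable active;  /* page tables currently in use */
    int switches;         /* how many real switches were performed */
    int irqs_handled;     /* how many host interrupts were serviced */
};

/* Switch page tables; a switch to the already-active set is a no-op. */
static void switch_pgtable(struct cpu_model *cpu, enum pgtable to)
{
    if (cpu->active != to) {
        cpu->active = to;
        cpu->switches++;
    }
}

/*
 * Model one vmenter that is cut short by a host interrupt.
 * Returns the number of page table switches it took.
 */
static int vmenter_with_host_irq(struct cpu_model *cpu, bool short_circuit)
{
    int before = cpu->switches;

    switch_pgtable(cpu, PGT_USER);       /* in order to do vmenter */
    /* vmenter; a host interrupt arrives and forces an exit */
    switch_pgtable(cpu, PGT_KERNEL);     /* to handle the irq */
    cpu->irqs_handled++;                 /* do_IRQ etc */

    if (!short_circuit) {
        /* Assumed naive path: restore user tables, then iret. */
        switch_pgtable(cpu, PGT_USER);
    }

    /* vmenter returned: kernel tables needed (no-op if already there). */
    switch_pgtable(cpu, PGT_KERNEL);

    return cpu->switches - before;
}
```

In this model the naive path costs four switches per interrupted
vmenter and the short-circuited path costs two, so the optimization
removes exactly the last "to user and back" pair.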

But other than small details like that, doesn't this "use our Meltdown
user page table" sound like the right thing to do?

And note: no new VM code or complexity. None. We already have the
"isolated KVM context with only pages for the KVM process" case
handled.

Of course, after the long (and entirely unrelated) discussion about
the TLB flushing bug we had, I'm starting to worry about my own
competence, and maybe I'm missing something really fundamental, and
the XPFO patches do something else than what I think they do, or my
"hey, let's use our Meltdown code" idea has some fundamental weakness
that I'm missing.

Linus