Re: [PATCH, RFC 45/62] mm: Add the encrypt_mprotect() system call for MKTME

From: Dave Hansen
Date: Mon Jun 17 2019 - 14:32:59 EST


Tom Lendacky, could you take a look down in the message to the talk of
SEV? I want to make sure I'm not misrepresenting what it does today.
...


>> I actually don't care all that much which one we end up with. It's not
>> like the extra syscall in the second option means much.
>
> The benefit of the second one is that, if sys_encrypt is absent, it
> just works. In the first model, programs need a fallback because
> they'll segfault if mprotect_encrypt() gets ENOSYS.

Well, by the time they get here, they would have already had to allocate
and set up the encryption key. I don't think this would really be the
"normal" malloc() path, for instance.

>> How do we
>> eventually stack it on top of persistent memory filesystems or Device
>> DAX?
>
> How do we stack anonymous memory on top of persistent memory or Device
> DAX? I'm confused.

If our interface to MKTME is:

fd = open("/dev/mktme");
ptr = mmap(fd);

Then it's hard to combine with an interface which is:

fd = open("/dev/dax123");
ptr = mmap(fd);

Whereas if we have something like mprotect() (or madvise() or something
else taking a pointer), we can just do:

fd = open("/dev/anything987");
ptr = mmap(fd);
sys_encrypt(ptr);

Now, we might not *do* it that way for dax, for instance, but I'm just
saying that if we go the /dev/mktme route, we never get a choice.
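
To make the ordering concrete, here is a rough sketch of the
"mmap() from anywhere, then encrypt by pointer" flow. Everything named
here is an assumption for illustration: sys_encrypt_stub() stands in for
the proposed call (nothing like it exists upstream, so it always fails
with ENOSYS), and map_then_encrypt() and the keyid argument are
hypothetical. It also shows the point made above about an absent
syscall: the mapping stays usable, just unencrypted.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical stand-in for the proposed sys_encrypt(). It is not a real
 * syscall; it always fails with ENOSYS, as a kernel without it would.
 */
static int sys_encrypt_stub(void *addr, size_t len, int keyid)
{
	(void)addr;
	(void)len;
	(void)keyid;
	errno = ENOSYS;
	return -1;
}

/*
 * Map memory from any source (fd < 0 means anonymous memory), then try to
 * encrypt it in place by pointer.
 *
 * Returns 1 if the range was encrypted, 0 if it is mapped but left
 * unencrypted because the call is absent, and -1 on any other error.
 */
static int map_then_encrypt(int fd, size_t len, int keyid, void **out)
{
	int flags = (fd < 0) ? (MAP_PRIVATE | MAP_ANONYMOUS) : MAP_SHARED;
	void *ptr;

	assert(out != NULL);

	ptr = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, fd, 0);
	if (ptr == MAP_FAILED)
		return -1;
	*out = ptr;

	if (sys_encrypt_stub(ptr, len, keyid) == 0)
		return 1;
	if (errno == ENOSYS)
		return 0;	/* no such syscall: mapping still works */

	/* Any other failure: do not hand back a half-configured mapping. */
	munmap(ptr, len);
	*out = NULL;
	return -1;
}
```

The fd could just as well come from a dax device; the encrypt step never
needs to know where the mapping came from.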

> I think that, in the long run, we're going to have to either expand
> the core mm's concept of what "memory" is or just have a whole
> parallel set of mechanisms for memory that doesn't work like memory.
...
> I expect that some day normal memory will be able to be repurposed as
> SGX pages on the fly, and that will also look a lot more like SEV or
> XPFO than like this model of MKTME.

I think you're drawing the line at pages where the kernel can manage
contents vs. not manage contents. I'm not sure that's the right
distinction to make, though. The thing that is important is whether the
kernel can manage the lifetime and location of the data in the page.

Basically: Can the kernel choose where the page comes from and get the
page back when it wants?

I really don't like the current state of things like with SEV or with
KVM direct device assignment where the physical location is quite locked
down and the kernel really can't manage the memory. I'm trying really
hard to make sure future hardware is more permissive about such things.
My hope is that these are a temporary blip and not the new normal.

> So, if we upstream MKTME as anonymous memory with a magic config
> syscall, I predict that, in a few years, it will end up inheriting
> all the downsides of both approaches with few of the upsides. Programs
> like QEMU will need to learn to manipulate pages that can't be
> accessed outside the VM without special VM buy-in, so the fact that
> MKTME pages are fully functional and can be GUP-ed won't be very
> useful. And the VM will learn about all these things, but MKTME won't
> really fit in.

Kai Huang (who is on cc) has been doing the QEMU enabling and might want
to weigh in. I'd also love to hear from the AMD folks in case I'm not
grokking some aspect of SEV.

But, my understanding is that, even today, neither QEMU nor the kernel
can see SEV-encrypted guest memory. So QEMU should already understand
how to not interact with guest memory. I _assume_ it's also already
doing this with anonymous memory, without needing /dev/sme or something.

> And, one of these days, someone will come up with a version of XPFO
> that could actually be upstreamed, and it seems entirely plausible
> that it will be totally incompatible with MKTME-as-anonymous-memory
> and that users of MKTME will actually get *worse* security.

I'm not following here. XPFO just means that we don't keep the direct
map around all the time for all memory. If XPFO and
MKTME-as-anonymous-memory were both in play, I think we'd just be
creating/destroying the MKTME-enlightened direct map instead of a
vanilla one.