Re: [RFCv1 7/7] KVM: unmap guest memory using poisoned pages
From: Dave Hansen
Date: Tue Apr 06 2021 - 10:33:08 EST

On 4/6/21 12:44 AM, David Hildenbrand wrote:
> On 02.04.21 17:26, Kirill A. Shutemov wrote:
>> TDX architecture aims to provide resiliency against confidentiality and
>> integrity attacks. Towards this goal, the TDX architecture helps enforce
>> the enabling of memory integrity for all TD-private memory.
>>
>> The CPU memory controller computes the integrity check value (MAC) for
>> the data (cache line) during writes, and it stores the MAC with the
>> memory as meta-data. A 28-bit MAC is stored in the ECC bits.
>>
>> Checking of memory integrity is performed during memory reads. If
>> integrity check fails, CPU poisons cache line.
>>
>> On a subsequent consumption (read) of the poisoned data by software,
>> there are two possible scenarios:
>>
>> - Core determines that the execution can continue and it treats
>> poison with exception semantics signaled as a #MCE
>>
>> - Core determines execution cannot continue, and it does an unbreakable
>> shutdown
>>
>> For more details, see Chapter 14 of Intel TDX Module EAS[1]
>>
>> As some integrity check failures may lead to system shutdown, the host
>> kernel must not allow any writes to TD-private memory. This requirement
>> clashes with KVM design: KVM expects the guest memory to be mapped into
>> host userspace (e.g. QEMU).
>
> So what you are saying is that if QEMU would write to such memory, it
> could crash the kernel? What a broken design.

IMNHO, the broken design is mapping the memory to userspace in the first
place. Why the heck would you actually expose something with the MMU to
a context that can't possibly meaningfully access or safely write to it?

This started with SEV. QEMU creates normal memory mappings with the SEV
C-bit (encryption) disabled. The kernel plumbs those into NPT, but when
those are instantiated, they have the C-bit set. So, we have mismatched
mappings. Where does that lead? The two mappings not only differ in
the encryption bit, causing one side to read gibberish if the other
writes: they're not even cache coherent.
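
To make the mismatch concrete, here is a toy model in Python (not kernel
code). Everything in it is an assumption made for illustration: ToyDram,
keystream() and the key are hypothetical names, and the SHA-256 based XOR
stream only stands in for the hardware encryption engine. It models the
"gibberish" half of the problem only, not the cache incoherence.

```python
import hashlib


def keystream(key: bytes, addr: int, length: int) -> bytes:
    """Toy address-tweaked keystream; a stand-in for the hardware cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        block = key + addr.to_bytes(8, "little") + counter.to_bytes(4, "little")
        out += hashlib.sha256(block).digest()
        counter += 1
    return out[:length]


class ToyDram:
    """One physical memory, reachable through mappings with or without the C-bit."""

    def __init__(self, guest_key: bytes):
        self.guest_key = guest_key
        self.cells = {}  # physical address -> raw bytes as stored in DRAM

    def _crypt(self, addr: int, data: bytes) -> bytes:
        # XOR is its own inverse, so this both encrypts and decrypts.
        ks = keystream(self.guest_key, addr, len(data))
        return bytes(a ^ b for a, b in zip(data, ks))

    def write(self, addr: int, data: bytes, c_bit: bool) -> None:
        self.cells[addr] = self._crypt(addr, data) if c_bit else data

    def read(self, addr: int, c_bit: bool) -> bytes:
        raw = self.cells[addr]
        return self._crypt(addr, raw) if c_bit else raw


dram = ToyDram(guest_key=b"hypothetical-guest-key")

# Guest writes through its NPT mapping (C-bit set).
dram.write(0x1000, b"guest secret", c_bit=True)
print(dram.read(0x1000, c_bit=True))    # guest mapping: plaintext round-trips
print(dram.read(0x1000, c_bit=False))   # userspace mapping: raw ciphertext

# Userspace writes through its mapping (C-bit clear).
dram.write(0x1000, b"host payload", c_bit=False)
print(dram.read(0x1000, c_bit=True))    # guest "decrypts" plaintext into noise
```

Neither side can hand the other usable data through such a pair of
mappings, which is the point: the userspace mapping exists but is not
useful for access.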

That's the situation *TODAY*, even ignoring TDX.

BTW, I'm pretty sure I know the answer to the "why would you expose this
to userspace" question: it's what QEMU/KVM did already for
non-encrypted memory, so this was the quickest way to get SEV working.

So, I don't like the #MC either. But, this series is a step in the
right direction for TDX *AND* SEV.
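
For anyone following along without the EAS at hand, the integrity flow
from the quoted changelog can be sketched as a toy model in Python. This is
only a sketch under stated assumptions: ToyIntegrityMemory, mac28() and
MachineCheck are hypothetical names, truncated HMAC-SHA256 stands in for
whatever MAC the hardware computes, a host write is modeled simply as a
write under a different key, and poisoning and consumption are folded into
a single read(). The "unbreakable shutdown" case is not modeled.

```python
import hashlib
import hmac

MAC_BITS = 28  # per the quoted text: a 28-bit MAC stored alongside each line


class MachineCheck(Exception):
    """Stands in for the #MC signaled when poisoned data is consumed."""


def mac28(key: bytes, addr: int, line: bytes) -> int:
    """Toy MAC: HMAC-SHA256 over address and data, truncated to 28 bits."""
    digest = hmac.new(key, addr.to_bytes(8, "little") + line, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "little") & ((1 << MAC_BITS) - 1)


class ToyIntegrityMemory:
    def __init__(self):
        self.lines = {}        # address -> (cache line data, stored MAC)
        self.poisoned = set()  # addresses whose last integrity check failed

    def write(self, addr: int, line: bytes, key: bytes) -> None:
        # The MAC is computed on write and stored with the data.
        self.lines[addr] = (line, mac28(key, addr, line))
        self.poisoned.discard(addr)

    def read(self, addr: int, key: bytes) -> bytes:
        # The MAC is checked on read; a mismatch poisons the line.
        line, stored_mac = self.lines[addr]
        if stored_mac != mac28(key, addr, line):
            self.poisoned.add(addr)
        if addr in self.poisoned:
            raise MachineCheck(f"poison consumed at {addr:#x}")
        return line


mem = ToyIntegrityMemory()
td_key, host_key = b"hypothetical-td-key", b"hypothetical-host-key"

mem.write(0x40, b"A" * 64, key=td_key)
mem.read(0x40, key=td_key)                # integrity check passes

mem.write(0x40, b"B" * 64, key=host_key)  # host scribbles on TD-private memory
try:
    mem.read(0x40, key=td_key)            # next TD read fails the check
except MachineCheck as exc:
    print("machine check:", exc)
```

The write itself looks harmless; the failure only surfaces at the next read
through the TD's key, which is why the host has to be kept from writing at
all rather than caught afterwards.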