Re: [RFC v2-fix-v4 1/1] x86/tdx: Skip WBINVD instruction for TDX guest
From: Andy Lutomirski
Date: Wed Jun 09 2021 - 12:12:32 EST
On 6/9/21 8:09 AM, Dan Williams wrote:
> On Tue, Jun 8, 2021 at 9:27 PM Andi Kleen <ak@xxxxxxxxxxxxxxx> wrote:
>>
>>
>> here is no resume path.
>>
>>> Host is free to go into S3 independent of any guest state.
>>
>> Actually my understanding is that none of the systems which support TDX
>> support S3. S3 has been deprecated for a long time.
>
> Ok, I wanted to imply any power state that might power-off caches.
>
>>
>>
>>> A hostile
>>> host is free to do just enough cache management so that it can resume
>>> from S3 while arranging for TDX guest dirty data to be lost. Does a
>>> TDX guest go fatal if the cache loses power?
>>
>> That would be a machine check, and yes it would be fatal.
>
> Sounds good, so incorporating this and Andy's feedback:
>
> "TDX guests, like other typical guests, use standard ACPI mechanisms
> to signal sleep state entry (including reboot) to the host. The ACPI
> specification mandates WBINVD on any sleep state entry with the
> expectation that the platform is only responsible for maintaining the
> state of memory over sleep states, not preserving dirty data in any
> CPU caches. ACPI cache flushing requirements pre-date the advent of
> virtualization. Given guest sleep state entry does not affect any host
> power rails it is not required to flush caches. The host is
> responsible for maintaining cache state over its own bare metal sleep
> state transitions that power-off the cache. A TDX guest, unlike a
> typical guest, will machine check if the CPU cache is powered off."
>
> Andi, is that machine check behavior relative to power states
> mentioned in the docs?
I don't think there's anything about power states. There is a general
documented mechanism to integrity-check TD guest memory, but it is *not*
replay-resistant. So, if the guest dirties a cache line, and the cache
line is lost, it seems entirely plausible that the guest would get
silently corrupted.
I would argue that, if this happens, it's a host, TD module, or
architecture bug, and it's not the guest's fault.
--Andy