Re: [RFC v2 16/32] x86/tdx: Handle MWAIT, MONITOR and WBINVD

From: Kuppuswamy, Sathyanarayanan
Date: Mon May 10 2021 - 22:44:39 EST




On 5/10/21 7:17 PM, Andi Kleen wrote:
To prevent TD guest from using MWAIT/MONITOR instructions,
support for these instructions are already disabled by TDX
module (SEAM). So CPUID flags for these instructions should
be in disabled state.
Why does this not result in a #UD if the instruction is disabled by
SEAM?

It's just the TDX module (SEAM is the execution mode used by the TDX module)

If it is disabled by the TDX Module, we should never execute it. But for some
reason, if we still come across this instruction (buggy TDX module?), we add
appropriate warning in #VE handler.



How is it possible to execute a disabled instruction (one
precluded by CPUID) to the point where it triggers #VE instead of #UD?

That's how the TDX module works. It never injects anything else other than #VE. You can still get other exceptions of course, but they won't come from the TDX module.

After the above mentioned preventive measures, if TD guests still
execute these instructions, add appropriate warning messages in #VE
handler. For WBIND instruction, since it's related to memory writeback
and cache flushes, it's mainly used in context of IO devices. Since
TDX 1.0 does not support non-virtual I/O devices, skipping it should
not cause any fatal issues.
WBINVD is in a different class than MWAIT/MONITOR since it is not
identified by CPUID, it can't possibly have the same #UD behaviour.
It's not clear why WBINVD is included in the same patch as
MWAIT/MONITOR?

Because these are all instructions we never expect to execute, so nothing special is needed for them. That's a unique class that logically fits together.

Yes, for all these three instruction we don't need any special
handling code. So they are grouped together.




I disagree with the assertion that WBINVD is mainly used in the
context of I/O devices, it's also used for ACPI power management
paths.

You mean S3? That's of course also not supported inside TDX.


  WBINVD dependent functionality should be dynamically disabled
rather than warned about.

Does a TDX guest support out-of-tree modules?  The kernel is already
tainted when out-of-tree modules are loaded. In other words in-tree
modules preclude forbidden instructions because they can just be
audited, and out-of-tree modules are ok to trigger abrupt failure if
they attempt to use forbidden instructions.

We already did a lot of bi^wdiscussion on this on the last review.

Originally we had a different handling, this was the result of previous feedback.

It doesn't really matter because it should never happen.



But to let users know about its usage, use
WARN() to report about it.. For MWAIT/MONITOR instruction, since its
unsupported use WARN() to report unsupported usage.
I'm not sure how useful warning is outside of a kernel developer's
debug environment. The kernel should know what instructions are
disabled and which are available. WBINVD in particular has potential
data integrity implications. Code that might lead to a WBINVD usage
should be disabled, not run all the way up to where WBINVD is
attempted and then trigger an after-the-fact WARN_ONCE().

We don't expect the warning to ever happen. Yes all of this will be disabled. Nearly all are in code paths that cannot happen inside TDX anyways due to missing PCI-IDs or different cpuids, and S3 is explicitly disabled and would be impossible anyways due to lack of BIOS support.

We have added WARN to let user know about its usage and fix it. By default we should
never hit this path.






The WBINVD change deserves to be split off from MWAIT/MONITOR, and
more thought needs to be put into where these spurious instruction
usages are arising.

I disagree. We already spent a lot of cycles on this. WBINVD makes never sense in current TDX and all the code will be disabled.



-Andi


--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer