Re: RFC: The hypervisor's responsibility to stuff the RSB

From: Andrew Cooper
Date: Fri Jul 22 2022 - 17:59:19 EST


On 22/07/2022 22:35, Jim Mattson wrote:
> Now that Retbleed has drawn everyone's attention back to Skylake's
> RSBA behavior, I've been hearing murmurings about the hypervisor's
> responsibility to stuff the RSB on VM-entry when running on RSBA
> parts.
>
> Referring back to Intel's paper, "Retpoline: A Branch Target Injection
> Mitigation," it does say:
>
>> There are also a number of events that happen asynchronously from normal program execution that can result in an empty RSB. Software may use “RSB stuffing” sequences whenever these asynchronous events occur:
>>
>> 1. Interrupts/NMIs/traps/aborts/exceptions which increase call depth.
>> 2. System Management Interrupts (SMI) (see BIOS/Firmware Interactions).
>> 3. Host VMEXIT/VMRESUME/VMENTER.
>> 4. Microcode update load (WRMSR 0x79) on another logical processor of the same core.
>>
>> Software may avoid RSB underflow by inserting an “RSB stuffing” sequence following all of the above conditions.
>
> KVM *does* stuff the RSB on VM-exit, to protect the host kernel.
> However, it fails to stuff the RSB on VM-entry. Stuffing the RSB on
> VM-entry is necessary to protect the guest if KVM has made any unsafe
> changes to the RSB, such as reducing its depth. Though Intel doesn't
> spell it out, the responsibility of the hypervisor on VM-entry is much
> the same as the responsibility of the SMI handler on RSM.
>
> For reference, here's the "BIOS/Firmware Interactions" section of the
> aforementioned paper:
>
>> System Management Interrupt (SMI) handlers can leave the RSB in a state that OS code does not expect. In order to avoid RSB underflow on return from SMI, an SMI handler may implement RSB stuffing (for parts identified in Table 5) before returning from System Management Mode (SMM). Updated SMI handlers are provided via system BIOS updates.
>
> I don't really want to do this, but I don't want to be negligent, either.
>
> Thoughts?

The suggestion is unrealistic.

Even if the SMM handler does stuff the RSB, it's still in a state the OS
code does not expect.  (And if your CPU lacks SMEP, you've totally lost.)

Retpoline *is not safe* on Skylake-era CPUs, and we knew this before the
Spectre/Meltdown embargo broke in Jan '18.  Having SMM/VMM stuffing on
exit doesn't fix the problem; it just papers over two of the many holes.
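
A toy model of the point above, not a description of real hardware: it
assumes a 16-entry RSB that behaves as a simple stack, and the names
ToyRSB, stuff and count_underflows are made up for illustration.  It only
shows that stuffing before entry does nothing for a call chain deeper than
the RSB, which is one of the holes that remains.

```python
from collections import deque

RSB_DEPTH = 16  # assumed depth for this toy model only


class ToyRSB:
    """Hypothetical, simplified return stack buffer."""

    def __init__(self, depth=RSB_DEPTH):
        # A bounded deque drops the oldest entry when a push overflows it.
        self.entries = deque(maxlen=depth)

    def call(self, return_addr):
        self.entries.append(return_addr)

    def ret(self):
        """Return the predicted target, or None on underflow."""
        if self.entries:
            return self.entries.pop()
        # Underflow: on an RSBA part the prediction would come from
        # the indirect branch predictor instead.
        return None

    def stuff(self, benign="benign"):
        """Overwrite every entry with a harmless target."""
        self.entries.clear()
        for _ in range(self.entries.maxlen):
            self.entries.append(benign)


def count_underflows(rsb, call_depth):
    """Make call_depth nested calls, then return from all of them."""
    for i in range(call_depth):
        rsb.call(f"ret_{i}")
    return sum(1 for _ in range(call_depth) if rsb.ret() is None)
```

In this model, count_underflows(rsb, 20) gives 4 whether or not
rsb.stuff() ran first, while a chain of 8 calls never underflows.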

Xen also does not stuff on the exit-to-guest path, and I don't consider
changing this to be a useful improvement in security.

~Andrew