Re: [PATCH v6 1/1] x86: kvm: svm: set up ERAPS support for guests

From: Andrew Cooper
Date: Mon Nov 24 2025 - 11:43:18 EST


On 24/11/2025 4:15 pm, Shah, Amit wrote:
> On Thu, 2025-11-20 at 12:11 -0800, Sean Christopherson wrote:
>>> 2. Hosts that disable NPT: the ERAPS feature flushes the RSB entries on
>>>    several conditions, including CR3 updates.  Emulating hardware
>>>    behaviour on RSB flushes is not worth the effort for NPT=off case,
>>>    nor is it worthwhile to enumerate and emulate every trigger the
>>>    hardware uses to flush RSB entries.  Instead of identifying and
>>>    replicating RSB flushes that hardware would have performed had NPT
>>>    been ON, do not let NPT=off VMs use the ERAPS features.
>> The emulation requirements are not limited to shadow paging.  From the APM:
>>
>>   The ERAPS feature eliminates the need to execute CALL instructions to clear
>>   the return address predictor in most cases. On processors that support ERAPS,
>>   return addresses from CALL instructions executed in host mode are not used in
>>   guest mode, and vice versa. Additionally, the return address predictor is
>>   cleared in all cases when the TLB is implicitly invalidated (see Section 5.5.3 “TLB
>>   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>   Management,” on page 159) and in the following cases:
>>
>>   • MOV CR3 instruction
>>   • INVPCID other than single address invalidation (operation type 0)
>>
>> Yes, KVM only intercepts MOV CR3 and INVPCID when NPT is disabled (or INVPCID is
>> unsupported per guest CPUID), but that is an implementation detail, the instructions
>> are still reachable via emulator, and KVM needs to emulate implicit TLB flush
>> behavior.
>>
>> So punting on emulating RAP clearing because it's too hard is not an option.  And
>> AFAICT, it's not even that hard.
> I didn't mean punting on it in the "it's too hard" sense, but in the
> sense that we don't know all the details of when hardware decides to do
> a flush; and even if triggers are mentioned in this APM today, future
> changes to microcode or APM docs might reveal more triggers that we
> need to emulate and account for. There's no way to track such changes,
> so my thinking is that we should be conservative and not assume
> anything.

But this *is* the problem.  The APM says that OSes can depend on this
property for safety, but does not provide enough information for
hypervisors to make it safe.

ERAPS is a bad spec.  It should not have gotten out of the door.

A better spec would say "clears the RAP on any MOV to CR3" and nothing else.

The fact that it might happen microarchitecturally in other cases
doesn't matter; what matters is what OSes can architecturally depend on,
and right now that explicitly includes "unspecified cases in NDA
documents".
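The following minimal C sketch makes the contrast concrete. It models both contracts as predicates that a hypervisor might evaluate when deciding whether a guest-visible event must clear the RAP. The enum, the constant and the function names are hypothetical and exist only for illustration; this is not KVM code. The implicit-TLB-invalidation case is deliberately one opaque bucket, because the public APM text quoted above does not enumerate its members.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical event model, for illustration only (not KVM APIs).
 * RAP_EV_IMPLICIT_TLB_INVAL stands in for the open-ended set of
 * "implicit TLB invalidation" cases that the APM defers to Section 5.5.3.
 */
enum rap_event {
	RAP_EV_MOV_CR3,
	RAP_EV_INVPCID,
	RAP_EV_IMPLICIT_TLB_INVAL,
	RAP_EV_OTHER,
};

/* INVPCID operation type 0 is single-address invalidation. */
#define INVPCID_TYPE_INDIV_ADDR 0UL

/* Clearing triggers as worded in the quoted APM text. */
static bool apm_eraps_clears_rap(enum rap_event ev, unsigned long invpcid_type)
{
	switch (ev) {
	case RAP_EV_MOV_CR3:
		return true;
	case RAP_EV_INVPCID:
		/* Every INVPCID type except single-address invalidation. */
		return invpcid_type != INVPCID_TYPE_INDIV_ADDR;
	case RAP_EV_IMPLICIT_TLB_INVAL:
		/* Membership of this bucket is not fixed by the public spec. */
		return true;
	default:
		return false;
	}
}

/* The narrower contract: the RAP is cleared on any MOV to CR3, and nothing else. */
static bool narrow_eraps_clears_rap(enum rap_event ev)
{
	return ev == RAP_EV_MOV_CR3;
}
```

Under the narrower contract, an emulator only has to act on MOV to CR3. Under the APM wording, the third case has no membership that a hypervisor can check against.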

~Andrew