Re: [PATCH 1/2] x86: cpu/bugs: add support for AMD ERAPS feature

From: Dave Hansen
Date: Mon Nov 04 2024 - 12:45:21 EST


On 11/4/24 09:22, Shah, Amit wrote:
>> I think you're wrong. We can't depend on ERAPS for this. Linux
>> doesn't flush the TLB on context switches when PCIDs are in play.
>> Thus, ERAPS won't flush the RSB and will leave bad state in there
>> and will leave the system vulnerable.
>>
>> Or what am I missing?
> I just received confirmation from our hardware engineers on this too:
>
> 1. the RSB is flushed when CR3 is updated
> 2. the RSB is flushed when INVPCID is issued (except type 0 - single
> address).
>
> I didn't mention 1. so far, which led to your question, right?

Not only did you not mention it, you said something _completely_
different. So, where is the documentation for this thing? I dug through
the 57230 .zip file and I see the CPUID bit:

24 ERAPS. Read-only. Reset: 1. Indicates support for enhanced
return address predictor security.

but nothing telling us how it works.

> Does this now cover all the cases?

Nope, it's worse than I thought. Look at:

> SYM_FUNC_START(__switch_to_asm)
...
> FILL_RETURN_BUFFER %r12, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_CTXSW

which does the RSB fill at the same time it switches RSP.

So we feel the need to flush the RSB on *ALL* task switches. That
includes switches between threads in a process *AND* switches over to
kernel threads from user ones.

So, I'll flip this back around. Today, X86_FEATURE_RSB_CTXSW zaps the
RSB whenever RSP is updated to a new task stack. Please convince me
that ERAPS provides superior coverage or is unnecessary in all the
possible combinations of switching between:

different thread, same mm
user=>kernel, same mm
kernel=>user, same mm
different mm (we already covered this)

Because several of those switches can happen without a CR3 write or INVPCID.
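
To make the question concrete, here is a rough userspace model of the
cases listed above. It is a sketch, not kernel code: every name in it
(struct mm, struct task, switch_writes_cr3() and friends) is made up for
illustration. It assumes that CR3 is written only when the incoming task
has an mm different from the one already loaded, and that a kernel
thread borrows the loaded mm. It does not model INVPCID, or a CR3 reload
forced by a stale TLB generation.

```c
/* Toy userspace model, NOT kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mm { int id; };                 /* stand-in for mm_struct */
struct task { const struct mm *mm; };  /* mm == NULL models a kernel thread */

/*
 * Assumption: CR3 is written only when the incoming task has an mm that
 * differs from the one already loaded. A kernel thread borrows the
 * loaded mm (lazy TLB), so CR3 is left alone.
 */
static bool switch_writes_cr3(const struct mm *loaded, const struct task *next)
{
	assert(loaded != NULL && next != NULL);
	if (next->mm == NULL)
		return false;
	return next->mm != loaded;
}

/* Today: the fill in __switch_to_asm runs on every task switch. */
static bool rsb_cleared_by_ctxsw_fill(void)
{
	return true;
}

/* ERAPS as described in this thread, CR3 path only (INVPCID not modeled). */
static bool rsb_cleared_by_eraps(const struct mm *loaded, const struct task *next)
{
	return switch_writes_cr3(loaded, next);
}
```

Under those assumptions, rsb_cleared_by_eraps() returns true only for
the "different mm" case, while rsb_cleared_by_ctxsw_fill() returns true
for all four. The same-mm cases are the ones that need an explanation.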