Re: [PATCH 1/2] x86: cpu/bugs: add support for AMD ERAPS feature

From: Dave Hansen
Date: Tue Nov 05 2024 - 11:19:42 EST


On 11/5/24 02:39, Shah, Amit wrote:
> On Mon, 2024-11-04 at 09:45 -0800, Dave Hansen wrote:
> I'm expecting the APM update to come out soon, but I have put together
>
> https://amitshah.net/2024/11/eraps-reduces-software-tax-for-hardware-bugs/
>
> based on information I have. I think it's mostly consistent with what
> I've said so far - with the exception of the mov-CR3 flush only
> confirmed yesterday.

That's better. But your original cover letter did say:

	Feature documented in AMD PPR 57238.

which is technically true because the _bit_ is defined. But it's far,
far from being sufficiently documented for Linux to actually use it.

Could we please be more careful about these in the future?

>> So, I'll flip this back around.  Today, X86_FEATURE_RSB_CTXSW zaps the
>> RSB whenever RSP is updated to a new task stack.  Please convince me
>> that ERAPS provides superior coverage or is unnecessary in all the
>> possible combinations switching between:
>>
>> different thread, same mm
>
> This case is the same userspace process with valid addresses in the RSB
> for that process. An invalid speculation isn't security sensitive,
> just a misprediction that won't be retired. So we are good here.

Does that match what the __switch_to_asm comment says, though?

> /*
>  * When switching from a shallower to a deeper call stack
>  * the RSB may either underflow or use entries populated
>  * with userspace addresses. On CPUs where those concerns
>  * exist, overwrite the RSB with entries which capture
>  * speculative execution to prevent attack.
>  */

It is also talking just about call depth, not about same-address-space
RSB entries being harmless. That's because this is also trying to avoid
having the kernel consume any user-placed RSB entries, regardless of
whether they're from the same mm or not.
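
A minimal user-space sketch of the concern in that comment, assuming a toy
model only: the RSB is treated as a small circular stack, and every name
below (rsb_model, rsb_fill, predict_after_switch, the depth of 32, the
address constants) is hypothetical and not taken from the kernel or from any
AMD document. It shows a return executed after a switch consuming an entry
left by the previous context unless the buffer is overwritten first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RSB_DEPTH    32                 /* assumed depth; real CPUs differ */
#define USER_PLACED  0x1000UL           /* stand-in for a user-chosen target */
#define BENIGN_TRAP  (~0UL)             /* stand-in for a speculation trap */

/* Toy model: a circular stack of predicted return targets. */
struct rsb_model {
	unsigned long entry[RSB_DEPTH];
	size_t top;                     /* slot the next push will write */
};

static void rsb_push(struct rsb_model *r, unsigned long ret_addr)
{
	assert(r != NULL);
	r->entry[r->top] = ret_addr;
	r->top = (r->top + 1) % RSB_DEPTH;
}

static unsigned long rsb_pop(struct rsb_model *r)
{
	assert(r != NULL);
	r->top = (r->top + RSB_DEPTH - 1) % RSB_DEPTH;
	return r->entry[r->top];
}

/* Overwrite every slot, loosely modelling what a full RSB fill achieves. */
static void rsb_fill(struct rsb_model *r)
{
	for (size_t i = 0; i < RSB_DEPTH; i++)
		rsb_push(r, BENIGN_TRAP);
}

/*
 * The previous context leaves one entry behind; the next context then
 * executes a return it never pushed a matching call for.  Returns the
 * target the model would predict for that return.
 */
static unsigned long predict_after_switch(bool fill_on_switch)
{
	struct rsb_model r = { .top = 0 };

	rsb_push(&r, USER_PLACED);      /* left behind by the old context */
	if (fill_on_switch)
		rsb_fill(&r);           /* mitigation applied at the switch */
	return rsb_pop(&r);             /* first return in the new context */
}
```

In this model predict_after_switch(false) yields the stale entry and
predict_after_switch(true) yields the trap value. Nothing in it depends on
whether the two contexts share an mm, which is the point being made above.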

>> user=>kernel, same mm
>> kernel=>user, same mm
>
> user-kernel is protected with SMEP. Also, we don't call
> FILL_RETURN_BUFFER for these switches?

Amit, I'm beginning to fear that you haven't gone and looked at the
relevant code here. Please go look at SYM_FUNC_START(__switch_to_asm)
in arch/x86/entry/entry_64.S. I believe this code is called for all
task switches, including switching from a user task to a kernel task. I
also believe that FILL_RETURN_BUFFER is used unconditionally for every
__switch_to_asm call (when X86_FEATURE_RSB_CTXSW is on, of course).
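
The behaviour described in that paragraph can be sketched as a toy C model,
assuming the description is accurate. This is not the kernel code:
cpu_model, task_kind and model_switch_to are hypothetical names, and the
counter merely stands in for the FILL_RETURN_BUFFER sequence. The only thing
it illustrates is that the gate is the feature flag, not the kind of task
being switched from or to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task classification, for illustration only. */
enum task_kind { TASK_USER, TASK_KTHREAD };

struct cpu_model {
	bool rsb_ctxsw;         /* models X86_FEATURE_RSB_CTXSW being set */
	unsigned int fills;     /* how many times the RSB fill was performed */
};

/*
 * Models one task switch.  prev and next are deliberately ignored: under
 * the reading above, the fill depends only on the feature flag.
 */
static void model_switch_to(struct cpu_model *cpu,
			    enum task_kind prev, enum task_kind next)
{
	assert(cpu != NULL);
	(void)prev;
	(void)next;

	if (cpu->rsb_ctxsw)
		cpu->fills++;   /* stands in for FILL_RETURN_BUFFER */
}
```

With rsb_ctxsw set, a user=>kthread switch and a kthread=>user switch each
bump the counter; with it clear, neither does.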

Could we please start over on this patch?

Let's get the ERAPS+TLB-flush nonsense out of the kernel and get the
commit message right.

Then let's go from there.