Re: [patch 00/38] x86/retbleed: Call depth tracking mitigation

From: Thomas Gleixner
Date: Wed Jul 20 2022 - 05:00:51 EST


On Tue, Jul 19 2022 at 01:51, Peter Zijlstra wrote:
> On Mon, Jul 18, 2022 at 03:48:04PM -0700, Sami Tolvanen wrote:
>> On Mon, Jul 18, 2022 at 2:18 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>> > Ofc, we can still put the whole:
>> >
>	sarq	$5, PER_CPU_VAR(__x86_call_depth);
>	jmp	\func_direct
>> >
>> > thing in front of that.
>>
>> Sure, that would work.
>
> So if we assume \func starts with ENDBR, and further assume we've fixed
> up every direct jmp/call to land at +4, we can overwrite the ENDBR with
> part of the SARQ, which leaves us 6 more bytes and places the immediate
> at -10 if I'm not miscounting.
>
> Now, the call sites are:
>
>    41 81 7b fa 78 56 34 12     cmpl $0x12345678, -6(%r11)
>    74 02                       je   1f
>    0f 0b                       ud2
>    e8 00 00 00 00          1:  call __x86_indirect_thunk_r11
>
> That means the offset of +10 lands in the middle of the CALL
> instruction, and since we only have 16 thunks there is a limited number
> of byte patterns available there.
>
> This really isn't as nice as the -6 but might just work well enough,
> hmm?
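
To make the byte accounting above concrete, this is how I read the
proposed layout. A sketch only; padding fill and encodings are
illustrative, with the 10 bytes for the SARQ coming from the
%gs-prefixed REX.W form with an absolute disp32 and an imm8:

   -16:  cc cc cc cc cc cc              padding (int3)
   -10:  78 56 34 12                    kCFI hash immediate, moved from -6
    -6:  65 48 c1 3c 25 .. .. .. .. 05  sarq $5, PER_CPU_VAR(__x86_call_depth)
                                        (the last 4 bytes overwrite the ENDBR)
    +4:                                 function body, direct jmp/call target

If I read the +10 concern correctly: the hash immediate then sits 10
bytes in front of the entry point, and inside the call site sequence
the same four hash bytes plus 10 land within the 5-byte CALL, i.e. in
its displacement, which can only encode the distance to one of the 16
thunks.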

So I added a 32-byte padding area and put the thunk at its start:

	sarq	$5, PER_CPU_VAR(__x86_call_depth);
	jmp	\func_direct
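
In memory that looks roughly like this (a sketch, not the actual
patch; the alignment directive and padding fill are illustrative, and
"func" stands in for the real symbol):

	.align	32
	# accounting thunk at the start of the 32-byte padding area;
	# callers would be fixed up to enter here rather than at func
	sarq	$5, PER_CPU_VAR(__x86_call_depth)
	jmp	func
	# rest of the padding
	int3; int3; int3
func:
	endbr64
	# function body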

For sockperf that costs about 1% of performance vs. the 16-byte
variant. With mitigations=off it's a ~0.5% drop.

That's on a SKL; I have not checked other systems yet.

Thanks,

tglx