Re: [PATCH v2 1/3] x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels
From: Dan Williams
Date: Mon Feb 05 2018 - 16:33:29 EST
On Mon, Feb 5, 2018 at 3:58 AM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
> * Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
>
>> + /*
>> + * Sanitize extra registers of values that a speculation attack
>> + * might want to exploit. In the CONFIG_FRAME_POINTER=y case,
>> + * the expectation is that %ebp will be clobbered before it
>> + * could be used.
>> + */
>> + .macro CLEAR_EXTRA_REGS_NOSPEC
>> + xorq %r15, %r15
>> + xorq %r14, %r14
>> + xorq %r13, %r13
>> + xorq %r12, %r12
>> + xorl %ebx, %ebx
>> +#ifndef CONFIG_FRAME_POINTER
>> + xorl %ebp, %ebp
>> +#endif
>
> BTW., is there any reason behind the order of the clearing of these registers?
> This ordering seems rather random:
>
> - The canonical register order is: RBX, RBP, R12, R13, R14, R15, which is also
> their push-order on the stack.
>
> - The CLEAR_EXTRA_REGS_NOSPEC order appears to be the reverse order (pop-order),
> but with RBX and RBP reversed.
>
> So since this is a 'push side' primitive I'd use the regular (push-) ordering
> instead:
>
> .macro CLEAR_EXTRA_REGS_NOSPEC
> xorl %ebx, %ebx
> xorl %ebp, %ebp
> xorq %r12, %r12
> xorq %r13, %r13
> xorq %r14, %r14
> xorq %r15, %r15
>
> It obviously doesn't matter to correctness - only to readability.
Sure, will do.
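For reference, the reordered hunk would read roughly like the following
(keeping the CONFIG_FRAME_POINTER special case from the original patch);
a sketch only, not the exact v3 diff:

        .macro CLEAR_EXTRA_REGS_NOSPEC
        xorl    %ebx, %ebx
#ifndef CONFIG_FRAME_POINTER
        xorl    %ebp, %ebp
#endif
        xorq    %r12, %r12
        xorq    %r13, %r13
        xorq    %r14, %r14
        xorq    %r15, %r15
        .endm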
>
> There's also a (very) small micro-optimization argument in favor of the regular
> order: the earlier registers are more likely to be utilized by C functions, so the
> sooner we clear them, the less potential interaction these clearing instructions
> are going to have with any later use.
On a suggestion from Arjan, it also appears worthwhile to interleave
'mov' with 'xor'. Perf stat says that this test gets 3.45 instructions
per cycle:
    for (i = 0; i < INT_MAX/1024; i++)
            asm(".rept 1024\n"
                "xorl %%ebx, %%ebx\n"
                "movq $0, %%r10\n"
                "xorq %%r11, %%r11\n"
                "movq $0, %%r12\n"
                "xorq %%r13, %%r13\n"
                "movq $0, %%r14\n"
                "xorq %%r15, %%r15\n"
                ".endr"
                : : : "r15", "r14", "r13", "r12",
                      "ebx", "r11", "r10");
...the 'rept' is there to try to minimize micro-op caching effects.
By comparison, the straight xor version gets 2.88 instructions per
cycle:
    for (i = 0; i < INT_MAX/1024; i++)
            asm(".rept 1024\n"
                "xorl %%ebx, %%ebx\n"
                "xorq %%r10, %%r10\n"
                "xorq %%r11, %%r11\n"
                "xorq %%r12, %%r12\n"
                "xorq %%r13, %%r13\n"
                "xorq %%r14, %%r14\n"
                "xorq %%r15, %%r15\n"
                ".endr"
                : : : "r15", "r14", "r13", "r12",
                      "ebx", "r11", "r10");