Re: [PATCH v2 1/3] x86/entry: Clear extra registers beyond syscall arguments for 64bit kernels

From: Linus Torvalds
Date: Mon Feb 05 2018 - 16:59:11 EST


On Mon, Feb 5, 2018 at 1:33 PM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
>
> On a suggestion from Arjan it also appears worthwhile to interleave
> 'mov' with 'xor'. Perf stat says that this test gets 3.45 instructions
> per cycle:

Ugh.

A "xor %reg,%reg" is two bytes (three for the high regs due to the REX
prefix). A "movq $0,%reg" is 7 bytes because unlike most of the ALU ops,
"mov" doesn't have an 8-bit sign-extended immediate form.

So replacing those xors with movq's will add at least four bytes per
replacement, and you may well end up adding an L1 instruction cache miss.
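
To make the sizes concrete, this is roughly how the instructions in
question encode. The register choices are only for illustration, and
the bytes shown are what an assembler such as gas would typically emit:

```asm
	xorl	%eax, %eax	# 31 c0                 2 bytes
	xorl	%r8d, %r8d	# 45 31 c0              3 bytes (REX prefix)
	movq	$0, %rax	# 48 c7 c0 00 00 00 00  7 bytes (REX.W + imm32)
	movq	$0, %r8		# 49 c7 c0 00 00 00 00  7 bytes (REX.W + imm32)
```

The 32-bit xor is enough, since writing a 32-bit register zeroes the
upper half of the 64-bit one. So each replacement costs four extra
bytes for the high regs and five for the legacy ones.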

At which point "3.45 ipc" vs "2.88 ipc" is pretty much a non-issue.

I suspect that a bigger win would be if you try to interleave those
"xor" instructions with the "pushq" instructions in the entry code.
Because those push instructions tend to be limited by the LSU store
bandwidth, so you can probably put in xor instructions almost for free
in there.
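
Something like the following, as a rough sketch only. This is not the
actual entry code, and which registers get cleared here is an
assumption made for the example:

```asm
	pushq	%rbx		# save the old value first (store)
	xorl	%ebx, %ebx	# then clear it (ALU op, no store needed)
	pushq	%r12
	xorl	%r12d, %r12d
	pushq	%r13
	xorl	%r13d, %r13d
```

The only ordering constraint is that each xor comes after the push of
the same register, so the saved value is the original one.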

Linus