Re: x86/asm: __clear_user() micro-optimization (was: "Re: [GIT PULL] x86/asm changes for v4.18")

From: Linus Torvalds
Date: Tue Jun 05 2018 - 19:04:55 EST


On Tue, Jun 5, 2018 at 4:01 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, Jun 5, 2018 at 3:41 PM Alexey Dobriyan <adobriyan@xxxxxxxxx> wrote:
> >
> > On my potato the performance increase is 33%, sheesh.
> > And the CPU starts doing 3 instructions per cycle vs 2.
>
> Whee. That's a shockingly big difference. On my CPU (i7-6700K) it
> makes absolutely no difference whether the values are immediates or
> in registers.

In fact, looking at Agner Fog's instruction lists, I don't see any CPU
where it would make a difference, except for the P4 (where the
immediate looks like it's a bad idea because it's an extra uop, but it
might pack fine and not be noticeable).
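
For anyone following along, the two store forms being compared look
roughly like the sketch below. This is not the actual __clear_user()
loop, just a minimal illustration: clear_imm() and clear_reg() are
hypothetical names, and it assumes GCC/Clang extended inline asm on
x86-64 (other targets fall back to memset() so it still builds).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: zero `qwords` 8-byte words with an immediate operand. */
static void clear_imm(uint64_t *p, size_t qwords)
{
#if defined(__x86_64__)
    for (size_t i = 0; i < qwords; i++)
        /* The zero is encoded in the instruction itself: movq $0, mem */
        __asm__ volatile("movq $0, %0" : "=m"(p[i]));
#else
    memset(p, 0, qwords * sizeof(*p));
#endif
}

/* Hypothetical helper: same stores, but the zero comes from a register. */
static void clear_reg(uint64_t *p, size_t qwords)
{
#if defined(__x86_64__)
    uint64_t zero = 0;

    for (size_t i = 0; i < qwords; i++)
        /* The zero lives in a general-purpose register: movq reg, mem */
        __asm__ volatile("movq %1, %0" : "=m"(p[i]) : "r"(zero));
#else
    memset(p, 0, qwords * sizeof(*p));
#endif
}
```

Both helpers leave the buffer in the same state; the only difference
is the encoding of the store, which is what the instruction tables
are being consulted about.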

But maybe I'm missing something subtle. What CPU, out of morbid interest?

Linus