Re: x86/asm: __clear_user() micro-optimization (was: "Re: [GIT PULL] x86/asm changes for v4.18")

From: Alexey Dobriyan
Date: Tue Jun 05 2018 - 19:20:30 EST


On Tue, Jun 05, 2018 at 04:04:37PM -0700, Linus Torvalds wrote:
> On Tue, Jun 5, 2018 at 4:01 PM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Tue, Jun 5, 2018 at 3:41 PM Alexey Dobriyan <adobriyan@xxxxxxxxx> wrote:
> > >
> > > On my potato the performance increase is 33%, sheesh.
> > > And CPU starts doing 3 instructions per cycle vs 2.
> >
> > Whee. That's a shockingly big difference. On my CPU (i7-6700K) it
> > makes absolutely no difference whether the values are immediates or in
> > registers.
>
> In fact, looking at Agner Fog's instruction lists, I don't see any CPU
> where it would make a difference, except for the P4 (where the
> immediate looks like it's a bad idea because it's an extra uop, but it
> might pack fine and not be noticeable).
>
> But maybe I'm missing something subtle. What CPU, out of morbid interest?

This is a Broadwell Xeon E5-2620 v4,
which is somewhat strange indeed, because it should be modern enough.
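
For anyone who wants to compare the two store forms on their own CPU, here
is a minimal userspace sketch. It is not the kernel's __clear_user() and it
is not the code that was benchmarked above: clear_imm(), clear_reg() and
time_clear() are hypothetical names made up for illustration, and the loops
only mimic the "store an immediate zero" vs "store a zeroed register"
distinction. On non-x86-64 builds the asm is replaced by plain C stores, so
the comparison is only meaningful on x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: zero n 8-byte words, storing an immediate 0 each time. */
static void clear_imm(uint64_t *p, size_t n)
{
	for (size_t i = 0; i < n; i++) {
#if defined(__x86_64__)
		/* movq $0, mem: the zero is encoded in the instruction itself. */
		__asm__ volatile("movq $0, %0" : "=m"(p[i]));
#else
		p[i] = 0;
#endif
	}
}

/* Hypothetical helper: zero n 8-byte words, storing from a register holding 0. */
static void clear_reg(uint64_t *p, size_t n)
{
	uint64_t zero = 0;

	for (size_t i = 0; i < n; i++) {
#if defined(__x86_64__)
		/* movq reg, mem: the zero lives in a general-purpose register. */
		__asm__ volatile("movq %1, %0" : "=m"(p[i]) : "r"(zero));
#else
		p[i] = zero;
#endif
	}
}

/* Run fn over buf reps times and return the CPU time spent, in seconds. */
static double time_clear(void (*fn)(uint64_t *, size_t),
			 uint64_t *buf, size_t n, int reps)
{
	assert(reps > 0);

	clock_t start = clock();

	for (int r = 0; r < reps; r++)
		fn(buf, n);

	return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_clear(clear_imm, buf, n, reps) and time_clear(clear_reg, buf, n,
reps) on the same buffer gives a rough comparison; counting instructions and
cycles with a profiler would be needed to see the IPC difference.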