Re: [PATCH v3 01/13] x86/retpoline: Add initial retpoline support

From: David Woodhouse
Date: Fri Jan 05 2018 - 05:55:59 EST


On Fri, 2018-01-05 at 02:28 -0800, Paul Turner wrote:
> On Thu, Jan 04, 2018 at 07:27:58PM +0000, David Woodhouse wrote:
> > On Thu, 2018-01-04 at 10:36 -0800, Alexei Starovoitov wrote:
> > >
> > > Pretty much.
> > > Paul's writeup: https://support.google.com/faqs/answer/7625886
> > > tldr: jmp *%r11 gets converted to:
> > > call set_up_target;
> > > capture_spec:
> > >   pause;
> > >   jmp capture_spec;
> > > set_up_target:
> > >   mov %r11, (%rsp);
> > >   ret;
> > > where capture_spec part will be looping speculatively.
>
> > That is almost identical to what's in my latest patch set, except that
> > the capture_spec loop has 'lfence' instead of 'pause'.
>
> When choosing this sequence I benchmarked several alternatives here, including
> (nothing, nops, fences, and other serializing instructions such as cpuid).
>
> The "pause; jmp" sequence proved minutely faster than "lfence;jmp" which is why
> it was chosen.
>
>   "pause; jmp"    33.231 cycles/call   9.517 ns/call
>   "lfence; jmp"   33.354 cycles/call   9.552 ns/call
>
> (Timings are for a complete retpolined indirect branch.)

Yeah, I studiously ignored you here and went with only what Intel had
*assured* me was correct and put into the GCC patches, rather than
chasing those 35 picoseconds ;)
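
For reference, the "35 picoseconds" is simply the gap between the two
per-call timings quoted above. A minimal sketch of that arithmetic,
assuming the quoted figures are taken at face value (the dictionary
layout and the helper name delta are illustrative only, not part of
any benchmark harness):

```python
# Figures quoted from the benchmark above (one complete retpolined indirect branch).
timings = {
    "pause; jmp": {"cycles": 33.231, "ns": 9.517},
    "lfence; jmp": {"cycles": 33.354, "ns": 9.552},
}


def delta(metric):
    """Return the 'lfence; jmp' figure minus the 'pause; jmp' figure."""
    return timings["lfence; jmp"][metric] - timings["pause; jmp"][metric]


delta_ps = round(delta("ns") * 1000)        # nanoseconds -> picoseconds
delta_cycles = round(delta("cycles"), 3)    # extra cycles per call
relative_pct = round(100 * delta("ns") / timings["pause; jmp"]["ns"], 2)

print(f"{delta_ps} ps/call, {delta_cycles} cycles/call, {relative_pct}% slower")
```

So the lfence variant costs roughly a third of a percent per call on
these numbers.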

The GCC patch set already had about four different variants over time,
with associated "oh shit, that one doesn't actually work; try this".
What we have in my patch set is precisely what GCC emits at the moment.

I'm all for optimising it further, but maybe not this week.

Other than that, is there any other development from your side that I
haven't captured in the latest (v4) series?
http://git.infradead.org/users/dwmw2/linux-retpoline.git/
