Re: [RFC PATCH] asm/generic: introduce if_nospec and nospec_barrier

From: Dan Williams
Date: Thu Jan 04 2018 - 19:24:22 EST


On Thu, Jan 4, 2018 at 3:06 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Thu, Jan 4, 2018 at 2:55 PM, Alan Cox <gnomes@xxxxxxxxxxxxxxxxxxx> wrote:
>>
>> How do you ensure that the CPU doesn't speculate j < _m ? ~0 : 0 pick the
>> wrong mask and then reference base[] ?
>
> .. yeah, that's exactly where we want to make sure that the compiler
> uses a select or 'setb'.
>
> That's what gcc does for me in testing:
>
> xorl %eax, %eax
> setbe %al
> negq %rax
>
> but yes, we'd need to guarantee it somehow.
>
> Presumably that is where we end up having some arch-specific stuff.
> Possibly there is some gcc builtin. I wanted to avoid actually writing
> architecture-specific asm.
>
>> Anding with a constant works because the constant doesn't get speculated
>> and nor does the and with a constant, but you've got a whole additional
>> conditional path in your macro.
>
> Absolutely. Think of it as an example, not "the solution".
>
> It's also possible that x86 'lfence' really is so fast that it doesn't
> make sense to try to do this. Agner Fog claims that it's single-cycle
> (well, except for P4, surprise, surprise), but I suspect that his
> timings are simply for 'lfence' in a loop or something. Which may not
> show the real cost of actually halting things until they are stable.
>
> Also, maybe that __fcheck_files() pattern where getting a NULL pointer
> happens to be the right thing for out-of-range is so unusual as to be
> useless, and most people end up having to have that limit check for
> other reasons anyway.

This barrier-avoidance optimization is something that could fit in the
nospec_{ptr,load,array_load} interface that Mark defined for ARM, which
is itself constructed around a proposed compiler builtin. That said,
lfence is not a fully serializing instruction, so before we spend too
much effort trying to kill it we should measure how bad it is in
practice in these paths.

At this point I'll go ahead with rewriting the osb() patches to use
Mark's nospec_* accessors plus the if_nospec macro. We can kill the
barrier in if_nospec once we are sure the compiler will always "Do the
Right Thing" with the array_access() approach; until then, the
array_access() approach can serve as the 'best-effort' default fallback.
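
For illustration, here is a minimal sketch of the kind of mask-based
access being discussed: compute an all-ones/all-zeroes mask from the
bounds comparison using only arithmetic, then AND it into both the index
and the loaded pointer. The names sketch_index_mask() and
sketch_array_load() are hypothetical and are not the nospec_* or
array_access() definitions from the patches. The sketch assumes
1 <= size <= LONG_MAX, gcc-style wrapping on the unsigned-to-signed
conversion, and an arithmetic right shift of negative values. Nothing in
C guarantees the compiler keeps this branch-free, which is exactly the
open question raised above.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns ~0UL when idx < size, 0 otherwise.
 * If idx >= size, (size - 1 - idx) wraps and sets the top bit; if idx
 * itself has the top bit set, the OR keeps it set. Inverting and
 * arithmetic-shifting the sign bit yields the mask without a
 * conditional in the source.
 */
static inline unsigned long sketch_index_mask(unsigned long idx,
					      unsigned long size)
{
	return (unsigned long)(~(long)(idx | (size - 1UL - idx)) >>
			       (sizeof(long) * CHAR_BIT - 1));
}

/*
 * Hypothetical accessor: returns table[idx] when idx is in range and
 * NULL otherwise. An out-of-range idx is clamped to 0 before the load,
 * so the load never leaves the table (assumes size >= 1).
 */
static inline void *sketch_array_load(void **table, unsigned long idx,
				      unsigned long size)
{
	unsigned long mask = sketch_index_mask(idx, size);
	void *p = table[idx & mask];

	return (void *)((uintptr_t)p & mask);
}
```

A caller in the __fcheck_files() mould would get NULL back for an
out-of-range index, which only helps where NULL already happens to be
the right answer for that case.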