Re: [PATCH 06/18] x86, barrier: stop speculation for failed access_ok

From: Alexei Starovoitov
Date: Sat Jan 06 2018 - 13:39:49 EST


On Sat, Jan 06, 2018 at 10:29:49AM -0800, Dan Williams wrote:
> On Sat, Jan 6, 2018 at 10:13 AM, Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> > On Sat, Jan 06, 2018 at 12:32:42PM +0000, Alan Cox wrote:
> >> On Fri, 5 Jan 2018 18:52:07 -0800
> >> Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >>
> >> > On Fri, Jan 5, 2018 at 5:10 PM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> >> > > From: Andi Kleen <ak@xxxxxxxxxxxxxxx>
> >> > >
> >> > > When access_ok fails we should always stop speculating.
> >> > > Add the required barriers to the x86 access_ok macro.
> >> >
> >> > Honestly, this seems completely bogus.
> >>
> >> Also for x86-64 if we are trusting that an AND with a constant won't get
> >> speculated into something else surely we can just and the address with ~(1
> >> << 63) before copying from/to user space ? The user will then just
> >> speculatively steal their own memory.
> >
> > +1
> >
> > Any type of straight line code can address variant 1.
> > Like changing:
> > array[index]
> > into
> > array[index & mask]
> > works even when 'mask' is a variable.
> > To proceed with speculative load from array cpu has to speculatively
> > load 'mask' from memory and speculatively do '&' alu.
> > If attacker cannot influence 'mask' the speculative value of it
> > will bound 'index & mask' value to be within array limits.
> >
> > I think the "let's sprinkle lfence everywhere" approach is going to
> > cause serious performance degradation. Yet people pushing for lfence
> > didn't present any numbers.
> > Last time lfence was removed from the networking drivers via dma_rmb()
> > packet-per-second metric jumped 10-30%. lfence forces all outstanding loads
> > to complete. If any prior load is waiting on L3 or memory,
> > lfence will cause 100+ ns stall and overall kernel performance will tank.
>
> You are conflating dma_rmb() with the limited cases where
> nospec_array_ptr() is used. I need help determining what the
> performance impact of those limited places are.

Really? fdtable, access_ok, and net/ipv[46] are not critical paths?

> > If kernel adopts this "lfence everywhere" approach it will be
> > the end of the kernel as we know it. All high performance operations
> > will move into user space. Networking and IO will be first.
> > Since it will take years to design new cpus and even longer
> > to upgrade all servers the industry will have no choice,
> > but to move as much logic as possible from the kernel.
> >
> > kpti already made crossing user/kernel boundary slower, but
> > kernel itself is still fast. If kernel will have lfence everywhere
> > the kernel itself will be slow.
> >
> > In that sense retpolining the kernel is not as horrible as it sounds,
> > since both user space and kernel have to be retpolined.
>
> retpoline is variant-2, this patch series is about variant-1.

That's exactly the point. Don't slow down the kernel with lfences
to solve variant 1. Retpoline for variant 2 is ok from a long-term
kernel viability perspective.
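
To make the masking argument quoted above concrete, here is a minimal
user-space sketch, not kernel code. The names TABLE_SIZE, table_mask,
load_checked, load_masked and clamp_user_addr are hypothetical and exist
only for illustration, and the table size is assumed to be a power of two
so that the mask is simply size - 1.

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 16u  /* hypothetical; assumed to be a power of two */

static int table[TABLE_SIZE];

/* 'mask' kept in a variable, as in the array[index & mask] example. */
static size_t table_mask = TABLE_SIZE - 1u;

/*
 * Branch-only bounds check. If the branch is mispredicted, the cpu may
 * still issue the load speculatively with an out-of-range index.
 */
static int load_checked(size_t index)
{
	if (index >= TABLE_SIZE)
		return -1;
	return table[index];
}

/*
 * Masked access. The load address depends on the result of the AND, so
 * even a speculative load stays inside table[] as long as the attacker
 * cannot influence table_mask.
 */
static int load_masked(size_t index)
{
	return table[index & table_mask];
}

/*
 * The x86-64 pointer variant: clear bit 63 so that a user-supplied
 * address can never land in the upper (kernel) half. The constant is
 * written as 1ULL because a plain (1 << 63) overflows int in C.
 */
static uint64_t clamp_user_addr(uint64_t addr)
{
	return addr & ~(1ULL << 63);
}
```

Note that the mask only clamps: an out-of-range index wraps to some valid
slot instead of being rejected, so the normal architectural bounds check
is still needed for correctness. The AND only limits what a speculative
load can reach.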