Re: Control Dependencies vs C Compilers

From: Peter Zijlstra
Date: Tue Oct 06 2020 - 09:31:34 EST

Next message: Sergey Senozhatsky: "Re: [PATCH] printk: handle blank console arguments passed in."
Previous message: Jens Axboe: "Re: [PATCH V7 0/2] percpu_ref & block: reduce memory footprint of percpu_ref in fast path"
In reply to: Willy Tarreau: "Re: Control Dependencies vs C Compilers"
Next in thread: stern@xxxxxxxxxxxxxxxxxxx: "Re: Control Dependencies vs C Compilers"

On Tue, Oct 06, 2020 at 12:37:06PM +0000, David Laight wrote:
> From: Peter Zijlstra
> > Sent: 06 October 2020 12:47
> > Hi,
> >
> > Let's give this linux-toolchains thing a test-run...
> >
> > As some of you might know, there's a bit of a discrepancy between what
> > compiler and kernel people consider 'valid' use of the compiler :-)
> >
> > One area where this shows up is in implicit (memory) ordering provided
> > by the hardware, which we kernel people would like to use to avoid
> > explicit fences (expensive) but which the compiler is unaware of and
> > could ruin (bad).
> ...
> >
> > In short, the control dependency relies on the hardware never
> > speculating stores (instant OOTA) to provide a LOAD->STORE ordering.
> > That is, a LOAD must be completed to resolve a conditional branch, the
> > STORE is after the branch and cannot be made visible until the branch is
> > determined (which implies the load is complete).
> >
> > However, our 'dear' C language has no clue of any of this.
> >
> > So given code like:
> >
> > 	x = *foo;
> > 	if (x > 42)
> > 		*bar = 1;
> >
> > Which, if literally translated into assembly, would provide a
> > LOAD->STORE order between foo and bar, could, in the hands of an
> > evil^Woptimizing compiler, become:
> >
> > 	x = *foo;
> > 	*bar = 1;
> >
> > because it knows, through value tracking, that the condition must be
> > true.
> >
> > Our Documentation/memory-barriers.txt has a Control Dependencies section
> > (which I shall not replicate here for brevity) which lists a number of
> > caveats. But in general the work-around we use is:
> >
> > 	x = READ_ONCE(*foo);
> > 	if (x > 42)
> > 		WRITE_ONCE(*bar, 1);
>
> An alternative is to 'persuade' the compiler that
> any 'tracked' value for a local variable is invalid.
> Rather like the way that barrier() 'invalidates' memory.
> So you generate:
>
> 	x = *foo;
> 	asm ("" : "+r" (x));
> 	if (x > 42)
> 		*bar = 1;
>
> Since the "+r" constraint indicates that the value of 'x'
> might have changed it can't optimise based on any
> presumed old value.
> (Unless it looks inside the asm opcodes...)

The compiler can still try and lift the store out of the block, possibly
by inventing more stores.
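
To illustrate the lifting, here is a minimal userspace sketch, not kernel
code. READ_ONCE() and WRITE_ONCE() below are simplified volatile-cast
stand-ins for the kernel macros, and foo, bar, baz and the function names
are made up for the example. When both arms of a branch begin with the
same store, the compiler may hoist that store above the branch (this is
one of the cases Documentation/memory-barriers.txt describes). The "+r"
asm only hides the value of x; it does nothing to forbid the hoist, and
on a weakly ordered CPU the hoisted store is no longer ordered after the
load.

```c
/*
 * Simplified stand-ins for the kernel macros (an assumption of this
 * sketch; the real definitions do more). Needs GCC or Clang.
 */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

static int foo, bar, baz;	/* hypothetical shared variables */

/* As written: both arms of the branch start with the same store to bar. */
static void as_written(void)
{
	int x = READ_ONCE(foo);

	__asm__ ("" : "+r" (x));	/* hide the value of x from the optimizer */
	if (x > 42) {
		WRITE_ONCE(bar, 1);
		WRITE_ONCE(baz, 1);
	} else {
		WRITE_ONCE(bar, 1);
		WRITE_ONCE(baz, 2);
	}
}

/*
 * What an optimizer is allowed to emit instead: the common store is
 * lifted out of the block, so it no longer depends on the branch.
 */
static void store_lifted(void)
{
	int x = READ_ONCE(foo);

	__asm__ ("" : "+r" (x));
	WRITE_ONCE(bar, 1);		/* hoisted above the conditional */
	if (x > 42)
		WRITE_ONCE(baz, 1);
	else
		WRITE_ONCE(baz, 2);
}
```

Single-threaded, both functions leave bar and baz with the same values,
which is exactly why the optimizer considers the transformation valid.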

Please go read memory-barriers.txt for a bunch of other examples.
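
One of those examples, sketched in userspace form: if the compiler can
prove the condition is constant, the branch (and with it the control
dependency) disappears even though READ_ONCE()/WRITE_ONCE() are used.
The macros below are simplified stand-ins for the kernel ones, and MAX,
a, b and the function names are hypothetical, chosen only for this
illustration.

```c
/*
 * Simplified stand-ins for the kernel macros (an assumption of this
 * sketch; the real definitions do more). Needs GCC or Clang.
 */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

#define MAX 1			/* hypothetical compile-time constant */

static int a, b;		/* hypothetical shared variables */

/* As written: the store to b looks like it depends on the load of a. */
static void mod_as_written(void)
{
	int q = READ_ONCE(a);

	if (q % MAX)
		WRITE_ONCE(b, 1);
	else
		WRITE_ONCE(b, 2);
}

/*
 * With MAX == 1, (q % MAX) is always 0, so the compiler may drop the
 * branch entirely. The load is still issued, but nothing orders the
 * store after it any more.
 */
static void mod_as_optimized(void)
{
	int q = READ_ONCE(a);

	(void)q;			/* value is unused once the branch is gone */
	WRITE_ONCE(b, 2);
}
```

Both functions always store 2 to b, so the compiler is within its rights
to pick the second form.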

This thread is not to collect work-arounds that might convince a
compiler to emit the desired code as a side effect, but to get the
compiler people involved and get control-dependencies recognised such
that correct code gen is guaranteed.

Only if we get the compiler people on board and have them provide means
are we guaranteed safe from the optimizer. Otherwise we'll just keep
playing whack-a-mole with fancy new optimization techniques. And given
how horridly painful it is to debug memory ordering problems, I feel it
is best to make sure we're not going to have to do so more than necessary.
