Re: Control Dependencies vs C Compilers

From: Peter Zijlstra
Date: Wed Oct 07 2020 - 07:51:14 EST


On Wed, Oct 07, 2020 at 12:20:41PM +0200, Florian Weimer wrote:
> * Peter Zijlstra:

> > A branch that cannot be optimized away and prohibits lifting stores
> > over. One possible suggestion would be allowing the volatile keyword as
> > a qualifier to if.
> >
> >     x = *foo;
> >     volatile if (x > 42)
> >         *bar = 1;
> >
> > This would tell the compiler that the condition is special in that it
> > must emit a conditional branch instruction and that it must not lift
> > stores (or sequence points) over it.
>
> But it's not the if statement, but the expression in it, right?

No, it *IS* the if statement. The magic is the conditional branch
instruction combined with the fact that CPUs are not allowed to speculate
stores (doing so would instantly produce out-of-thin-air, OOTA, values).

> So this would not be a valid transformation:
>
>     x = *foo;
>     bool flag = x > 42;
>     volatile if (flag)
>         *bar = 1;

It would be valid; it still ensures the load of *foo happens before the
store of *bar.
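
For reference, here is a rough sketch of the two shapes in C as it exists
today, using the GCC builtins mentioned further down. The function names
are made up for illustration, and a plain 'if' carries none of the
proposed 'volatile if' guarantees; this only shows that both forms feed
the loaded value into the branch that guards the store.

```c
#include <stdbool.h>

/* Direct form: the comparison on the loaded value sits in the if itself. */
static void store_if_direct(int *foo, int *bar)
{
	int x = __atomic_load_n(foo, __ATOMIC_RELAXED);

	if (x > 42)
		__atomic_store_n(bar, 1, __ATOMIC_RELAXED);
}

/* Flag form: the comparison result is named first, then branched on. */
static void store_if_flag(int *foo, int *bar)
{
	int x = __atomic_load_n(foo, __ATOMIC_RELAXED);
	bool flag = x > 42;

	if (flag)
		__atomic_store_n(bar, 1, __ATOMIC_RELAXED);
}
```

In both functions the branch outcome depends on the load of *foo, which is
the only property the ordering argument relies on.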

> And if we had this:
>
>     unsigned x = *foo;
>     volatile if (x >= 17 && x < 42)
>         *bar = 1;
>
> Would it be valid to transform this into (assuming that I got the
> arithmetic correct):
>
>     unsigned x = *foo;
>     volatile if ((x - 17) < 25)
>         *bar = 1;
>
> Or would this destroy the magic because arithmetic happens on the value
> before the comparison?

Nope, that'd still be good. The critical part is that the resolution of
the conditional branch depends on the load. All these transformations
preserve that.

So we use the data dependency between the load and the branch
instruction coupled with the inability to speculate stores, to generate
a LOAD to STORE ordering.
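
As a sanity check on the arithmetic in the range example above, a small
sketch that compares the two-compare and one-compare forms. The helper
names are hypothetical; the single-compare form relies on unsigned
wrap-around, so any x below 17 turns into a huge value and fails the
"< 25" test.

```c
#include <stdbool.h>

/* Original two-sided range check from the example. */
static bool in_range_two_compares(unsigned x)
{
	return x >= 17 && x < 42;
}

/* Single-compare form; x - 17 wraps around for x < 17. */
static bool in_range_one_compare(unsigned x)
{
	return (x - 17) < 25;
}

/* Count the inputs in [0, limit) on which the two forms disagree. */
static unsigned count_mismatches(unsigned limit)
{
	unsigned x, bad = 0;

	for (x = 0; x < limit; x++)
		if (in_range_two_compares(x) != in_range_one_compare(x))
			bad++;

	return bad;
}
```

Either way the value being tested is still derived from the load, so the
branch keeps its dependency on it.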

> >> But not using READ_ONCE and WRITE_ONCE?
> >
> > I'm OK with READ_ONCE(), but the WRITE_ONCE() doesn't help much, if
> > anything. The compiler is always allowed to lift stores, regardless of
> > the qualifiers used.
>
> I would hope that with some level of formalization, it can be shown that
> no additional synchronization is necessary beyond the load/conditional
> sequence.

Agreed. Those are the critical parts; the tricky bit is ensuring the
compiler doesn't lift stuff over the condition.
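
To make the lifting problem concrete, a sketch along the lines of the
control dependency examples in Documentation/memory-barriers.txt. The
READ_ONCE()/WRITE_ONCE() macros here are simplified stand-ins, not the
kernel's real definitions, and the variables and function names are
invented for illustration. When both arms store the same value, the
compiler may legally rewrite the first function into the second; the
volatile accesses stay in program order, but the store no longer sits
behind the branch.

```c
/* Simplified stand-ins for the kernel macros (assumption; GNU __typeof__). */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

static int foo, bar, done;

/* As written: the store to bar appears under the branch in both arms. */
static void store_under_branch(void)
{
	int x = READ_ONCE(foo);

	if (x > 42) {
		WRITE_ONCE(bar, 1);
		done = 1;
	} else {
		WRITE_ONCE(bar, 1);
		done = 2;
	}
}

/* What the compiler is allowed to make of it: the store is hoisted. */
static void store_hoisted(void)
{
	int x = READ_ONCE(foo);

	WRITE_ONCE(bar, 1);	/* no conditional branch ahead of this store */
	if (x > 42)
		done = 1;
	else
		done = 2;
}
```

Single-threaded the two are indistinguishable, which is exactly why the
transformation is permitted; only the hoisted form lets a weakly ordered
CPU reorder the load of foo against the store to bar.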

> >> I think in GCC, they are called __atomic_load_n(foo, __ATOMIC_RELAXED)
> >> and __atomic_store_n(foo, __ATOMIC_RELAXED). GCC can't optimize relaxed
> >> MO loads and stores because the C memory model is defective and does not
> >> actually guarantee the absence of out-of-thin-air values (a property it
> >> was supposed to have).
> >
> > AFAIK people want to get that flaw in the C memory model fixed (which to
> > me seems like a very good idea).
>
> It's been a long time since people realized that this problem exists,
> with several standard releases since then.

I've been given to believe it is a hard problem. Personally I hold the
opinion that prohibiting store speculation (of all kinds) is both
necessary and sufficient to avoid OOTA. But I have 0 proof for that.