Re: Control Dependencies vs C Compilers
From: Florian Weimer
Date: Wed Oct 07 2020 - 06:21:17 EST
* Peter Zijlstra:
> On Tue, Oct 06, 2020 at 11:20:01PM +0200, Florian Weimer wrote:
>> * Peter Zijlstra:
>>
>> > Our Documentation/memory-barriers.txt has a Control Dependencies section
>> > (which I shall not replicate here for brevity) which lists a number of
>> > caveats. But in general the work-around we use is:
>> >
>> > x = READ_ONCE(*foo);
>> > if (x > 42)
>> > WRITE_ONCE(*bar, 1);
>> >
>> > Where READ/WRITE_ONCE() cast the variable volatile. The volatile
>> > qualifier dissuades the compiler from assuming it knows things and we
>> > then hope it will indeed emit the branch like we'd expect.
>> >
>> >
>> > Now, hoping the compiler generates correct code is clearly not ideal and
>> > very dangerous indeed. Which is why my question to the compiler folks
>> > assembled here is:
>> >
>> > Can we get a C language extention for this?
>>
>> For what exactly?
>
> A branch that cannot be optimized away and prohibits lifting stores
> over. One possible suggestion would be allowing the volatile keyword as
> a qualifier to if.
>
> x = *foo;
> volatile if (x > 42)
> *bar = 1;
>
> This would tell the compiler that the condition is special in that it
> must emit a conditional branch instruction and that it must not lift
> stores (or sequence points) over it.
But it's not the if statement, but the expression in it, right? So this
would not be a valid transformation:
x = *foo;
bool flag = x > 42;
volatile if (flag)
*bar = 1;
And if we had this:
unsigned x = *foo;
volatile if (x >= 17 && x < 42)
*bar = 1;
Would it be valid to transform this into (assuming that I got the
arithmetic correct):
unsigned x = *foo;
volatile if ((x - 17) < 25)
*bar = 1;
Or would this destroy the magic because arithmetic happens on the value
before the comparison?
>> But not using READ_ONCE and WRITE_ONCE?
>
> I'm OK with READ_ONCE(), but the WRITE_ONCE() doesn't help much, if
> anything. The compiler is always allowed to lift stores, regardless of
> the qualifiers used.
I would hope that with some level of formalization, it can be shown that
no additional synchronization is necessary beyond the load/conditional
sequence.
>> I think in GCC, they are called __atomic_load_n(foo, __ATOMIC_RELAXED)
>> and __atomic_store_n(foo, __ATOMIC_RELAXED). GCC can't optimize relaxed
>> MO loads and stores because the C memory model is defective and does not
>> actually guarantee the absence of out-of-thin-air values (a property it
>> was supposed to have).
>
> AFAIK people want to get that flaw in the C memory model fixed (which to
> me seemd like a very good idea).
It's been a long time since people realized that this problem exists,
with several standard releases since then.
Thanks,
Florian
--
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill