Re: [RFC] LKMM: Add volatile_if()
From: Linus Torvalds
Date: Sun Jun 06 2021 - 14:34:57 EST
On Sun, Jun 6, 2021 at 6:03 AM Segher Boessenkool
<segher@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Sat, Jun 05, 2021 at 08:41:00PM -0700, Linus Torvalds wrote:
> >
> > I think it's something of a bug when it comes to "asm volatile", but
> > the documentation isn't exactly super-specific.
>
> Why would that be? "asm volatile" does not prevent optimisation.
Sure it does.
That's the whole and only *POINT* of the "volatile".
It's the same as a volatile memory access. That very much prevents
certain optimizations. You can't just join two volatile reads or
writes, because they have side effects.
And the exact same thing is true of inline asm. Even when they are
*identical*, inline asms have side effects that gcc simply doesn't
understand.
And yes, those side effects can - and do - include "you can't just merge these".
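To make the volatile-access analogy concrete, here is a small sketch (the function names are hypothetical). The plain version lets the compiler fold the two reads into one load. The volatile version must perform both loads, because each volatile read is itself a side effect. With `gcc -O2 -S` you should see one load in the first function and two in the second.

```c
/* Plain reads: the compiler may load *p once and reuse the value. */
int sum_plain(const int *p)
{
    int first = *p;
    int second = *p;   /* may be folded into the first load */
    return first + second;
}

/* Volatile reads: each access is a side effect, so both loads must be
 * emitted, in order. The compiler may not join them. The reads sit in
 * separate statements so they are sequenced relative to each other. */
int sum_volatile(const volatile int *p)
{
    int first = *p;    /* load #1 */
    int second = *p;   /* load #2: cannot be merged with load #1 */
    return first + second;
}
```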
> It says this code has some unspecified side effect, and that is all!
And that should be sufficient. But gcc then violates it, because gcc
doesn't understand the side effects.
Now, the side effects may be *subtle*, but they are very very real.
Even just the placement of code relative to a branch affects memory
ordering, as that one example showed.
> > Something like this *does* seem to work:
> >
> > #define ____barrier(id) __asm__ __volatile__("#" #id: : :"memory")
> > #define __barrier(id) ____barrier(id)
> > #define barrier() __barrier(__COUNTER__)
> >
> > which is "interesting" or "disgusting" depending on how you happen to feel.
>
> __COUNTER__ is a preprocessor thing, much more like what you want here:
> this does its work *before* everything the compiler does, while %= does
> its thing *after* :-)
>
> (Not that I actually understand what you are trying to do with this).
See my previous email for why two barriers in two different code
sequences cannot just be joined into one and moved into the common
parent. It actually is semantically meaningful *where* they are, and
they are distinct barriers.
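A minimal sketch of that shape follows. It assumes GCC or Clang on x86, where `#` starts an assembler comment, so the `__COUNTER__` macros quoted above emit only a comment. `plain_barrier()` and `conditional_store()` are hypothetical names for illustration. The sketch shows only that each `unique_barrier()` expands to different asm text ("#0", "#1", ...), so the two barriers are no longer textually identical. Whether that alone stops every compiler from merging them is exactly the question under discussion.

```c
/* Ordinary compiler barrier: emits no CPU instruction, but the
 * "memory" clobber forbids caching or moving memory accesses across
 * it. Every expansion is textually identical. */
#define plain_barrier() __asm__ __volatile__("" : : : "memory")

/* The __COUNTER__ variant quoted above: each expansion gets its own
 * asm text ("#0", "#1", ...), which is just a comment to the x86
 * assembler, so no two barriers are identical. */
#define ____barrier(id) __asm__ __volatile__("#" #id : : : "memory")
#define __barrier(id) ____barrier(id)
#define unique_barrier() __barrier(__COUNTER__)

/* Hypothetical conditional-barrier shape: both arms begin with a
 * barrier followed by the same store. With identical barriers, the
 * compiler could be tempted to merge barrier + store into the common
 * parent, ahead of the branch. */
void conditional_store(int a, volatile int *b, int *other)
{
    if (a) {
        unique_barrier();   /* expands to asm "#<n>" */
        *b = 1;
        *other += 1;
    } else {
        unique_barrier();   /* expands to asm "#<n+1>": distinct text */
        *b = 1;
        *other -= 1;
    }
}
```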
The case we happen to care about is memory ordering issues. The
example quoted may sound pointless and insane, and I actually don't
believe we have real code that triggers the issue, because whenever we
have a conditional barrier, the two sides of the conditional are
generally so different that gcc would never merge any of it anyway.
So the issue is mostly theoretical, but we do have code that is fairly
critical, and that depends on memory ordering, and on some weakly
ordered machines (which is where all these problems would happen),
actual explicit memory barriers are also *much* too expensive.
End result: we have code that depends on the fact that a read-to-write
ordering exists if there is a data dependency or a control dependency
between the two. No actual expensive CPU instruction to specify the
ordering, because the ordering is implicit in the code flow itself.
But that's what we need a compiler barrier for in the first place -
the compiler certainly doesn't understand about this very subtle
memory ordering issue, and we want to make sure that the code sequence
*remains* that "if A then write B".
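The "if A then write B" pattern looks roughly like the following sketch. It assumes GCC/Clang extensions (`__typeof__`, inline asm). `READ_ONCE_SKETCH`, `WRITE_ONCE_SKETCH`, `compiler_barrier` and `write_b_if_a` are hypothetical simplifications, not the kernel's actual definitions. The barrier emits no CPU instruction. The read-to-write ordering comes from the control dependency itself, and the barrier's only job is to keep the compiler from reshaping the branch.

```c
/* Simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE():
 * a volatile cast forces exactly one load or store. Hypothetical;
 * the real kernel macros do more. */
#define READ_ONCE_SKETCH(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SKETCH(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Compiler-only barrier: no CPU fence instruction is emitted. */
#define compiler_barrier() __asm__ __volatile__("" : : : "memory")

/* "if A then write B": the store to *b depends on the branch taken on
 * the value loaded from *a. Even weakly ordered CPUs do not make such
 * a store visible before that load, so no explicit fence is used.
 * The compiler barrier only keeps the code in this shape. */
void write_b_if_a(int *a, int *b)
{
    if (READ_ONCE_SKETCH(*a)) {
        compiler_barrier();
        WRITE_ONCE_SKETCH(*b, 1);
    }
}
```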
Linus