Re: [PATCH v2 0/4] atomic: Fixes to smp_mb__{before,after}_atomic() and mips.

From: Peter Zijlstra
Date: Thu Jun 13 2019 - 14:05:30 EST


On Thu, Jun 13, 2019 at 12:58:11PM -0400, Alan Stern wrote:
> On Thu, 13 Jun 2019, David Howells wrote:
>
> > Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > > Basically we fail for:
> > >
> > > *x = 1;
> > > atomic_inc(u);
> > > smp_mb__after_atomic();
> > > r0 = *y;
> > >
> > > Because, while the atomic_inc() implies memory order, it
> > > (surprisingly) does not provide a compiler barrier. This then allows
> > > the compiler to re-order like so:
> >
> > To quote memory-barriers.txt:
> >
> > (*) smp_mb__before_atomic();
> > (*) smp_mb__after_atomic();
> >
> > These are for use with atomic (such as add, subtract, increment and
> > decrement) functions that don't return a value, especially when used for
> > reference counting. These functions do not imply memory barriers.
> >
> > so it's entirely to be expected?
>
> The text is perhaps ambiguous. It means that the atomic functions
> which don't return values -- like atomic_inc() -- do not imply memory
> barriers. It doesn't mean that smp_mb__before_atomic() and
> smp_mb__after_atomic() do not imply memory barriers.
>
> The behavior Peter described is not to be expected. The expectation is
> that the smp_mb__after_atomic() in the example should force the "*x =
> 1" store to execute before the "r0 = *y" load. But on current x86 it
> doesn't force this, for the reason explained in the description.

Indeed, thanks Alan.
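
To spell out the failure mode Alan describes (a sketch, assuming the
pre-fix x86 definitions where smp_mb__after_atomic() is only a
compiler barrier and the atomic_inc() asm has no "memory" clobber):

*x = 1;
atomic_inc(u);			/* LOCK'ed, full barrier in hardware */
smp_mb__after_atomic();		/* compiler barrier only on x86 */
r0 = *y;

may be compiled as if it were:

atomic_inc(u);
*x = 1;
smp_mb__after_atomic();
r0 = *y;

since nothing tells the compiler it cannot swap the plain store with
the atomic_inc(). At that point the store to *x and the load from *y
are no longer separated by a LOCK'ed instruction, and the store buffer
is free to let the load complete before the store becomes visible,
which is exactly what the barrier was supposed to prevent.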

The other approach would be to upgrade smp_mb__{before,after}_atomic()
to actual full memory barriers on x86, but that seems quite ridiculous
since atomic_inc() already does all the expensive bits and is only
missing the compiler barrier.

That would result in code like:

mov $1, x			# *x = 1
lock inc u			# atomic_inc(u)
lock addl $0, -4(%rsp)		# aka smp_mb()
mov y, %r			# r0 = *y

which is really quite silly.

And as noted in the Changelog, about half of the non-value-returning
atomics already implied the compiler barrier anyway.
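
For completeness, the cheap fix is to put the missing compiler barrier
into the atomic op itself, i.e. give the x86 inline asm a "memory"
clobber. A simplified sketch (kernel context assumed, not the literal
source) of what that looks like for atomic_inc():

static __always_inline void arch_atomic_inc(atomic_t *v)
{
	asm volatile(LOCK_PREFIX "incl %0"
		     : "+m" (v->counter)
		     : : "memory");	/* the previously missing compiler barrier */
}

With the clobber in place the LOCK'ed instruction orders things for
both the CPU and the compiler, and smp_mb__{before,after}_atomic() can
stay essentially free on x86.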