Re: [PATCH RFC] percpu: add data dependency barrier in percpu accessors and operations

From: Paul E. McKenney
Date: Tue Jul 15 2014 - 13:45:32 EST


On Tue, Jul 15, 2014 at 10:32:13AM -0700, Linus Torvalds wrote:
> On Jul 15, 2014 9:12 AM, "Christoph Lameter" <cl@xxxxxxxxxx> wrote:
> >
> > I mentioned that there is a barrier because the process of handing over
> > the offset to the other includes synchronization. In the slab case this is
> > a semaphore that is used to protect the structure and the list of
> > kmem_cache structures. The control struct containing the offset must be
> > entered somehow into something that tracks it for the future and thus
> > there is synchronization by the subsystem.
>
> Maybe. It's not at all obvious, though. That control structure may be a
> lockless percpu thing, after all. We have front end caches to the
> allocators, etc.
>
> So yes, if there is a lock+unlock, that can be a memory barrier - but even
> that is not necessarily true on all architectures.
>
> Is there even always one, though? And what about the other CPU? Does it
> have any paired barrier?
>
> > And now we still see the old data. The cacheline changes of the initial
> > processor are ignored?
>
> They aren't "ignored". But without the proper barriers, the zeroing writes
> may still be buffered on the other CPU, and may not even have had a cache
> line allocated to them!
>
> That's the kind of thing that makes a barrier possibly required.
>
> But yes, the barrier may be implied by other synchronization if that
> exists. A lock doesn't necessarily help, though. A write before a lock can
> migrate down into the locked region, and a write after the lock may
> similarly migrate into the locked region, and then two such writes may be
> seen out of order on another CPU that doesn't take the lock itself to
> serialize.
>
> See? It's those kinds of things that can cause really subtle memory
> ordering problems. We generally never see them on x86, since writes are
> always ordered against other writes, and reads are always ordered wrt other
> reads (but not reads against writes) and all locks are full memory barriers.
>
> But on other architectures even a lock+unlock may not be a barrier, since
> things moving into the locked region is fine (the lock means that things
> had better not move *out* of a locked region).
>
> So depending on serialization may not always be all that obvious. Even if
> it does happen to all work on x86.
>
> And don't get me wrong. I despise weak memory models. I think they are a
> mistake. I absolutely detest the alpha model, and I think PowerPC and ARM
> are wrong too. It's just that we have to try to care about their broken
> crap )^:
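
The point above about writes migrating into a locked region can be
sketched in C11 atomics. This is only a minimal illustration, not kernel
code: the toy spinlock, the variables a and b, and every function name
below are hypothetical, and C11 acquire/release semantics stand in for
what lock and unlock guarantee on a weakly ordered architecture.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared state, for illustration only. */
static atomic_flag demo_lock = ATOMIC_FLAG_INIT;
static atomic_int a;
static atomic_int b;

static void demo_spin_lock(void)
{
	/* Acquire: later accesses may not move up past this point. */
	while (atomic_flag_test_and_set_explicit(&demo_lock,
						 memory_order_acquire))
		;
}

static void demo_spin_unlock(void)
{
	/* Release: earlier accesses may not move down past this point. */
	atomic_flag_clear_explicit(&demo_lock, memory_order_release);
}

/*
 * Nothing here orders the two stores against each other: the store
 * to a may sink below the lock and the store to b may rise above the
 * unlock, so a lockless observer is allowed to see b == 1 with a == 0.
 */
void writer_unordered(void)
{
	atomic_store_explicit(&a, 1, memory_order_relaxed);
	demo_spin_lock();
	demo_spin_unlock();
	atomic_store_explicit(&b, 1, memory_order_relaxed);
}

/* Same writer, but the store to b is a release, pairing with the reader. */
void writer_ordered(void)
{
	atomic_store_explicit(&a, 1, memory_order_relaxed);
	demo_spin_lock();
	demo_spin_unlock();
	atomic_store_explicit(&b, 1, memory_order_release);
}

/* Lockless reader: returns 0 only if it saw b == 1 together with a == 0. */
int reader_sees_order(void)
{
	int seen_b = atomic_load_explicit(&b, memory_order_acquire);
	int seen_a = atomic_load_explicit(&a, memory_order_relaxed);

	return !(seen_b == 1 && seen_a == 0);
}

/* Single-threaded usage example. */
void self_check(void)
{
	writer_ordered();
	assert(reader_sees_order());
}
```

With writer_ordered() the release store pairs with the reader's acquire
load, so the reader can never observe the stores out of order. With
writer_unordered() the lock and unlock in the middle do not provide that
guarantee by themselves.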

Well, at least all the required barriers are free on TSO systems like
x86 and mainframe, and the read-side barriers are free even on ARM and
PowerPC (not so much on DEC Alpha, but you can't have everything). OK,
OK, there is a volatile cast and/or a barrier() directive that might
constrain the compiler a bit in some cases.
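
A rough userspace sketch of what "free" means here, assuming a TSO
machine and GNU C: the only constraints left are on the compiler. The
two macros are simplified stand-ins modelled on the kernel's barrier()
and ACCESS_ONCE(), and the struct, variables, and function names are
hypothetical. On a weakly ordered CPU this would not be sufficient and
real memory barriers would be needed.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel macros; GNU C extensions. */
#define barrier()	__asm__ __volatile__("" ::: "memory")
#define ACCESS_ONCE(x)	(*(volatile __typeof__(x) *)&(x))

/* Hypothetical control structure, for illustration only. */
struct demo_ctl {
	int counter;
};

static struct demo_ctl slot;
static struct demo_ctl *published;

/* Writer: zero the data, then make the pointer visible. */
void publish(void)
{
	slot.counter = 0;
	/*
	 * Compiler-only barrier: keeps the compiler from reordering the
	 * two stores. ASSUMPTION: TSO hardware, which keeps stores in
	 * program order, so no CPU barrier instruction is emitted.
	 */
	barrier();
	ACCESS_ONCE(published) = &slot;
}

/* Reader: returns -1 if nothing is published yet, else the counter. */
int read_counter(void)
{
	/* Volatile cast: load the pointer exactly once. */
	struct demo_ctl *p = ACCESS_ONCE(published);

	if (p == NULL)
		return -1;
	return p->counter;	/* load depends on the value of p */
}

/* Single-threaded usage example. */
void self_check(void)
{
	publish();
	assert(read_counter() == 0);
}
```

Neither macro emits a barrier instruction; all they do is limit how the
compiler may reorder or re-issue the accesses around them.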

So this might well be crap, but at least it is (nearly) zero-cost crap
from the viewpoint of TSO machines like x86. ;-)

Thanx, Paul
