Re: [PATCH] arch: Introduce read_acquire()

From: Alexander Duyck
Date: Wed Nov 12 2014 - 14:26:11 EST




On 11/12/2014 07:37 AM, Peter Zijlstra wrote:
On Wed, Nov 12, 2014 at 07:23:22AM -0800, Alexander Duyck wrote:

On 11/12/2014 02:15 AM, Peter Zijlstra wrote:
On Tue, Nov 11, 2014 at 01:12:32PM -0800, Alexander Duyck wrote:
Minor nit on naming, but load_acquire would match what we do with barriers,
where you simply drop the smp_ prefix if you want the thing to work on UP
systems too.
The problem is this is slightly different, load_acquire in my mind would use
a mb() call, I only use a rmb(). That is why I chose read_acquire as the
name.
acquire is not about rmb vs mb, do read up on
Documentation/memory-barriers.txt. It's a distinctly different semantic.
Some archs simply lack the means of implementing this semantics and have
to revert to mb (stronger is always allowed).

Using the read vs load to wreck the acquire semantics is just insane.

Actually I have been reading up on it as I wasn't familiar with C11.

C11 is _different_ although somewhat related.

Honestly I find this quite confusing. If you have some other documentation you can point me at, it would be useful in terms of what you are expecting for behaviour and names.

Most of what I was doing was actually based on the documentation in barriers.txt
which was referring to memory operations not loads/stores when referring to
the acquire/release so I assumed the full memory barrier was required. I
wasn't aware that smp_load_acquire was only supposed to be ordering loads,
or that smp_store_release only applied to stores.

It does not.. an ACQUIRE is a semi-permeable barrier that doesn't allow
LOADs nor STOREs that are issued _after_ it to appear to happen _before_.
The RELEASE is the opposite number, it ensures LOADs and STOREs that are
issued _before_ cannot happen _after_.

This typically matches locking, where a lock (mutex_lock, spin_lock,
etc.) has ACQUIRE semantics and the unlock RELEASE. Such that:

	spin_lock();
	a = 1;
	b = x;
	spin_unlock();

guarantees all LOADs (x) and STOREs (a, b) happen _inside_ the lock
region. What they do not guarantee is:


	y = 1;
	spin_lock();
	a = 1;
	b = x;
	spin_unlock();
	z = 4;

an order between y and z; both are allowed _into_ the region and can
cross there like:

	spin_lock();
	...
	z = 4;
	y = 1;
	...
	spin_unlock();
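
As a rough illustration of the lock analogy above, here is a userspace sketch using C11 atomics rather than the kernel primitives. As noted earlier in the thread, C11 is related to but not the same as the kernel's model, so treat this as an analogy only. The names sketch_lock(), sketch_unlock(), worker() and run_two_workers() are made up for the example and are not kernel APIs:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical userspace lock; NOT the kernel's spinlock implementation. */
static atomic_flag lock_flag = ATOMIC_FLAG_INIT;
static long shared_counter;	/* protected by lock_flag */

static void sketch_lock(void)
{
	/* ACQUIRE: accesses issued after this cannot appear to happen before it. */
	while (atomic_flag_test_and_set_explicit(&lock_flag, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void sketch_unlock(void)
{
	/* RELEASE: accesses issued before this cannot appear to happen after it. */
	atomic_flag_clear_explicit(&lock_flag, memory_order_release);
}

static void *worker(void *arg)
{
	long iterations = *(long *)arg;

	for (long i = 0; i < iterations; i++) {
		sketch_lock();
		shared_counter++;	/* the LOAD and the STORE both stay inside the region */
		sketch_unlock();
	}
	return NULL;
}

/* Run two workers concurrently and return the final counter value. */
static long run_two_workers(long iterations)
{
	pthread_t t1, t2;
	int err;

	shared_counter = 0;
	err = pthread_create(&t1, NULL, worker, &iterations);
	assert(err == 0);
	err = pthread_create(&t2, NULL, worker, &iterations);
	assert(err == 0);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return shared_counter;
}
```

With the ACQUIRE on lock and the RELEASE on unlock, run_two_workers(n) should return 2 * n, since every increment is confined to the locked region. Nothing here orders accesses that sit outside the region against each other, which is the y/z case above.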


The only 'open' issue at the moment is if RELEASE+ACQUIRE := MB.
Currently we say this is not so, but Will (and me) would very much like
this to be so -- PPC64 being the only arch that actually makes this
distinction.

In the grand scheme of things I suppose it doesn't matter too much. I actually found some documentation that explains the subtle nuances a bit. Specifically, Acquire represents the first row in the table below, and Release represents the second column:

Acquire ->  LoadLoad    LoadStore
            StoreLoad   StoreStore
                            ^
                            |
                         Release

The LoadStore bit was in question in a few different discussions I read; however, as it turns out, x86, sparc, s390, PowerPC, arm64, and ia64 give you that for free anyway. I think that covers a wide enough gamut that I don't really care if I take a performance hit on the other architectures for implementing a full mb() versus an rmb().
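
To make the table reasoning concrete, here is a small sketch that encodes the four orderings as bits. Everything in it is hypothetical naming for illustration (the ORDER_* constants and rmb_suffices_for_acquire() are not kernel symbols), and it assumes that rmb() by itself only provides the LoadLoad ordering:

```c
#include <assert.h>
#include <stdbool.h>

/* The four orderings from the table (earlier access -> later access). */
enum order_bits {
	ORDER_LOAD_LOAD   = 1 << 0,
	ORDER_LOAD_STORE  = 1 << 1,
	ORDER_STORE_LOAD  = 1 << 2,
	ORDER_STORE_STORE = 1 << 3,
};

/* Acquire is the first row, Release is the second column. */
#define ORDER_ACQUIRE	(ORDER_LOAD_LOAD | ORDER_LOAD_STORE)
#define ORDER_RELEASE	(ORDER_LOAD_STORE | ORDER_STORE_STORE)

/* Assumption for this sketch: rmb() orders loads against loads only. */
#define ORDER_RMB	ORDER_LOAD_LOAD
#define ORDER_MB	(ORDER_LOAD_LOAD | ORDER_LOAD_STORE | \
			 ORDER_STORE_LOAD | ORDER_STORE_STORE)

/* A full barrier is always strong enough for either semantic. */
static_assert((ORDER_MB & ORDER_ACQUIRE) == ORDER_ACQUIRE, "mb covers acquire");
static_assert((ORDER_MB & ORDER_RELEASE) == ORDER_RELEASE, "mb covers release");

/*
 * 'implicit' is the set of orderings an architecture provides without any
 * barrier instruction. Returns true if rmb() plus those implicit orderings
 * covers everything ACQUIRE needs.
 */
static bool rmb_suffices_for_acquire(unsigned int implicit)
{
	return ((ORDER_RMB | implicit) & ORDER_ACQUIRE) == ORDER_ACQUIRE;
}
```

So rmb_suffices_for_acquire(ORDER_LOAD_STORE) is true, which is the "free benefit" case, while rmb_suffices_for_acquire(0) is false and would need the stronger mb().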

Thanks,

Alex


