Re: [PATCH 2/2] tools/memory-model: Add write ordering by release-acquire and by locks

From: Daniel Lustig
Date: Thu Jul 05 2018 - 14:12:32 EST

On 7/5/2018 9:56 AM, Paul E. McKenney wrote:
> On Thu, Jul 05, 2018 at 05:22:26PM +0100, Will Deacon wrote:
>> On Thu, Jul 05, 2018 at 08:44:39AM -0700, Daniel Lustig wrote:
>>> On 7/5/2018 8:31 AM, Paul E. McKenney wrote:
>>>> On Thu, Jul 05, 2018 at 10:21:36AM -0400, Alan Stern wrote:
>>>>> At any rate, it looks like instead of strengthening the relation, I
>>>>> should write a patch that removes it entirely. I also will add new,
>>>>> stronger relations for use with locking, essentially making spin_lock
>>>>> and spin_unlock be RCsc.
>>>> Only in the presence of smp_mb__after_unlock_lock() or
>>>> smp_mb__after_spinlock(), correct? Or am I confused about RCsc?
>>>> Thanx, Paul
>>> In terms of naming, is what you're asking for really RCsc? To me,
>>> that would imply that even stores in the first critical section would
>>> need to be ordered before loads in the second critical section.
>>> Meaning that even x86 would need an mfence in either lock() or unlock()?
>> I think a LOCK operation always implies an atomic RmW, which will give
>> full ordering guarantees on x86. I know there have been interesting issues
>> involving I/O accesses in the past, but I think that's still out of scope
>> for the memory model.

Yes, you're right about atomic RMWs on x86, and I'm not worried about I/O
here either. But see below.

>> Peter will know.
> Agreed, x86 locked operations imply full fences, so x86 will order the
> accesses in consecutive critical sections with respect to an observer
> not holding the lock, even stores in earlier critical sections against
> loads in later critical sections. We have been discussing tightening
> LKMM to make an unlock-lock pair order everything except earlier stores
> vs. later loads. (Of course, if everyone holds the lock, they will see
> full ordering against both earlier and later critical sections.)
> Or are you pushing for something stronger?
> Thanx, Paul

No, I'm definitely not pushing for anything stronger. I'm still just
wondering if the name "RCsc" is right for what you described. For
example, Andrea just said this in a parallel email:

> "RCsc" as ordering everything except for W -> R, without the [extra]
> barriers

If it's "RCsc with exceptions", doesn't it make sense to find a
different name, rather than simply overloading the term "RCsc" with
a subtly different meaning, and hoping nobody gets confused?

I suppose on x86 and ARM you'd happen to get "true RCsc" anyway, just
due to the way things are currently mapped: LOCKed RMWs and "true RCsc"
instructions, respectively. But on Power and RISC-V, it would really
be more "RCsc with a W->R exception", right?
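
To make the W->R case concrete, here is a rough litmus sketch in the
tools/memory-model style. The test name and variable names are made up
for illustration and are not taken from the patch. P0 does a store in
one critical section and a load in the next; P1 is an observer that
never takes the lock. If unlock+lock were "true RCsc" in the sense I
described above, I'd expect the exists clause to be forbidden. With the
"everything except earlier stores vs. later loads" ordering, I'd expect
it to remain allowed, but I haven't run this through herd7.

```
C SB+unlock-lock+mb

(*
 * Illustrative sketch only; the name is hypothetical.
 * Question: is the store to x in the first critical section
 * ordered before the load of y in the second one, as seen by
 * a CPU that does not hold the lock?
 *)

{}

P0(int *x, int *y, spinlock_t *s)
{
	int r0;

	spin_lock(s);
	WRITE_ONCE(*x, 1);
	spin_unlock(s);
	spin_lock(s);
	r0 = READ_ONCE(*y);
	spin_unlock(s);
}

P1(int *x, int *y)
{
	int r1;

	WRITE_ONCE(*y, 1);
	smp_mb();
	r1 = READ_ONCE(*x);
}

exists (0:r0=0 /\ 1:r1=0)
```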

In fact, the more I think about it, this doesn't seem to be RCsc at all.
It seems closer to "RCpc plus extra PC ordering between critical
sections". No?

The synchronization accesses themselves aren't sequentially consistent
with respect to each other under the Power or RISC-V mappings, unless
there's a hwsync in there somewhere that I missed? Or a rule
preventing stw from forwarding to lwarx? Or some other higher-order
effect preventing it from being observed anyway?
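
As a rough analogue using release/acquire rather than the lock
accesses themselves, consider the store-buffering shape below. Again,
the test name is made up and I haven't run it. If the synchronization
accesses were sequentially consistent with respect to each other, the
exists clause would be forbidden. With a mapping that gives only
RCpc-style release and acquire, I'd expect each store to be able to
sit in a store buffer past the following acquire load, so the outcome
could be observed.

```
C SB+rel-acq

(*
 * Illustrative sketch only; the name is hypothetical.
 * Question: are the release store and the following acquire load
 * on each CPU ordered with respect to each other?
 *)

{}

P0(int *x, int *y)
{
	int r0;

	smp_store_release(x, 1);
	r0 = smp_load_acquire(y);
}

P1(int *x, int *y)
{
	int r1;

	smp_store_release(y, 1);
	r1 = smp_load_acquire(x);
}

exists (0:r0=0 /\ 1:r1=0)
```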

So that's all I'm suggesting here. If you all buy that, maybe "RCpccs"
for "RCpc with processor consistent critical section ordering"?
I don't have a strong opinion on the name itself; I just want to find
a name that's less ambiguous or overloaded.