Re: [PATCH 2/2] tools/memory-model: Add write ordering by release-acquire and by locks

From: Daniel Lustig
Date: Thu Jul 05 2018 - 11:17:06 EST

On 7/5/2018 7:44 AM, Will Deacon wrote:
> Andrea,
> On Thu, Jul 05, 2018 at 04:00:29PM +0200, Andrea Parri wrote:
>> On Wed, Jul 04, 2018 at 01:11:04PM +0100, Will Deacon wrote:
>>> On Tue, Jul 03, 2018 at 01:28:17PM -0400, Alan Stern wrote:
>>>> There's also read-write ordering, in the form of the LB pattern:
>>>> P0(int *x, int *y, int *z)
>>>> {
>>>> r0 = READ_ONCE(*x);
>>>> smp_store_release(z, 1);
>>>> r1 = smp_load_acquire(z);
>>>> WRITE_ONCE(*y, 1);
>>>> }
>>>> P1(int *x, int *y)
>>>> {
>>>> r2 = READ_ONCE(*y);
>>>> smp_mp();
>>>> WRITE_ONCE(*x, 1);
>>>> }
>>>> exists (0:r0=1 /\ 1:r2=1)
>>> The access types are irrelevant to the acquire/release primitives, so yes
>>> that's also allowed.
>>>> Would this be allowed if smp_load_acquire() was implemented with LDAPR?
>>>> If the answer is yes then we will have to remove the rfi-rel-acq and
>>>> rel-rf-acq-po relations from the memory model entirely.
>>> I don't understand what you mean by "rfi-rel-acq-po", and I assume you mean
>>> rel-rfi-acq-po for the other? Sounds like I'm confused here.
>> [Your reply about 1/2 suggests that you've figured this out now, IAC ...]
> Yeah, the naming threw me because it's not the same order as the composition
> in the actual definition (why not?). I typoed an extra 'po' suffix. Well
> done for spotting it.
>> "rfi-rel-acq" (as Alan wrote, and as I wrote before my question above...)
>> is defined and currently used in (and, FWIW, it has been
>> so since when we upstreamed LKMM).
>> My point is that, as exemplified by the two tests I reported above, this
>> relation already prevents you from implementing your acquire with LDAPR;
>> so my/our question was not "can you run herd7 for me?" but rather "do you
>> think that changes are needed to the .cat file?".
> I mean, you can run herd on the armv8 memory model and see the answer to the
> litmus test. So we have two options for the LKMM .cat file (the Arm one is
> baked in silicon):
> 1. Say that architectures with RCpc acquire/release instructions must
> instead use RCsc instructions (if they have them) or full fences
> 2. Change the LKMM .cat file to allow RCpc acquire/release (note that I'd
> still like RCsc lock/unlock semantics, and I think that's actually the
> real case that matters here, but the currently difficulty in
> distinguishing the two in the .cat file has led to this broader
> ordering being enforced as a side-effect)
> I prefer (2), because I think it's a safe and sensible relaxation to make
> and accomodates what I consider to be a likely extension to weakly ordered
> architectures that we might want to support efficiently. So yes, I think
> changes are needed to the LKMM .cat file, but please don't view that as a
> criticism. We change stuff all the time as long as it doesn't break userspace.
>> This question goes back _at least_ to:
>> (see, in part., the "IMPORTANT" note at the bottom of the commit message)
>> and that discussion later resulted in:
>> 0123f4d76ca63b ("riscv/spinlock: Strengthen implementations with fences")
>> 5ce6c1f3535fa8 ("riscv/atomic: Strengthen implementations with fences")
>> (the latest _draft_ of the RISC-V specification, as pointed out by Daniel,
>>, Appendix A.5
>> includes our "Linux mapping", although the currently-recommended mapping
>> differs and involves a "fence.tso [+ any acquire, including RCpc .aq]").
> [I think the discussion at hand is broader than RISC-V, and I looped in
> Daniel already]

Sorry for the delay, it was Independence Day here in the US.

I'm aligned with Will on all this so far. As mentioned above, this issue
definitely comes up on RISC-V, but for reasons even beyond RCpc vs. RCsc.
Why? On RISC-V, we have RCsc atomics, but no native load-acquire or
store-release opcodes. You might expect you can use "ld; fence r,rw"
and "fence rw,w; sd", respectively, but those don't interact with the
RCsc .aq/.rl mappings. For example, suppose we did this:

amoswap.w.rl x0, x0, 0(a1) // st-rel using amoswap with dummy dest
lw a0, 0(a1) // ld-acq using fence-based mapping
fence r,rw

Nothing here orders (a) before (b) in general. A mapping always using
fences would cover what's being asked for (I believe), and using only
RCsc operations (like ARM does) would also work if we had native
load-acquire and store-release operations, but this mixed mapping
doesn't. That's why the spec currently recommends the stronger
fence.tso (~= multi-copy atomic Power lwsync) instead.

...but it's very non-obvious. Is there really a hurry to rush all
this in?

>> My understanding is that your answer to this question is "Yes", but I am
>> not sure about the "How/Which changes?"; of course, an answer his question
>> _in the form_ of PATCHes would be appreciated! (but please also consider
>> that I'll be offline for most of the time until next Monday.)
> C'mon, I'm reviewing patches here. The onus shouldn't be on the reviewer to
> come back with the correct version of the patch. I'm also not terribly
> worried if LKMM says the wrong thing whilst we work out what the right
> thing should be, but I /would/ be worried if we started adding full fences to
> an architecture that has acquire/release instructions just because they're
> RCpc. If it turns out that no other arch maintainers care, then fine,
> because frankly this doesn't affect me, but so far it's basically been
> silence and I'd really like some more input before we close the door on
> them

I care! Thanks for speaking up Will!


> Will