Re: [PATCH 2/2] tools/memory-model: Add write ordering by release-acquire and by locks
From: Daniel Lustig
Date: Thu Jul 05 2018 - 11:35:36 EST
On 7/5/2018 8:16 AM, Daniel Lustig wrote:
> On 7/5/2018 7:44 AM, Will Deacon wrote:
>> On Thu, Jul 05, 2018 at 04:00:29PM +0200, Andrea Parri wrote:
>>> On Wed, Jul 04, 2018 at 01:11:04PM +0100, Will Deacon wrote:
>>>> On Tue, Jul 03, 2018 at 01:28:17PM -0400, Alan Stern wrote:
>>>>> There's also read-write ordering, in the form of the LB pattern:
>>>>> P0(int *x, int *y, int *z)
>>>>> r0 = READ_ONCE(*x);
>>>>> smp_store_release(z, 1);
>>>>> r1 = smp_load_acquire(z);
>>>>> WRITE_ONCE(*y, 1);
>>>>> P1(int *x, int *y)
>>>>> r2 = READ_ONCE(*y);
>>>>> WRITE_ONCE(*x, 1);
>>>>> exists (0:r0=1 /\ 1:r2=1)
>>>> The access types are irrelevant to the acquire/release primitives, so yes
>>>> that's also allowed.
>>>>> Would this be allowed if smp_load_acquire() was implemented with LDAPR?
>>>>> If the answer is yes then we will have to remove the rfi-rel-acq and
>>>>> rel-rf-acq-po relations from the memory model entirely.
>>>> I don't understand what you mean by "rfi-rel-acq-po", and I assume you mean
>>>> rel-rfi-acq-po for the other? Sounds like I'm confused here.
>>> [Your reply about 1/2 suggests that you've figured this out now, IAC ...]
>> Yeah, the naming threw me because it's not the same order as the composition
>> in the actual definition (why not?). I typoed an extra 'po' suffix. Well
>> done for spotting it.
>>> "rfi-rel-acq" (as Alan wrote, and as I wrote before my question above...)
>>> is defined and currently used in linux-kernel.cat (and, FWIW, it has been
>>> so since when we upstreamed LKMM).
>>> My point is that, as exemplified by the two tests I reported above, this
>>> relation already prevents you from implementing your acquire with LDAPR;
>>> so my/our question was not "can you run herd7 for me?" but rather "do you
>>> think that changes are needed to the .cat file?".
>> I mean, you can run herd on the armv8 memory model and see the answer to the
>> litmus test. So we have two options for the LKMM .cat file (the Arm one is
>> baked in silicon):
>> 1. Say that architectures with RCpc acquire/release instructions must
>> instead use RCsc instructions (if they have them) or full fences
>> 2. Change the LKMM .cat file to allow RCpc acquire/release (note that I'd
>> still like RCsc lock/unlock semantics, and I think that's actually the
>> real case that matters here, but the currently difficulty in
>> distinguishing the two in the .cat file has led to this broader
>> ordering being enforced as a side-effect)
>> I prefer (2), because I think it's a safe and sensible relaxation to make
>> and accomodates what I consider to be a likely extension to weakly ordered
>> architectures that we might want to support efficiently. So yes, I think
>> changes are needed to the LKMM .cat file, but please don't view that as a
>> criticism. We change stuff all the time as long as it doesn't break userspace.
>>> This question goes back _at least_ to:
>>> (see, in part., the "IMPORTANT" note at the bottom of the commit message)
>>> and that discussion later resulted in:
>>> 0123f4d76ca63b ("riscv/spinlock: Strengthen implementations with fences")
>>> 5ce6c1f3535fa8 ("riscv/atomic: Strengthen implementations with fences")
>>> (the latest _draft_ of the RISC-V specification, as pointed out by Daniel,
>>> https://github.com/riscv/riscv-isa-manual, Appendix A.5
>>> includes our "Linux mapping", although the currently-recommended mapping
>>> differs and involves a "fence.tso [+ any acquire, including RCpc .aq]").
>> [I think the discussion at hand is broader than RISC-V, and I looped in
>> Daniel already]
> Sorry for the delay, it was Independence Day here in the US.
> I'm aligned with Will on all this so far. As mentioned above, this issue
> definitely comes up on RISC-V, but for reasons even beyond RCpc vs. RCsc.
> Why? On RISC-V, we have RCsc atomics, but no native load-acquire or
> store-release opcodes. You might expect you can use "ld; fence r,rw"
> and "fence rw,w; sd", respectively, but those don't interact with the
> RCsc .aq/.rl mappings. For example, suppose we did this:
> amoswap.w.rl x0, x0, 0(a1) // st-rel using amoswap with dummy dest
> lw a0, 0(a1) // ld-acq using fence-based mapping
> fence r,rw
> Nothing here orders (a) before (b) in general. A mapping always using
And here, I should have said that (a) and (b) are stores. But then
flip it around, so that the release uses fences and the acquire uses
RCsc, and then there's a similar lack of read->read ordering between
(a) and (b). That's what I really meant to say here. Depending on
which particular mapping you use in each case, you may or may not get
R->R or W->W ordering.
> fences would cover what's being asked for (I believe), and using only
> RCsc operations (like ARM does) would also work if we had native
> load-acquire and store-release operations, but this mixed mapping
> doesn't. That's why the spec currently recommends the stronger
> fence.tso (~= multi-copy atomic Power lwsync) instead.
> ...but it's very non-obvious. Is there really a hurry to rush all
> this in?
>>> My understanding is that your answer to this question is "Yes", but I am
>>> not sure about the "How/Which changes?"; of course, an answer his question
>>> _in the form_ of PATCHes would be appreciated! (but please also consider
>>> that I'll be offline for most of the time until next Monday.)
>> C'mon, I'm reviewing patches here. The onus shouldn't be on the reviewer to
>> come back with the correct version of the patch. I'm also not terribly
>> worried if LKMM says the wrong thing whilst we work out what the right
>> thing should be, but I /would/ be worried if we started adding full fences to
>> an architecture that has acquire/release instructions just because they're
>> RCpc. If it turns out that no other arch maintainers care, then fine,
>> because frankly this doesn't affect me, but so far it's basically been
>> silence and I'd really like some more input before we close the door on
> I care! Thanks for speaking up Will!