On Wed, 15 Nov 2017 10:06:01 PST (-0800), will.deacon@xxxxxxx wrote:
On Tue, Nov 14, 2017 at 12:30:59PM -0800, Palmer Dabbelt wrote:>
> On Tue, 24 Oct 2017 07:10:33 PDT (-0700), will.deacon@xxxxxxx wrote:
>>On Tue, Sep 26, 2017 at 06:56:31PM -0700, Palmer Dabbelt wrote:
> Hi Palmer,
>
>> >>+ATOMIC_OPS(add, add, +, i, , _relaxed)
>> >>+ATOMIC_OPS(add, add, +, i, .aq , _acquire) ATOMIC_OPS(add, add,
>> >>++, i, .rl , _release)
>> >>+ATOMIC_OPS(add, add, +, i, .aqrl, )
>> >
>> >Have you checked that .aqrl is equivalent to "ordered", since there
>> >are interpretations where that isn't the case. Specifically:
>> >
>> >// all variables zero at start of time
>> >P0:
>> >WRITE_ONCE(x) = 1;
>> >atomic_add_return(y, 1);
>> >WRITE_ONCE(z) = 1;
>> >
>> >P1:
>> >READ_ONCE(z) // reads 1
>> >smp_rmb();
>> >READ_ONCE(x) // must not read 0
>>
>> I haven't. We don't quite have a formal memory model specification yet.
>> I've added Daniel Lustig, who is creating that model. He should have
>> a better idea
>
> Thanks. You really do need to ensure that, as it's heavily relied upon.
I know it's the case for our current processors, and I'm pretty sure it's the
case for what's formally specified, but we'll have to wait for the spec in order
to prove it.
I think Will is right. In the current spec, using .aqrl converts an RCpc load
or store into an RCsc load or store, but the acquire(-RCsc) annotation still
only applies to the load part of the atomic, and the release(-RCsc) annotation
applies only to the store part of the atomic.
Why is that? Picture an machine which implements AMOs using something that
looks more like an LR/SC under the covers, or one that uses cache line locking,
or anything else along those same lines. In some such machines, there could be
a window between lock/reserve and unlock/store-conditional where other later
stores could squeeze into, and that would break Will's example among others.
It's likely the same reasoning that causes ARM to use a trailing dmb here,
rather than just using ldaxr/stlxr. Is that right Will? I know that's LL/SC
and this particular cases uses AMOADD, but it's the same principle. Well, at
least according to how we have it in the current memory model draft.
Also, RISC-V currently prefers leading fence mappings, so I think the result
here, for atomic_add_return() for example, should be this:
fence rw,rw
amoadd.aq ...
Note that at this point, I think you could even elide the .rl. If I'm reading
it right it looks like the ARM mapping does this too (well, the reverse: ARM
elides the "a" in ldaxr due to the trailing dmb making it redundant).
Does that seem reasonable to you all?