Re: [RFC PATCH] riscv/locking: Strengthen spin_lock() and spin_unlock()

From: Daniel Lustig
Date: Thu Mar 01 2018 - 17:21:26 EST


On 3/1/2018 1:54 PM, Palmer Dabbelt wrote:
> On Thu, 01 Mar 2018 07:11:41 PST (-0800), parri.andrea@xxxxxxxxx wrote:
>> Hi Daniel,
>>
>> On Thu, Feb 22, 2018 at 11:47:57AM -0800, Daniel Lustig wrote:
>>> On 2/22/2018 10:27 AM, Peter Zijlstra wrote:
>>> > On Thu, Feb 22, 2018 at 10:13:17AM -0800, Paul E. McKenney wrote:
>>> >> So we have something that is not all that rare in the Linux kernel
>>> >> community, namely two conflicting more-or-less concurrent changes.
>>> >> This clearly needs to be resolved, either by us not strengthening the
>>> >> Linux-kernel memory model in the way we were planning to or by you
>>> >> strengthening RISC-V to be no weaker than PowerPC for these sorts of
>>> >> externally viewed release-acquire situations.
>>> >>
>>> >> Other thoughts?
>>> >
>>> > Like said in the other email, I would _much_ prefer to not go weaker
>>> > than PPC, I find that PPC is already painfully weak at times.
>>>
>>> Sure, and RISC-V could make this work too by using RCsc instructions
>>> and/or by using lightweight fences instead. It just wasn't clear at
>>> first whether smp_load_acquire() and smp_store_release() were RCpc,
>>> RCsc, or something else, and hence whether RISC-V would actually need
>>> to use something stronger than pure RCpc there. Likewise for
>>> spin_unlock()/spin_lock() and everywhere else this comes up.
>>
>> while digging into riscv's locks and atomics to fix the issues discussed
>> earlier in this thread, I became aware of another issue:
>>
>> Considering here the CMPXCHG primitives, for example, I noticed that the
>> implementation currently provides the four variants
>>
>> ÂÂÂÂATOMIC_OPS(ÂÂÂÂÂÂÂ , .aqrl)
>> ÂÂÂÂATOMIC_OPS(_acquire,ÂÂ .aq)
>> ÂÂÂÂATOMIC_OPS(_release,ÂÂ .rl)
>> ÂÂÂÂATOMIC_OPS(_relaxed,ÂÂÂÂÂ )
>>
>> (corresponding, resp., to
>>
>> ÂÂÂÂatomic_cmpxchg()
>> ÂÂÂÂatomic_cmpxchg_acquire()
>> ÂÂÂÂatomic_cmpxchg_release()
>> ÂÂÂÂatomic_cmpxchg_relaxed()Â )
>>
>> so that the first variant above ends up doing
>>
>> ÂÂÂÂ0: lr.w.aqrl %0, %addr
>> ÂÂÂÂÂÂÂ bneÂÂÂÂÂÂÂ %0, %old, 1f
>>  sc.w.aqrl %1, %new, %addr
>> ÂÂÂÂÂÂÂ bnezÂÂÂÂÂÂ %1, 0b
>> ÂÂÂÂÂÂÂ 1:
>>
>> From Sect. 2.3.5. ("Acquire/Release Ordering") of the Spec.,
>>
>> Â"AMOs with both .aq and .rl set are fully-ordered operations. Treating
>> Â the load part and the store part as independent RCsc operations is not
>> Â in and of itself sufficient to enforce full fencing behavior, but this
>> Â subtle weak behavior is counterintuitive and not much of an advantage
>> Â architecturally, especially with lr and sc also available [...]."
>>
>> I understand that
>>
>> ÂÂÂÂÂÂÂ { x = y = u = v = 0 }
>>
>> ÂÂÂÂP0()
>>
>> ÂÂÂÂWRITE_ONCE(x, 1);ÂÂÂÂÂÂÂ ("relaxed" store, sw)
>> ÂÂÂÂatomic_cmpxchg(&u, 0, 1);
>> ÂÂÂÂr0 = READ_ONCE(y);ÂÂÂÂÂÂÂ ("relaxed" load, lw)
>>
>> ÂÂÂÂP1()
>>
>> ÂÂÂÂWRITE_ONCE(y, 1);
>> ÂÂÂÂatomic_cmpxchg(&v, 0, 1);
>> ÂÂÂÂr1 = READ_ONCE(x);
>>
>> could result in (u = v = 1 and r0 = r1 = 0) at the end; can you confirm?
>
> cmpxchg isn't an AMO, it's an LR SC sequence, so that blurb doesn't
> apply. I think "lr.w.aqrl" and "sc.w.aqrl" is not sufficient to
> perform a fully ordered operation (ie, it's an incorrect
> implementation of atomic_cmpxchg()), but I was hoping to get some
> time to actually internalize this part of the RISC-V memory model at
> some point to be sure.

Agreed, the current intention is that we'll add a fence rw,rw in there
when we use lr/sc as the implementation, likely similar to what ARM does:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8e86f0b409a44193f1587e87b69c5dcf8f65be67

It's also probably not the only piece of synchronization-related code
we'll have to go back and audit as we finalize the RISC-V memory
consistency model over the next couple months or so.

Dan