RE: [PATCH v4 3/4] locking/qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32

From: Anup Patel
Date: Mon Mar 29 2021 - 08:14:08 EST




> -----Original Message-----
> From: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Sent: 29 March 2021 16:57
> To: Guo Ren <guoren@xxxxxxxxxx>
> Cc: linux-riscv <linux-riscv@xxxxxxxxxxxxxxxxxxx>; Linux Kernel Mailing List
> <linux-kernel@xxxxxxxxxxxxxxx>; linux-csky@xxxxxxxxxxxxxxx; linux-arch
> <linux-arch@xxxxxxxxxxxxxxx>; Guo Ren <guoren@xxxxxxxxxxxxxxxxx>; Will
> Deacon <will@xxxxxxxxxx>; Ingo Molnar <mingo@xxxxxxxxxx>; Waiman
> Long <longman@xxxxxxxxxx>; Arnd Bergmann <arnd@xxxxxxxx>; Anup
> Patel <anup@xxxxxxxxxxxxxx>
> Subject: Re: [PATCH v4 3/4] locking/qspinlock: Add
> ARCH_USE_QUEUED_SPINLOCKS_XCHG32
>
> On Mon, Mar 29, 2021 at 07:19:29PM +0800, Guo Ren wrote:
> > On Mon, Mar 29, 2021 at 3:50 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> wrote:
> > >
> > > On Sat, Mar 27, 2021 at 06:06:38PM +0000, guoren@xxxxxxxxxx wrote:
> > > > From: Guo Ren <guoren@xxxxxxxxxxxxxxxxx>
> > > >
> > > > Some architectures don't have sub-word swap atomic instruction,
> > > > they only have the full word's one.
> > > >
> > > > The sub-word swap only improve the performance when:
> > > > NR_CPUS < 16K
> > > > * 0- 7: locked byte
> > > > * 8: pending
> > > > * 9-15: not used
> > > > * 16-17: tail index
> > > > * 18-31: tail cpu (+1)
> > > >
> > > > The 9-15 bits are wasted to use xchg16 in xchg_tail.
> > > >
> > > > Please let architecture select xchg16/xchg32 to implement
> > > > xchg_tail.
> > >
> > > So I really don't like this, this pushes complexity into the generic
> > > code for something that's really not needed.
> > >
> > > Lots of RISC already implement sub-word atomics using word ll/sc.
> > > Obviously they're not sharing code like they should be :/ See for
> > > example arch/mips/kernel/cmpxchg.c.
> > I see, we've done two versions of this:
> > - Using cmpxchg codes from MIPS by Michael
> > - Re-write with assembly codes by Guo
> >
> > But using the full-word atomic xchg instructions implement xchg16 has
> > the semantic risk for atomic operations.
>
> What? -ENOPARSE
>
> > > Also, I really do think doing ticket locks first is a far more
> > > sensible step.
> > NACK by Anup
>
> Who's he when he's not sending NAKs ?

We had discussions in the RISC-V platforms group about this. Over there,
We had evaluated all spin lock approaches (ticket, qspinlock, etc) tried
in Linux till now. It was concluded in those discussions that eventually we
have to move to qspinlock (even if we moved to ticket spinlock temporarily)
because qspinlock avoids cache line bouncing. Also, moving to qspinlock
will be aligned with other major architectures supported in Linux (such as
x86, ARM64)

Some of the organizations working on high-end RISC-V systems (> 32
CPUs) are interested in having an optimized spinlock implementation
(just like other major architectures x86 and ARM64).

Based on above, Linux RISC-V should move to qspinlock.

Regards,
Anup