Re: [PATCH v4 3/4] locking/qspinlock: Add ARCH_USE_QUEUED_SPINLOCKS_XCHG32
From: Arnd Bergmann
Date: Tue Mar 30 2021 - 03:13:00 EST
On Tue, Mar 30, 2021 at 4:26 AM Guo Ren <guoren@xxxxxxxxxx> wrote:
> On Mon, Mar 29, 2021 at 9:56 PM Arnd Bergmann <arnd@xxxxxxxx> wrote:
> > On Mon, Mar 29, 2021 at 2:52 PM Guo Ren <guoren@xxxxxxxxxx> wrote:
> > > On Mon, Mar 29, 2021 at 7:31 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > > >
> > > > What's the architectural guarantee on LL/SC progress for RISC-V ?
> >
> > "When LR/SC is used for memory locations marked RsrvNonEventual,
> > software should provide alternative fall-back mechanisms used when
> > lack of progress is detected."
> >
> > My reading of this is that if the example you tried stalls, then either
> > the PMA is not RsrvEventual, and it is wrong to rely on ll/sc on this,
> > or that the PMA is marked RsrvEventual but the implementation is
> > buggy.
>
> Yes, PMA just defines physical memory region attributes. But in our
> processor, when the MMU is enabled (value of the satp register > 2) in
> S-mode, it will look at our custom PTE attributes, BIT(63) etc., ref [1]:
>
> PTE format:
> | 63 | 62 | 61 | 60 | 59 | 58-8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
>   SO   C    B    SH   SE   RSW    D   A   G   U   X   W   R   V
>   ^    ^    ^    ^    ^
> BIT(63): SO - Strong Order
> BIT(62): C - Cacheable
> BIT(61): B - Bufferable
> BIT(60): SH - Shareable
> BIT(59): SE - Security
>
> So the memory also could be RsrvNone/RsrvEventual.
I was not talking about RsrvNone, which would clearly mean that
you cannot use lr/sc at all (lr/sc would trap, right?), but about
"RsrvNonEventual", which would explain the behavior you described in
an earlier reply:
|   u32 a = 0x55aa66bb;
|   u16 *ptr = &a;
|
|   CPU0                       CPU1
|   =========                  =========
|   xchg16(ptr, new)           while(1)
|                                  WRITE_ONCE(*(ptr + 1), x);
|
|   When we use lr.w/sc.w implement xchg16, it'll cause CPU0 deadlock.
As I understand it, this example must not cause a deadlock on
a compliant hardware implementation when the underlying memory
has RsrvEventual behavior, but could deadlock in the case of
RsrvNonEventual.
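
To make the scenario concrete, here is a minimal sketch in portable C11
of a 16-bit exchange built on a 32-bit compare-and-swap retry loop. This
is not the kernel's xchg16; the name xchg16_emulated and the "half"
parameter are made up for illustration. The point is that a store to the
other half of the word makes the update fail and retry, much as it would
invalidate an lr.w reservation:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: exchange one 16-bit half
 * of a 32-bit word using nothing but a 32-bit compare-and-swap.
 * half == 0 selects bits 15:0, half == 1 selects bits 31:16.
 * Returns the previous value of the selected half.
 */
static uint16_t xchg16_emulated(_Atomic uint32_t *word, unsigned half,
                                uint16_t newval)
{
    assert(half < 2);

    unsigned shift = half * 16;
    uint32_t mask = (uint32_t)0xffff << shift;
    uint32_t old = atomic_load_explicit(word, memory_order_relaxed);
    uint32_t desired;

    do {
        /* Keep the other half as last observed, replace only ours. */
        desired = (old & ~mask) | ((uint32_t)newval << shift);
        /*
         * The CAS fails, and reloads 'old', whenever any bit of the
         * 32-bit word changed in the meantime, including bits in the
         * half we are not touching.
         */
    } while (!atomic_compare_exchange_weak_explicit(word, &old, desired,
                                                    memory_order_acq_rel,
                                                    memory_order_relaxed));

    return (uint16_t)((old & mask) >> shift);
}
```

Whether such a retry loop is guaranteed to terminate while another CPU
keeps storing to the neighbouring half is exactly the RsrvEventual vs.
RsrvNonEventual question.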
> [1] https://github.com/c-sky/csky-linux/commit/e837aad23148542771794d8a2fcc52afd0fcbf88
>
> >
> > It also seems that the current "amoswap" based implementation
> > would be reliable independent of RsrvEventual/RsrvNonEventual.
>
> Yes, the hardware implementation of AMO could be different from LR/SC.
> AMO could use ACE snoop holding to lock the bus in hw coherency
> design, but LR/SC uses an exclusive monitor without locking the bus.
>
> RISC-V has no CAS instruction, and it uses LR/SC for cmpxchg. I don't
> think LR/SC would be slower than CAS, and CAS is just good for code
> size.
What I meant here is that the current spinlock uses a simple amoswap,
which presumably does not suffer from the lack of forward progress you
described.
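
For comparison, a rough sketch of a test-and-set lock built on a single
atomic exchange, which one would expect a compiler to lower to amoswap.w
on RISC-V with the A extension (an assumption, not checked against
generated code). The tas_* names are hypothetical and this is not the
actual arch/riscv spinlock code:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical test-and-set lock: 0 means unlocked, 1 means locked. */
typedef struct {
    atomic_uint locked;
} tas_lock_t;

/* One atomic swap: we own the lock only if the old value was 0. */
static bool tas_trylock(tas_lock_t *l)
{
    return atomic_exchange_explicit(&l->locked, 1u,
                                    memory_order_acquire) == 0u;
}

static void tas_lock(tas_lock_t *l)
{
    while (!tas_trylock(l)) {
        /* Spin on plain loads until the lock looks free, then retry. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ;
    }
}

static void tas_unlock(tas_lock_t *l)
{
    /* Debug check: unlocking a lock that is not held is a bug. */
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1u);
    atomic_store_explicit(&l->locked, 0u, memory_order_release);
}
```

Each acquisition attempt here is a single atomic read-modify-write with
no reservation to lose, so there is no lr/sc retry loop that a store to
a neighbouring location could keep invalidating.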
Arnd