Re: [PATCH V2] asm-generic: ticket-lock: Optimize arch_spin_value_unlocked
From: Mateusz Guzik
Date: Mon Aug 07 2023 - 14:37:06 EST
On Sun, Jul 30, 2023 at 10:33:08PM -0400, guoren@xxxxxxxxxx wrote:
> From: Guo Ren <guoren@xxxxxxxxxxxxxxxxx>
>
> The arch_spin_value_unlocked would cause an unnecessary memory
> access to the contended value. Although it won't cause a significant
> performance gap in most architectures, the arch_spin_value_unlocked
> argument contains enough information. Thus, remove unnecessary
> atomic_read in arch_spin_value_unlocked().
>
> The caller of arch_spin_value_unlocked() could benefit from this
> change. Currently, the only caller is lockref.
>
Have you verified you are getting an extra memory access from this in
lockref? What architecture is it?
I have no opinion about the patch itself, I will note though that the
argument to the routine is *not* the actual memory-shared lockref,
instead it's something from the local copy obtained with READ_ONCE
from the real thing. So I would be surprised if the stock routine was
generating accesses to that sucker.
Nonetheless, if the patched routine adds nasty asm, that would be nice
to sort out.
FWIW on x86-64 qspinlock is used (i.e. not the stuff you are patching)
and I verified there are only 2 memory accesses -- the initial READ_ONCE
and later cmpxchg. I don't know which archs *don't* use qspinlock.
It also turns out generated asm is quite atrocious and cleaning it up
may yield a small win under more traffic. Maybe I'll see about it later
this week.
For example, disassembling lockref_put_return:
<+0>: mov (%rdi),%rax <-- initial load, expected
<+3>: mov $0x64,%r8d
<+9>: mov %rax,%rdx
<+12>: test %eax,%eax <-- retries loop back here
<-- this is also the unlocked
check
<+14>: jne 0xffffffff8157aba3 <lockref_put_return+67>
<+16>: mov %rdx,%rsi
<+19>: mov %edx,%edx
<+21>: sar $0x20,%rsi
<+25>: lea -0x1(%rsi),%ecx <-- new.count--;
<+28>: shl $0x20,%rcx
<+32>: or %rcx,%rdx
<+35>: test %esi,%esi
<+37>: jle 0xffffffff8157aba3 <lockref_put_return+67>
<+39>: lock cmpxchg %rdx,(%rdi) <-- the attempt to change
<+44>: jne 0xffffffff8157ab9a <lockref_put_return+58>
<+46>: shr $0x20,%rdx
<+50>: mov %rdx,%rax
<+53>: jmp 0xffffffff81af8540 <__x86_return_thunk>
<+58>: mov %rax,%rdx
<+61>: sub $0x1,%r8d <-- retry count check
<+65>: jne 0xffffffff8157ab6c <lockref_put_return+12> <-- go back
<+67>: mov $0xffffffff,%eax
<+72>: jmp 0xffffffff81af8540 <__x86_return_thunk>