Re: [PATCH] nvmet: use try_cmpxchg in nvmet_update_sq_head
From: Uros Bizjak
Date: Tue Oct 25 2022 - 02:48:54 EST
On Tue, Oct 25, 2022 at 2:47 AM Chaitanya Kulkarni
<chaitanyak@xxxxxxxxxx> wrote:
>
> On 10/20/22 08:35, Uros Bizjak wrote:
> > Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in
> > nvmet_update_sq_head. x86 CMPXCHG instruction returns success in ZF flag, so
> > this change saves a compare after cmpxchg (and related move instruction in
> > front of cmpxchg).
> >
>
> Is it worth a share delts of assembly instructions of the changes above?
> as developers on block mailing list are sharing the delta between before
> and after patch including the assembly.
The difference in the assembly of nvmet_update_sq_head function is:
before:
0000000000001d30 <nvmet_update_sq_head>:
1d30: 48 8b 4f 10 mov 0x10(%rdi),%rcx
1d34: 49 89 f8 mov %rdi,%r8
1d37: 0f b7 71 1a movzwl 0x1a(%rcx),%esi
1d3b: 66 85 f6 test %si,%si
1d3e: 75 14 jne 1d54 <nvmet_update_sq_head+0x24>
1d40: 49 8b 40 08 mov 0x8(%r8),%rax
1d44: 8b 51 1c mov 0x1c(%rcx),%edx
1d47: 66 89 50 08 mov %dx,0x8(%rax)
1d4b: e9 00 00 00 00 jmpq 1d50 <nvmet_update_sq_head+0x20>
1d4c: R_X86_64_PLT32 __x86_return_thunk-0x4
1d50: 0f b7 71 1a movzwl 0x1a(%rcx),%esi
1d54: 8b 79 1c mov 0x1c(%rcx),%edi
1d57: 31 d2 xor %edx,%edx
1d59: 8d 47 01 lea 0x1(%rdi),%eax
1d5c: f7 f6 div %esi
1d5e: 89 f8 mov %edi,%eax
1d60: f0 0f b1 51 1c lock cmpxchg %edx,0x1c(%rcx)
1d65: 49 8b 48 10 mov 0x10(%r8),%rcx
1d69: 39 c7 cmp %eax,%edi
1d6b: 75 e3 jne 1d50 <nvmet_update_sq_head+0x20>
1d6d: 49 8b 40 08 mov 0x8(%r8),%rax
1d71: 8b 51 1c mov 0x1c(%rcx),%edx
1d74: 66 89 50 08 mov %dx,0x8(%rax)
1d78: e9 00 00 00 00 jmpq 1d7d <nvmet_update_sq_head+0x4d>
1d79: R_X86_64_PLT32 __x86_return_thunk-0x4
1d7d:
after:
0000000000001d30 <nvmet_update_sq_head>:
1d30: 48 8b 4f 10 mov 0x10(%rdi),%rcx
1d34: 0f b7 51 1a movzwl 0x1a(%rcx),%edx
1d38: 66 85 d2 test %dx,%dx
1d3b: 74 1e je 1d5b <nvmet_update_sq_head+0x2b>
1d3d: 8b 71 1c mov 0x1c(%rcx),%esi
1d40: 44 0f b7 c2 movzwl %dx,%r8d
1d44: 8d 46 01 lea 0x1(%rsi),%eax
1d47: 31 d2 xor %edx,%edx
1d49: 41 f7 f0 div %r8d
1d4c: 89 f0 mov %esi,%eax
1d4e: f0 0f b1 51 1c lock cmpxchg %edx,0x1c(%rcx)
1d53: 48 8b 4f 10 mov 0x10(%rdi),%rcx
1d57: 89 c6 mov %eax,%esi
1d59: 75 10 jne 1d6b <nvmet_update_sq_head+0x3b>
1d5b: 48 8b 47 08 mov 0x8(%rdi),%rax
1d5f: 8b 51 1c mov 0x1c(%rcx),%edx
1d62: 66 89 50 08 mov %dx,0x8(%rax)
1d66: e9 00 00 00 00 jmpq 1d6b <nvmet_update_sq_head+0x3b>
1d67: R_X86_64_PLT32 __x86_return_thunk-0x4
1d6b: 0f b7 51 1a movzwl 0x1a(%rcx),%edx
1d6f: eb cf jmp 1d40 <nvmet_update_sq_head+0x10>
1d71:
You can see that in addition to the smaller size of the function, the
load of req->sq->size at 1d6b got moved to a cold path. As the main
benefit, the load at 1d3d is now out of the loop, and the value in
%esi is now provided by cmpxchg insn itself at 1d4e (plus move at
1d57). Unfortunately, the division clobbers %eax, so some reg-reg
moves are necessary. Note also that the compare at 1d69 is now gone.
> I also hope that you have tested this change with blktests nvme.
No, I didn't test the patch that thoroughly, but the change is the
same as some similar recent changes in the generic code, so I
confirmed the patch by inspecting asm code. OTOH, the kernel booted
from a NVME SSD.
> Either way:-
>
> Reviewed-by: ChaItanya Kulkarni <kch@xxxxxxxxxx>
Thanks!
Uros.