Re: [PATCH v3 0/9] s390: Improve this_cpu operations
From: Yang Shi
Date: Thu May 28 2026 - 15:19:59 EST
On 5/28/26 2:03 AM, David Laight wrote:
> On Wed, 27 May 2026 16:44:31 -0700
> Yang Shi <yang@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
>> On 5/22/26 2:18 AM, Heiko Carstens wrote:...
>>> It is amazing to see the performance improvements you see on arm64, however
>>> I believe that is mainly because of the large amount of code which is
>>> generated by the arm64 implementations of the preempt primitives
>>> __preempt_count_add() and __preempt_count_dec_and_test().
>>
>> Yes, we need 4 instructions on ARM64 for disabling/enabling preempt (one
>> instruction is used to load current pointer, the other 3 instructions
>> are used to RMW preempt_count). So I can remove 8 instructions in total
>> for a single this_cpu ops. That's a lot. Given this_cpu ops are heavily
>> used in kernel, we end up running fewer instructions and having better
>> icache hit rate, the better icache hit rate also helps reduce cross node
>> traffic for 2-socket system.
>
> Is 'current' kept in a cpu hardware register?

Yes, sp_el0. But it is a special register, so we need to move it to a
general-purpose register before any other ARM64 instruction can use it.
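
To make the instruction count concrete, here is a rough user-space C model of a this_cpu add wrapped in preempt disable/enable. It is only a sketch: the struct, the variables and all *_model names are made up for illustration and are not the kernel's definitions, and the per-line instruction counts in the comments just restate the numbers given above.

```c
#include <assert.h>

/* Hypothetical stand-in for the task struct; not the kernel's definition. */
struct task_model {
	int preempt_count;
};

static struct task_model task0;
/* Models 'current'; on arm64 this value would come from sp_el0. */
static struct task_model *current_model = &task0;
/* Models this CPU's copy of one per-cpu variable. */
static long percpu_counter_model;

static void preempt_disable_model(void)
{
	struct task_model *cur = current_model;	/* 1 insn: current into a GPR */

	cur->preempt_count++;			/* 3 insns: load, add, store */
}

static void preempt_enable_model(void)
{
	struct task_model *cur = current_model;	/* 1 insn again */

	assert(cur->preempt_count > 0);
	cur->preempt_count--;	/* 3 insns; the resched check is left out here */
}

/* The 4 + 4 instructions around the actual update are what can go away. */
static void this_cpu_add_model(long val)
{
	preempt_disable_model();
	percpu_counter_model += val;
	preempt_enable_model();
}
```

Calling this_cpu_add_model() twice leaves the preempt count balanced at zero, and everything outside the single += line is pure overhead for the op.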

> With the process switch code updating current->per_cpu_data.
> That might mean that you can access per-cpu data without disabling
> preemption (for single ops) using the same technique as s390.
> So something like:
>     mov %ra, current
>     movb per_cpu_reg(%ra), $b
>     mov %rb, per_cpu_data(%ra)
>     // per-cpu access using %rb, process switch code will update %rb
>     movb per_cpu_reg(%ra), $255
> An add will need to use a cmpxchg loop.
> For simplicity use a fixed register for %rb.
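
For reference, a rough user-space C model of the scheme as read from the pseudo-assembly above. This is only one interpretation: the regs[] array stands in for the task's saved registers, switch_cpu_model() stands in for the process switch code, and every *_model name and constant is an assumption made for illustration.

```c
#include <stdatomic.h>
#include <stdint.h>

#define NR_CPUS_MODEL	2
#define NO_PERCPU_REG	255	/* the $255 marker: no register holds the base */
#define RB		1	/* fixed register index for the per-cpu base */

struct percpu_area_model {
	_Atomic long counter;
};

struct percpu_task_model {
	uint8_t per_cpu_reg;			/* register holding the base, or 255 */
	struct percpu_area_model *per_cpu_data;	/* base for the CPU we run on */
	struct percpu_area_model *regs[4];	/* models saved general registers */
};

static struct percpu_area_model areas[NR_CPUS_MODEL];

/* Models the process switch code moving the task to another CPU. */
static void switch_cpu_model(struct percpu_task_model *t, int cpu)
{
	t->per_cpu_data = &areas[cpu];
	if (t->per_cpu_reg != NO_PERCPU_REG)
		t->regs[t->per_cpu_reg] = t->per_cpu_data;	/* fix up %rb */
}

/* Mark %rb as live first, then load the base, as in the sketch. */
static void percpu_begin_model(struct percpu_task_model *t)
{
	t->per_cpu_reg = RB;
	t->regs[RB] = t->per_cpu_data;
}

static void percpu_end_model(struct percpu_task_model *t)
{
	t->per_cpu_reg = NO_PERCPU_REG;
}

/* An add is a cmpxchg loop that re-reads %rb on every attempt. */
static void percpu_add_model(struct percpu_task_model *t, long val)
{
	percpu_begin_model(t);
	for (;;) {
		_Atomic long *p = &t->regs[RB]->counter;
		long old = atomic_load(p);

		if (atomic_compare_exchange_weak(p, &old, old + val))
			break;
	}
	percpu_end_model(t);
}
```

In this model a migration inside the begin/end window only changes which area %rb points at, so a single op always lands on some CPU's copy without preemption being disabled.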

TBH, I can't say I fully understand what you proposed. But it sounds like
this one:
https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/commit/?id=84ee5f23f93d4a650e828f831da9ed29c54623c5

Thanks,
Yang

> -- David