Re: [RFC v1 PATCH 0/11] Optimize this_cpu_*() ops for non-x86 (ARM64 for this series)

From: Yang Shi

Date: Fri May 15 2026 - 14:41:12 EST




On 5/15/26 9:28 AM, Heiko Carstens wrote:
> On Wed, May 13, 2026 at 05:00:19PM -0700, Yang Shi wrote:
>> On 5/12/26 2:02 AM, David Hildenbrand (Arm) wrote:
>>> There was quite some feedback during the LSF/MM session, what's the current plan?
>> ...
>> I'm not sure whether S390 folks will implement this on S390 or not, anyway
>> they are cc'ed.
> I'm not sure yet, however after I had a look at the architecture documentation
> a couple of weeks ago, I think it shouldn't be too hard to get this working on
> s390 as well. I was a bit concerned about TLB flushing, if changes to the
> kernel mapping happen with per-cpu page tables, but as of now I believe this
> shouldn't cause any harm (famous last words...).

Yeah, it shouldn't. The kernel needs to flush the TLB on all CPUs whenever a kernel mapping is changed, regardless of percpu page tables. There should not be any extra overhead in most cases.

Some extra TLB flushing is needed for the "percpu local mapping area", but all CPUs use the same virtual address, so we should just need one more TLB flush call with the same virtual address for all CPUs. In addition, percpu chunk destruction happens asynchronously in a workqueue. Unmapping page tables, flushing the TLB and freeing pages all happen in the workqueue when the whole chunk is freed. The fast path basically just updates an allocation bitmap.
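
To illustrate the fast path / deferred path split, here is a rough userspace sketch in C. It is not the actual mm/percpu.c code: struct chunk, chunk_alloc_unit(), chunk_free_unit() and deferred_destroy_worker() are made-up names, the "workqueue" is modeled as a plain list, and the unmap and TLB flush are modeled as a flag and a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CHUNK_UNITS 64  /* hypothetical: allocation units per chunk */

/* Hypothetical stand-in for a percpu chunk, not the kernel's struct. */
struct chunk {
    unsigned long long alloc_map;  /* one bit per allocated unit */
    bool mapped;                   /* backing pages still mapped? */
    bool queued;                   /* already on the deferred list? */
    int tlb_flushes;               /* range flushes issued for this chunk */
    struct chunk *next_deferred;   /* link on the deferred-destroy list */
};

/* Models work queued to a workqueue. */
static struct chunk *deferred_list;

static void chunk_alloc_unit(struct chunk *c, unsigned int unit)
{
    assert(unit < CHUNK_UNITS);
    c->alloc_map |= 1ULL << unit;
}

/* Fast path: only the bitmap changes. No unmap, no TLB flush. */
static void chunk_free_unit(struct chunk *c, unsigned int unit)
{
    assert(unit < CHUNK_UNITS);
    c->alloc_map &= ~(1ULL << unit);

    /* Whole chunk is free: defer the expensive teardown. */
    if (c->alloc_map == 0 && !c->queued) {
        c->queued = true;
        c->next_deferred = deferred_list;
        deferred_list = c;
    }
}

/* Deferred path: unmap, flush once for the shared range, free pages. */
static int deferred_destroy_worker(void)
{
    int destroyed = 0;

    while (deferred_list) {
        struct chunk *c = deferred_list;

        deferred_list = c->next_deferred;
        c->next_deferred = NULL;
        c->queued = false;

        /* Skip chunks that were reused before the worker ran. */
        if (c->alloc_map != 0)
            continue;

        c->mapped = false;  /* models unmapping the page tables */
        c->tlb_flushes++;   /* models one flush of the same VA range */
        destroyed++;
    }
    return destroyed;
}
```

In this model, freeing units never touches the mapping or the flush counter; both change only when the worker runs and the chunk is still completely free.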

Thanks,
Yang