Re: [PATCH] mm: avoid use of BIT() macro for initialising VMA flags

From: David Laight

Date: Fri Dec 12 2025 - 08:02:46 EST


On Fri, 12 Dec 2025 13:24:57 +0100
Mateusz Guzik <mjguzik@xxxxxxxxx> wrote:

> So I had a look where the timing difference is coming from and I think
> I have the answer: init_ipc_ns does not have a guaranteed cacheline
> placement and things get moved around with the patch.
>
> On my kernels (nm vmlinux-newbits | sort -nk 1 | less)
>
> before:
> ffffffff839ffb60 T init_ipc_ns
> ffffffff83a00020 t event_exit__msgrcv
>
> after:
> ffffffff839ffbc0 T init_ipc_ns
> ffffffff83a00080 t event_exit__msgrcv
>
> This is the pervasive problem of vars from all .o files placed
> adjacent to each other, meaning changes in one .o file result in
> offsets changing in other files and then you get performance
> fluctuations as not-explicitly-padded variables share (or no longer
> share) cachelines.

Those look like text symbols, not data ones.
But moving code about can make the same sort of changes.

> I brought this up a year ago elsewhere:
> https://gcc.gnu.org/pipermail/gcc/2024-October/245004.html

My guess is that all the extra padding increases the cache footprint
of the code, which displaces other cache lines that then have to be
re-read from memory.
So while a specific benchmark may improve, overall system performance
goes down.

Excessive loop unrolling has the same effect.

David

>
> Maybe I should pick it up again and see it through.
>
> as for the thing at hand, someone(tm) will want to make sure the
> namespace is cacheline aligned and possibly pad its own internals
> afterwards. Personally I can't be bothered.