Re: [PATCH v2] mm: Only re-generate demotion targets when a numa node changes its N_CPU state

From: Abhishek Goel
Date: Mon Mar 14 2022 - 05:09:54 EST



On 11/03/22 22:40, Dave Hansen wrote:
> On 3/10/22 18:39, Andrew Morton wrote:
>> On Thu, 10 Mar 2022 13:07:49 +0100 Oscar Salvador <osalvador@xxxxxxx> wrote:
>>> We do already have two CPU callbacks (vmstat_cpu_online() and vmstat_cpu_dead())
>>> that check exactly that, so get rid of the CPU callbacks in
>>> migrate_on_reclaim_init() and only call set_migration_target_nodes() from
>>> vmstat_cpu_{dead,online}() whenever a NUMA node changes its N_CPU state.
>> What I'm not getting here (as so often happens) is a sense of how badly
>> this affects our users. Does anyone actually hotplug frequently enough
>> to care?
> I asked Abhishek about this a bit here:
>
> https://lore.kernel.org/all/4e8067e1-0574-c9d2-9d6c-d676d32071bd@xxxxxxxxxxxxxxxxxx/
>
> It sounded to me like there are ppc users who convert their systems from
> SMT=1 to SMT=8. I'd guess that they want to do this as a side-channel
> mitigation, because ppc has been dealing with the same basic issues as
> those of us over in x86 land. The increase in time (20s->36s) would be
> noticeable and probably slightly annoying to a human waiting on it.
>
> I'd love to hear more details on this from Abhishek, like whether end
> users do this as opposed to IBM's kernel developers. But it does sound
> deserving of a stable@ tag to me.
Yes, end users use this as well; especially on large systems, they might
want to switch between SMT=1, SMT=4 and SMT=8. It is also useful for
dynamic LPAR operations.

As Dave pointed out, the increase in time, while manageable and only just
noticeable on smaller systems, becomes very clearly observable as the
systems grow larger.