Re: [PATCH] mm: fix panic in __alloc_pages
From: Michal Hocko
Date: Tue Nov 02 2021 - 10:35:36 EST
On Tue 02-11-21 14:52:01, Oscar Salvador wrote:
> On Tue, Nov 02, 2021 at 02:25:03PM +0100, Michal Hocko wrote:
> > I think we want to learn how exactly Alexey brought that cpu up, because
> > his initial thought on add_cpu, or rather cpu_up, doesn't seem to be correct.
> > Or I am just not following the code properly. Once we know all those
> > details we can get in touch with the cpu hotplug maintainers and see what
> > we can do.
>
> I am not really familiar with CPU hot-onlining, but I have been taking a look.
> As with memory, there are two different stages, hot-adding and onlining (and the
> counterparts).
>
> Part of the hot-adding being:
>
> acpi_processor_get_info
>  acpi_processor_hotadd_init
>   arch_register_cpu
>    register_cpu
>
> One of the things that register_cpu() does is to set cpu->dev.bus pointing to
> &cpu_subsys, which is:
>
> struct bus_type cpu_subsys = {
> 	.name = "cpu",
> 	.dev_name = "cpu",
> 	.match = cpu_subsys_match,
> #ifdef CONFIG_HOTPLUG_CPU
> 	.online = cpu_subsys_online,
> 	.offline = cpu_subsys_offline,
> #endif
> };
>
> Then, the onlining part (in case of a udev rule or someone onlining the device)
> would be:
>
> online_store
>  device_online
>   cpu_subsys_online
>    cpu_device_up
>     cpu_up
>      ...
>      online node
>
> Since Alexey disabled the udev rule and no one onlined the CPU, online_store()->
> device_online() wasn't really called.
>
> The following only applies to x86_64:
> I think we got confused because cpu_device_up() is also called from add_cpu(),
> but that is an exported function and x86 does not call add_cpu() except for
> debugging purposes (check kernel/torture.c and arch/x86/kernel/topology.c).
> It does the onlining through online_store()...
> So we can take add_cpu() out of the equation here.
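
The split between registering and onlining described above can be modelled
in userspace as a rough sketch. None of this is kernel code: the model_*
names are hypothetical and the structs are cut down to the fields mentioned
in this thread. It only illustrates that registration wires up the bus
callbacks, and that the online chain does not run until something calls the
device_online() equivalent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; NOT the real definitions. */
struct model_device;

struct model_bus_type {
	const char *name;
	int (*online)(struct model_device *dev);
};

struct model_device {
	const struct model_bus_type *bus;
	bool offline;      /* true until the online callback succeeds */
	bool node_online;  /* stands in for "online node" at the end of the chain */
};

/* Models cpu_subsys_online -> cpu_device_up -> cpu_up -> ... -> online node. */
static int model_cpu_subsys_online(struct model_device *dev)
{
	dev->node_online = true;
	return 0;
}

static const struct model_bus_type model_cpu_subsys = {
	.name = "cpu",
	.online = model_cpu_subsys_online,
};

/* Models register_cpu(): sets the bus pointer, does not online anything. */
static void model_register_cpu(struct model_device *dev)
{
	dev->bus = &model_cpu_subsys;
	dev->offline = true;
	dev->node_online = false;
}

/* Models device_online(), i.e. what a write to the sysfs "online" file reaches. */
static int model_device_online(struct model_device *dev)
{
	int ret;

	assert(dev != NULL && dev->bus != NULL);
	if (!dev->offline)
		return 1;   /* already online, nothing to do */
	ret = dev->bus->online(dev);
	if (ret == 0)
		dev->offline = false;
	return ret;
}
```

In this model, a device that has gone through model_register_cpu() but never
through model_device_online() stays offline with its node not onlined, which
matches the situation with the udev rule disabled.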
Yes, so here is what I think the real problem is (thanks for pointing me to
the ACPI code): the cpu->node association is done in acpi_map_cpu2node, and
I suspect it expects the node to be already present, because it gets its
information from the SRAT/PXM tables, which are parsed during boot. But I
might just be confused, or maybe VMware injects new entries here somehow.
Another interesting thing is that acpi_map_cpu2node skips the association
if no node is found in SRAT, but that should only mean it would use the
default initialization, which should hopefully be 0.
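
A rough userspace model of that skip logic, as a sketch only: the model_*
names, the table contents and the table size are made up for illustration,
and the default of node 0 is an assumption (it is the hoped-for behaviour
mentioned above, not something verified against the kernel).

```c
#include <assert.h>

#define MODEL_NO_NODE   (-1)   /* stands in for NUMA_NO_NODE */
#define MODEL_NR_CPUS   8      /* arbitrary size for the sketch */

/* Per-CPU node table; ASSUMED to default to node 0 for this illustration. */
static int model_cpu_to_node[MODEL_NR_CPUS];

/* Hypothetical boot-time proximity data, indexed by CPU. */
static int model_srat_node[MODEL_NR_CPUS] = {
	0, 0, 1, 1, MODEL_NO_NODE, MODEL_NO_NODE, MODEL_NO_NODE, MODEL_NO_NODE,
};

/* Models the shape of acpi_map_cpu2node(): no node found means no update. */
static int model_map_cpu2node(int cpu)
{
	int nid;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	nid = model_srat_node[cpu];
	if (nid != MODEL_NO_NODE)
		model_cpu_to_node[cpu] = nid;   /* association recorded */
	return 0;                               /* otherwise skipped silently */
}
```

A CPU with no entry in the boot-time data keeps whatever the table was
initialized with, so everything depends on that default being a node that
actually exists and is online.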
Anyway, I have found in my notes
https://www.spinics.net/lists/kernel/msg3010886.html which is a slightly
different problem, but it has some notes about how the initialization
mess works (that one was at boot time though, and hotplug might actually
be different).
I have run out of time for this today, so hopefully somebody can re-learn
that from there...
--
Michal Hocko
SUSE Labs