Re: [PATCH] mm: fix panic in __alloc_pages

From: Oscar Salvador
Date: Tue Nov 02 2021 - 09:52:11 EST


On Tue, Nov 02, 2021 at 02:25:03PM +0100, Michal Hocko wrote:
> I think we want to learn how exactly Alexey brought that cpu up. Because
> his initial thought on add_cpu resp cpu_up doesn't seem to be correct.
> Or I am just not following the code properly. Once we know all those
> details we can get in touch with cpu hotplug maintainers and see what
> can we do.

I am not really familiar with CPU hot-onlining, but I have been taking a look.
As with memory, there are two different stages, hot-adding and onlining (and
their counterparts).

Part of the hot-adding path is:

 acpi_processor_get_info
  acpi_processor_hotadd_init
   arch_register_cpu
    register_cpu

One of the things that register_cpu() does is to point cpu->dev.bus at
&cpu_subsys, which is:

 struct bus_type cpu_subsys = {
 	.name = "cpu",
 	.dev_name = "cpu",
 	.match = cpu_subsys_match,
 #ifdef CONFIG_HOTPLUG_CPU
 	.online = cpu_subsys_online,
 	.offline = cpu_subsys_offline,
 #endif
 };

Then, the onlining part (triggered by a udev rule or by someone onlining the
device by hand) would be:

 online_store
  device_online
   cpu_subsys_online
    cpu_device_up
     cpu_up
      ...
       online node

Since Alexey disabled the udev rule and no one onlined the CPU by hand,
online_store()->device_online() was never called.
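
To make the two stages concrete, here is a small user-space sketch of the
dispatch (not kernel code). Everything prefixed with toy_ is a made-up stand-in
for the real structures and functions, and the logic is a simplification of
what I read in the driver core: registration only wires up the bus pointer,
and nothing reaches cpu_up() until something calls device_online().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_device;

/* Hypothetical stand-in for struct bus_type; only the online hook is modelled. */
struct toy_bus_type {
	const char *name;
	int (*online)(struct toy_device *dev);
};

/* Hypothetical stand-in for the struct device embedded in struct cpu. */
struct toy_device {
	const struct toy_bus_type *bus;
	bool offline;      /* true until the device has been onlined */
	int cpu_up_calls;  /* how many times the toy cpu_up() has run */
};

/* Stand-in for cpu_up(): here it only records that it was reached. */
static int toy_cpu_up(struct toy_device *dev)
{
	dev->cpu_up_calls++;
	return 0;
}

/* Stand-in for cpu_subsys_online() -> cpu_device_up() -> cpu_up(). */
static int toy_cpu_subsys_online(struct toy_device *dev)
{
	return toy_cpu_up(dev);
}

static const struct toy_bus_type toy_cpu_subsys = {
	.name = "cpu",
	.online = toy_cpu_subsys_online,
};

/* Hot-add stage: attach the device to the bus, leave it offline. */
static void toy_register_cpu(struct toy_device *dev)
{
	dev->bus = &toy_cpu_subsys;
	dev->offline = true;
	dev->cpu_up_calls = 0;
}

/*
 * Online stage: dispatch through the bus callback.
 * Returns 0 on success, 1 if the device was already online.
 */
static int toy_device_online(struct toy_device *dev)
{
	int ret;

	assert(dev != NULL && dev->bus != NULL);
	if (!dev->offline)
		return 1;

	ret = dev->bus->online(dev);
	if (ret == 0)
		dev->offline = false;
	return ret;
}
```

In this model, calling toy_register_cpu() alone leaves cpu_up_calls at 0 and the
device offline, which mirrors the situation described above: the CPU is
registered, but the path that would online the node never runs.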

The following only applies to x86_64:
I think we got confused because cpu_device_up() is also called from add_cpu(),
but that is an exported function and x86 does not call add_cpu() except for
debugging purposes (check kernel/torture.c and arch/x86/kernel/topology.c).
It does the onlining through online_store()...
So we can take add_cpu() out of the equation here.


--
Oscar Salvador
SUSE Labs