Re: [PATCH v3] mm: fix panic in __alloc_pages

From: Alexey Makhalov
Date: Tue Nov 16 2021 - 15:22:56 EST




> On Nov 16, 2021, at 1:17 AM, Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Tue 16-11-21 01:31:44, Alexey Makhalov wrote:
> [...]
>> diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
>> index 6737b1cbf..bbc1a70d5 100644
>> --- a/drivers/acpi/acpi_processor.c
>> +++ b/drivers/acpi/acpi_processor.c
>> @@ -200,6 +200,10 @@ static int acpi_processor_hotadd_init(struct acpi_processor *pr)
>> * gets online for the first time.
>> */
>> pr_info("CPU%d has been hot-added\n", pr->id);
>> + {
>> + int nid = cpu_to_node(pr->id);
>> + printk("%s:%d cpu %d, node %d, online %d, ndata %p\n", __FUNCTION__, __LINE__, pr->id, nid, node_online(nid), NODE_DATA(nid));
>> + }
>> pr->flags.need_hotplug_init = 1;
>
> OK, IIUC you are adding a processor which is outside of
> possible_cpu_mask and that means that the node is not allocated for such
> a future to be hotplugged cpu and its memory node. init_cpu_to_node
> would have done that initialization otherwise.

That is not correct.

possible_cpus is 128 for this VM; see the SRAT and percpu boot messages for proof:
[ 0.085524] SRAT: PXM 127 -> APIC 0xfe -> Node 127
[ 0.118928] setup_percpu: NR_CPUS:128 nr_cpumask_bits:128 nr_cpu_ids:128 nr_node_ids:128

It is impossible to add a processor outside of possible_cpu_mask; the possible CPUs are the absolute maximum
that the system can support. See Documentation/core-api/cpu_hotplug.rst.

The number of present and online CPUs (and nodes) is 4. The other 124 CPUs (and nodes) are not present, but can
potentially be hot-added.
The number of initialized nodes is 4, because init_cpu_to_node() skips nodes that are not yet present;
see arch/x86/mm/numa.c:798 (numa_cpu_node(CPU #4) == NUMA_NO_NODE):
788 void __init init_cpu_to_node(void)
789 {
790         int cpu;
791         u16 *cpu_to_apicid = early_per_cpu_ptr(x86_cpu_to_apicid);
792
793         BUG_ON(cpu_to_apicid == NULL);
794
795         for_each_possible_cpu(cpu) {
796                 int node = numa_cpu_node(cpu);
797
798                 if (node == NUMA_NO_NODE)
799                         continue;
800

After the CPU (and node) hot-plug:
- CPU 4 is marked as present, but not yet online.
- The new node gets ID 4, so numa_cpu_node(CPU #4) returns 4.
- node_online(4) == 0 and NODE_DATA(4) == NULL, but NODE_DATA(4) is accessed inside the
for_each_possible_cpu() loop during percpu allocation.

Digging further.
Even if the x86 CPU hot-add maintainers decide to clean up the memoryless-node hot-add code so that the node is
initialized at the time it is attached (in line with how mm initializes a node during memory hot-add), this percpu fix
is still needed, because percpu allocation is used during node onlining; see the chicken-and-egg problem I described above.
Alternatively, as a second option, numa_cpu_node(4) should return NUMA_NO_NODE until node 4 is fully initialized.

Regards,
—Alexey

