Re: [PATCH] x86/smpboot: Make logical package management more robust

From: Boris Ostrovsky
Date: Sat Dec 10 2016 - 22:26:06 EST

On 12/10/2016 02:13 PM, Thomas Gleixner wrote:
On Sat, 10 Dec 2016, Thomas Gleixner wrote:
On Fri, 9 Dec 2016, Boris Ostrovsky wrote:
On 12/09/2016 06:02 PM, Boris Ostrovsky wrote:
On 12/09/2016 05:06 PM, Thomas Gleixner wrote:
On Thu, 8 Dec 2016, Thomas Gleixner wrote:

Boris, can you please verify if that makes the
topology_update_package_map() call which you placed into the Xen cpu
starting code obsolete?

Will do. I did test your patch but without removing
topology_update_package_map() call. It complained about package IDs
being wrong, but that's expected until I fix the Xen part.

Ignore my statement about earlier testing: it was all on single-node machines.

Something is broken with multi-node on Intel, but the failure modes differ.
Prior to this patch, build_sched_domain() reports an error and pretty soon we
crash in the scheduler (I don't remember where off the top of my head). With the
patch applied I crash much later, when one of the drivers does kmalloc_node(...,
cpu_to_node(cpu)) and cpu_to_node() returns 1, which should never happen
(the boot log reports "x86: Booted up 1 node, 32 CPUs", for example).

Hmm. But the cpu_to_node() association is unrelated to the logical package
management.

It came to mind just after I hit send: the whole persistent cpuid-to-nodeid
association work was merged in 4.9, so that might be related.

Yes, that's exactly the reason.

It uses _PXM to set the node ID, and _PXM is exposed to dom0 (which is a privileged PV guest).

Re: your previous message: after I "fix" the problem above, I see
pr_info("Max logical packages: %u\n", __max_logical_packages);
but no
pr_warn("CPU %u Converting physical %u to logical package %u\n", ...);

with or without the topology_update_package_map() call in arch/x86/xen/smp.c:cpu_bringup().