Re: [RFC][PATCH 5/6] x86/topo: Fix SNC topology mess
From: Tim Chen
Date: Thu Feb 26 2026 - 17:26:57 EST
On Thu, 2026-02-26 at 14:11 -0800, Tim Chen wrote:
> On Thu, 2026-02-26 at 11:00 -0800, Tim Chen wrote:
> > On Fri, 2026-02-27 at 01:07 +0800, Chen, Yu C wrote:
> > > Hi Peter,
> > >
> > > On 2/26/2026 6:49 PM, Peter Zijlstra wrote:
> > > > + int u = __num_nodes_per_package;
> > >
> > > Yes, this is much simpler, thanks for the patch!
> > >
> > > > + long d = 0;
> > > > + int x, y;
> > > > +
> > > > + /*
> > > > + * Is this a unit cluster on the trace?
> > > > + */
> > > > + if ((i / u) == (j / u))
> > > > + return node_distance(i, j);
> > >
> > > If the number of nodes per package is 3, we assume that
> > > every 3 consecutive nodes are SNC siblings (on the same
> > > trace):node0, node1, and node2 are SNC siblings, while
> > > node3, node4, and node5 form another group of SNC siblings.
> > >
> > > I have a curious thought: could it be possible that
> > > node0, node2, and node4 are SNC siblings, and node1,
> > > node3, and node5 are another set of SNC siblings instead?
> > >
> > > Then I studied the code a little more, node ids are dynamically
> > > allocated via the acpi_map_pxm_to_node, so the assignment of node
> > > ids depends on the order in which each processor affinity structure
> > > is listed in the SRAT table. For example, suppose CPU0 belongs to
> > > package0 and CPU1 belongs to package1, but their entries are placed
> > > consecutively in the SRAT. In this case, the Proximity Domain of
> > > CPU0 would be mapped to node0 via acpi_map_pxm_to_node, and CPU1’s
> > > Proximity Domain would be assigned node1. The logic above would
> > > then treat them as belonging to the same package, even though they
> > > are physically in different packages. However, I believe such a
> > > scenario is unlikely to occur in practice in the BIOS and if it
> > > happens it should be a BIOS bug if I understand correctly.
> > >
> > >
> >
> > May be a good idea to sanity check that the nodes in the first unit cluster
> > has the same package id and give a WARNING if that's not the case.
> >
> Perhaps something like below
>
> Tim
>
> ---
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index d97f8f4e014c..38384ea5253a 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -511,6 +511,24 @@ static int slit_cluster_distance(int i, int j)
> int u = __num_nodes_per_package;
> long d = 0;
> int x, y;
> + static int valid_slit = 0;
> +
> + if (valid_slit == -1)
> + return node_distance(i, j);
> +
> + if (valid_slit == 0) {
> + /* Check first nodes in package are grouped together consecutively */
> + for (x = 0; x < u-1 ; x++) {
> + if (topology_physical_package_id(x) !=
> + topology_physical_package_id(x+1)) {
This won't work because topology_physical_package_id() takes cpu
as argument. Will need to find the first cpu of x and x+1 and pass to it.
> + pr_warn("Expect nodes %d and %d to be in the same package",
> + x, x+1);
> + valid_slit = -1;
> + return node_distance(i, j);
> + }
> + }
> + valid_slit = 1;
> + }
>
> /*
> * Is this a unit cluster on the trace?
>