Re: [RFC][PATCH 5/6] x86/topo: Fix SNC topology mess

From: Tim Chen

Date: Thu Feb 26 2026 - 17:11:55 EST


On Thu, 2026-02-26 at 11:00 -0800, Tim Chen wrote:
> On Fri, 2026-02-27 at 01:07 +0800, Chen, Yu C wrote:
> > Hi Peter,
> >
> > On 2/26/2026 6:49 PM, Peter Zijlstra wrote:
> > > + int u = __num_nodes_per_package;
> >
> > Yes, this is much simpler, thanks for the patch!
> >
> > > + long d = 0;
> > > + int x, y;
> > > +
> > > + /*
> > > + * Is this a unit cluster on the trace?
> > > + */
> > > + if ((i / u) == (j / u))
> > > + return node_distance(i, j);
> >
> > If the number of nodes per package is 3, we assume that
> > every 3 consecutive nodes are SNC siblings (on the same
> > trace):node0, node1, and node2 are SNC siblings, while
> > node3, node4, and node5 form another group of SNC siblings.
> >
> > I have a curious thought: could it be possible that
> > node0, node2, and node4 are SNC siblings, and node1,
> > node3, and node5 are another set of SNC siblings instead?
> >
> > Then I studied the code a little more, node ids are dynamically
> > allocated via the acpi_map_pxm_to_node, so the assignment of node
> > ids depends on the order in which each processor affinity structure
> > is listed in the SRAT table. For example, suppose CPU0 belongs to
> > package0 and CPU1 belongs to package1, but their entries are placed
> > consecutively in the SRAT. In this case, the Proximity Domain of
> > CPU0 would be mapped to node0 via acpi_map_pxm_to_node, and CPU1’s
> > Proximity Domain would be assigned node1. The logic above would
> > then treat them as belonging to the same package, even though they
> > are physically in different packages. However, I believe such a
> > scenario is unlikely to occur in practice in the BIOS and if it
> > happens it should be a BIOS bug if I understand correctly.
> >
> >
>
> May be a good idea to sanity check that the nodes in the first unit cluster
> has the same package id and give a WARNING if that's not the case.
>
Perhaps something like below

Tim

---
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index d97f8f4e014c..38384ea5253a 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -511,6 +511,24 @@ static int slit_cluster_distance(int i, int j)
int u = __num_nodes_per_package;
long d = 0;
int x, y;
+ static int valid_slit = 0;
+
+ if (valid_slit == -1)
+ return node_distance(i, j);
+
+ if (valid_slit == 0) {
+ /* Check first nodes in package are grouped together consecutively */
+ for (x = 0; x < u-1 ; x++) {
+ if (topology_physical_package_id(x) !=
+ topology_physical_package_id(x+1)) {
+ pr_warn("Expect nodes %d and %d to be in the same package",
+ x, x+1);
+ valid_slit = -1;
+ return node_distance(i, j);
+ }
+ }
+ valid_slit = 1;
+ }

/*
* Is this a unit cluster on the trace?