Re: [PATCH] x86: Consider multiple nodes in a single socket to be "sane"

From: Brice Goglin
Date: Tue Sep 16 2014 - 12:59:51 EST


On 16/09/2014 05:29, Peter Zijlstra wrote:
>
>> This also fixes sysfs because CPUs with the same 'physical_package_id'
>> in /sys/devices/system/cpu/cpu*/topology/ are not listed together
>> in the same 'core_siblings_list'. This violates a statement from
>> Documentation/ABI/testing/sysfs-devices-system-cpu:
>>
>> core_siblings: internal kernel map of cpu#'s hardware threads
>> within the same physical_package_id.
>>
>> core_siblings_list: human-readable list of the logical CPU
>> numbers within the same physical_package_id as cpu#.
> No that statement is wrong; it assumes physical_package_id is a good
> identifier for nodes. Clearly this is no longer true.
>
> The idea is that core_siblings (or rather cpu_core_mask) is a mask of
> all cores on a node.

Or rather node_core_mask, since it's not clear whether a "CPU" is a
package, a die, a node, or something else?

Aside from the bad naming, I fail to see why core_siblings couldn't be
package-wide.

Node-wide masks are already available in /sys/devices/system/node/node*/cpumap.
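
For reference, a minimal sketch of how a userspace tool might turn the
cpulist text format (the one used by core_siblings_list, and by the
cpulist files that sit next to cpumap in the node directories) into a
bitmask. The helper name parse_cpulist is hypothetical, and the 64-CPU
limit is a simplification made for the example; real tools need wider
masks.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a sysfs cpulist string such as "0-3,8-11" into a 64-bit mask.
 * Returns 0 on success, -1 on malformed input or on CPU numbers >= 64.
 * Hypothetical helper, limited to 64 CPUs to keep the sketch short.
 */
int parse_cpulist(const char *s, uint64_t *mask)
{
	*mask = 0;
	while (*s && *s != '\n') {
		char *end;
		unsigned long lo, hi;

		if (*s < '0' || *s > '9')
			return -1;
		lo = hi = strtoul(s, &end, 10);
		if (*end == '-') {
			/* A range "lo-hi": parse the upper bound. */
			s = end + 1;
			if (*s < '0' || *s > '9')
				return -1;
			hi = strtoul(s, &end, 10);
		}
		if (hi < lo || hi >= 64)
			return -1;
		for (unsigned long i = lo; i <= hi; i++)
			*mask |= UINT64_C(1) << i;
		s = end;
		if (*s == ',')
			s++;
		else if (*s && *s != '\n')
			return -1;
	}
	return 0;
}
```

With the input "0-3,8-11" this sets bits 0 to 3 and 8 to 11 of the mask.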

>> The sysfs effects here cause an issue with the hwloc tool where
>> it gets confused and thinks there are more sockets than are
>> physically present.
> Meh, so then we need another mask.

Note that AMD tried to add a "cpu_node" mask back in 2009, for the same
reason, for their 6xxx CPUs, but it was rejected :/
https://lkml.org/lkml/2009/6/3/244
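
To illustrate the mismatch, here is a minimal sketch (not hwloc's actual
code) of two ways a tool could count sockets from the sysfs topology
files. The struct and function names are hypothetical, and the values
are assumed to have been read from sysfs already.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-CPU values as a tool would read them from topology/ in sysfs. */
struct cpu_topo {
	int package_id;         /* physical_package_id */
	uint64_t core_siblings; /* core_siblings as a bitmask (<= 64 CPUs) */
};

/* Count sockets as the number of distinct physical_package_id values. */
size_t count_by_package_id(const struct cpu_topo *t, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		size_t j;

		for (j = 0; j < i; j++)
			if (t[j].package_id == t[i].package_id)
				break;
		if (j == i) /* first CPU seen with this id */
			count++;
	}
	return count;
}

/* Count sockets as the number of distinct core_siblings masks. */
size_t count_by_core_siblings(const struct cpu_topo *t, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		size_t j;

		for (j = 0; j < i; j++)
			if (t[j].core_siblings == t[i].core_siblings)
				break;
		if (j == i) /* first CPU seen with this mask */
			count++;
	}
	return count;
}
```

On a hypothetical single-socket machine with two nodes, where every CPU
reports package id 0 but core_siblings only covers the CPU's own node,
the first function returns 1 and the second returns 2.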

>> diff -puN arch/x86/kernel/smpboot.c~hsw-cod-is-sane arch/x86/kernel/smpboot.c
>> --- a/arch/x86/kernel/smpboot.c~hsw-cod-is-sane 2014-09-15 14:56:20.012314468 -0700
>> +++ b/arch/x86/kernel/smpboot.c 2014-09-15 14:58:58.837506644 -0700
>> @@ -344,10 +344,13 @@ static bool match_llc(struct cpuinfo_x86
>>  static bool match_mc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
>>  {
>>  	if (c->phys_proc_id == o->phys_proc_id) {
>> -		if (cpu_has(c, X86_FEATURE_AMD_DCM))
>> -			return true;
>> -
>> -		return topology_sane(c, o, "mc");
>> +		/*
>> +		 * We used to enforce that 'c' and 'o' be on the
>> +		 * same node, but AMD's DCM and Intel's Cluster-
>> +		 * on-Die (CoD) support both have physical
>> +		 * processors that span NUMA nodes.
>> +		 */
>> +		return true;
>>  	}
>>  	return false;
>>  }
> This is wrong (and I suppose the AMD case was already wrong). That
> function is supposed to match a multi-core group which is very much
> supposed to be smaller-or-equal to a node, not spanning nodes.

Well, even if it's wrong, it matches what the sysfs doc says. We have
been using this ABI in userspace tools for years and it always properly
detected AMD CPUs. So if you want to break things, please remove
core_siblings entirely and add new well-defined package-wide and
node-wide masks :/
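
The documented contract can be written down as a small check: for every
CPU, core_siblings must be exactly the set of CPUs that share its
physical_package_id. This is only a sketch under simplifying
assumptions (at most 64 CPUs, values already read from sysfs into
arrays); the function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if, for every CPU i, core_siblings[i] is exactly the set
 * of CPUs whose package_id equals package_id[i], as the sysfs ABI
 * document states. Handles at most 64 CPUs.
 */
bool core_siblings_match_doc(const int *package_id,
			     const uint64_t *core_siblings, size_t n)
{
	if (n > 64)
		return false;

	for (size_t i = 0; i < n; i++) {
		uint64_t expected = 0;

		/* Build the package-wide mask for CPU i. */
		for (size_t j = 0; j < n; j++)
			if (package_id[j] == package_id[i])
				expected |= UINT64_C(1) << j;

		if (core_siblings[i] != expected)
			return false;
	}
	return true;
}
```

A machine where core_siblings is node-wide while the package spans two
nodes fails this check, which is the case discussed above.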

Brice
