On 8/11/17 16:15, Peter Zijlstra wrote:
On Fri, Aug 11, 2017 at 12:58:22PM +0700, Suravee Suthikulpanit wrote:
On 8/11/17 11:57, Suravee Suthikulpanit wrote:
[...]
@@ -1445,9 +1448,24 @@ void sched_init_numa(void)
tl[i] = sched_domain_topology[i];
/*
+ * Ignore the NUMA identity level if it has the same cpumask
+ * as previous level. This is the case for:
+ * - System with last-level-cache (MC) sched domain span a NUMA node.
+ * - System with DIE sched domain span a NUMA node.
+ *
+ * Assume all NUMA nodes are identical, so only check node 0.
+ */
+ if (!cpumask_equal(sched_domains_numa_masks[0][0], tl[i-1].mask(0)))
+ tl[i++] = (struct sched_domain_topology_level){
+ .mask = sd_numa_mask,
+ .numa_level = 0,
+ SD_INIT_NAME(NODE)
+ };
So what you've forgotten to mention is that for those systems where the
LLC == NODE this now superfluous level gets removed by the degenerate
code. Have you verified that does the right thing?
Let me check on that and get back to you.
Actually, it is not removed by the degenerate code. That is what this logic
is for. It checks for LLC == NODE or DIE == NODE before setting up the NODE
sched level. I can update the comment. This has also been tested on a system
with LLC == NODE.
Why does the degenerate code fail to remove things?
Sorry for the confusion. Actually, the degenerate code does remove the
duplicate NODE sched-domain.
The logic above takes a different approach. Instead of depending on the
degenerate code to run later during cpu_attach_domain(), it excludes the
NODE sched-domain up front in sched_init_numa(). The difference is that,
without the !cpumask_equal() check, the MC sched-domain ends up with the
SD_PREFER_SIBLING flag set, because the degenerate code transfers the flag
down from the NODE sched-domain to the MC sched-domain. Would this be the
preferred behavior for the MC sched-domain?
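For illustration only, here is a minimal standalone C model of the two
approaches. This is not kernel code: struct level, build_levels(),
degenerate(), the 64-bit span bitmask, and the value of SD_PREFER_SIBLING
are all made up for this sketch, and it assumes that NODE carries
SD_PREFER_SIBLING while MC does not, as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag value only; not the kernel's definition. */
#define SD_PREFER_SIBLING 0x1u

/* Hypothetical stand-in for one sched-domain level as seen from CPU 0. */
struct level {
	const char *name;
	uint64_t span;		/* bitmask of CPUs covered by this level */
	unsigned int flags;
};

/*
 * Approach taken by the patch: copy the base levels, then add NODE only
 * if its span differs from the last base level (check_equal == true).
 * With check_equal == false, NODE is always added.
 * 'out' must have room for nbase + 1 entries.
 */
static int build_levels(struct level *out, const struct level *base,
			int nbase, uint64_t node_span, bool check_equal)
{
	int n = 0;

	assert(nbase > 0);
	for (int i = 0; i < nbase; i++)
		out[n++] = base[i];

	if (!check_equal || out[n - 1].span != node_span)
		out[n++] = (struct level){
			.name = "NODE",
			.span = node_span,
			.flags = SD_PREFER_SIBLING,
		};
	return n;
}

/*
 * Simplified model of the later degenerate pass: a parent with the same
 * span as its child is dropped, and SD_PREFER_SIBLING is transferred
 * down to the child. Returns the new number of levels.
 */
static int degenerate(struct level *lv, int n)
{
	int out = 0;

	for (int i = 0; i < n; i++) {
		if (out > 0 && lv[out - 1].span == lv[i].span) {
			if (lv[i].flags & SD_PREFER_SIBLING)
				lv[out - 1].flags |= SD_PREFER_SIBLING;
			continue;	/* drop the redundant parent */
		}
		lv[out++] = lv[i];
	}
	return out;
}
```

With base levels SMT (span 0x3) and MC (span 0xff) and a node span of 0xff,
build_levels() with check_equal set yields two levels and leaves the MC
flags untouched. Without the check, build_levels() followed by degenerate()
also yields two levels, but MC now has SD_PREFER_SIBLING set.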
Regards,
Suravee