Re: [PATCH] cpu-topology: warn if NUMA configurations conflicts with lower layer

From: Dietmar Eggemann
Date: Mon Jan 13 2020 - 09:49:13 EST



On 11.01.20 21:56, Valentin Schneider wrote:
> On 09/01/2020 12:58, Zengtao (B) wrote:
>>> IIUC, the problem is that virt can set up a broken topology in some
>>> cases where MPIDR doesn't line up correctly with the defined NUMA
>>> nodes.
>>>
>>> We could argue that it is a qemu/virt problem, but it would be nice if
>>> we could at least detect it. The proposed patch isn't really the right
>>> solution as it warns on some valid topologies as Sudeep already pointed
>>> out.
>>>
>>> It sounds more like we need a mask subset check in the sched_domain
>>> building code, if there isn't already one?
>>
>> Currently no, it's a bit complex to do the check in the sched_domain building code,
>> I need to take a think of that.
>> Suggestion welcomed.
>>
>
> Doing a search on the sched_domain spans themselves should look something like
> the completely untested:

[...]

LGTM. This code detects the issue in cpu_coregroup_mask(), which is the
cpumask function of the MC level (struct sched_domain_topology_level)
in the default_topology[] used by arm64 (and other archs).
I wonder how x86 copes with such a config error?
Maybe they do it inside their cpu_coregroup_mask()?


We could move validate_topology_spans() into the existing

	for_each_cpu(i, cpu_map)
		for_each_sd_topology(tl)

loop in build_sched_domains(), saving some code?
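
To illustrate what the per-CPU check would look like once it is called
from inside that loop, here is a minimal user-space sketch. It is not
kernel code: span_t, NR_CPUS and validate_span() are hypothetical
stand-ins, with a plain 64-bit bitmask replacing struct cpumask and an
array lookup replacing tl->mask(cpu).

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4		/* hypothetical, small on purpose */

typedef uint64_t span_t;	/* stand-in for struct cpumask */

/*
 * Check one CPU's span at one (non-NUMA) topology level against the
 * spans of all other CPUs in cpu_map. Spans must be either equal or
 * wholly disjoint. Returns -1 on a partial overlap, 0 otherwise.
 *
 * spans[i] plays the role of tl->mask(i).
 */
static int validate_span(const span_t *spans, span_t cpu_map, int cpu)
{
	span_t mask;
	int i;

	assert(cpu >= 0 && cpu < NR_CPUS);
	mask = spans[cpu];

	for (i = 0; i < NR_CPUS; i++) {
		/* Skip CPUs outside cpu_map and the CPU itself */
		if (!(cpu_map & (1ULL << i)) || i == cpu)
			continue;

		/* Not equal but intersecting: partial overlap */
		if (mask != spans[i] && (mask & spans[i]))
			return -1;
	}

	return 0;
}
```

With spans { 0x3, 0x3, 0xc, 0xc } every CPU passes. With
{ 0x3, 0x7, 0xc, 0xc } CPU1's span partially overlaps its neighbours,
so the check fails for CPU0 as soon as CPU1 is part of cpu_map.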

---8<---

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index e6ff114e53f2..5f2764433a3d 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1880,37 +1880,34 @@ static struct sched_domain *build_sched_domain(struct sched_domain_topology_leve
 }

 /* Ensure topology masks are sane; non-NUMA spans shouldn't overlap */
-static int validate_topology_spans(const struct cpumask *cpu_map)
+static int validate_topology_spans(struct sched_domain_topology_level *tl,
+				   const struct cpumask *cpu_map, int cpu)
 {
-	struct sched_domain_topology_level *tl;
-	int i, j;
+	const struct cpumask* mask = tl->mask(cpu);
+	int i;

-	for_each_sd_topology(tl) {
-		/* NUMA levels are allowed to overlap */
-		if (tl->flags & SDTL_OVERLAP)
-			break;
+	/* NUMA levels are allowed to overlap */
+	if (tl->flags & SDTL_OVERLAP)
+		return 0;

+	/*
+	 * Non-NUMA levels cannot partially overlap - they must be
+	 * either equal or wholly disjoint. Otherwise we can end up
+	 * breaking the sched_group lists - i.e. a later get_group()
+	 * pass breaks the linking done for an earlier span.
+	 */
+	for_each_cpu(i, cpu_map) {
+		if (i == cpu)
+			continue;
 		/*
-		 * Non-NUMA levels cannot partially overlap - they must be
-		 * either equal or wholly disjoint. Otherwise we can end up
-		 * breaking the sched_group lists - i.e. a later get_group()
-		 * pass breaks the linking done for an earlier span.
+		 * We should 'and' all those masks with 'cpu_map'
+		 * to exactly match the topology we're about to
+		 * build, but that can only remove CPUs, which
+		 * only lessens our ability to detect overlaps
 		 */
-		for_each_cpu(i, cpu_map) {
-			for_each_cpu(j, cpu_map) {
-				if (i == j)
-					continue;
-				/*
-				 * We should 'and' all those masks with 'cpu_map'
-				 * to exactly match the topology we're about to
-				 * build, but that can only remove CPUs, which
-				 * only lessens our ability to detect overlaps
-				 */
-				if (!cpumask_equal(tl->mask(i), tl->mask(j)) &&
-				    cpumask_intersects(tl->mask(i), tl->mask(j)))
-					return -1;
-			}
-		}
+		if (!cpumask_equal(mask, tl->mask(i)) &&
+		    cpumask_intersects(mask, tl->mask(i)))
+			return -1;
 	}

 	return 0;
@@ -1990,8 +1987,7 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
 	struct sched_domain_topology_level *tl_asym;
 	bool has_asym = false;

-	if (WARN_ON(cpumask_empty(cpu_map)) ||
-	    WARN_ON(validate_topology_spans(cpu_map)))
+	if (WARN_ON(cpumask_empty(cpu_map)))
 		goto error;

 	alloc_state = __visit_domain_allocation_hell(&d, cpu_map);
@@ -2013,6 +2009,9 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
 				has_asym = true;
 			}

+			if (WARN_ON(validate_topology_spans(tl, cpu_map, i)))
+				goto error;
+
 			sd = build_sched_domain(tl, cpu_map, attr, sd, dflags, i);

 			if (tl == sched_domain_topology)