Re: [PATCH] cpu-topology: warn if NUMA configurations conflicts with lower layer

From: Valentin Schneider
Date: Sat Jan 11 2020 - 15:57:04 EST


On 09/01/2020 12:58, Zengtao (B) wrote:
>> IIUC, the problem is that virt can set up a broken topology in some
>> cases where MPIDR doesn't line up correctly with the defined NUMA
>> nodes.
>>
>> We could argue that it is a qemu/virt problem, but it would be nice if
>> we could at least detect it. The proposed patch isn't really the right
>> solution as it warns on some valid topologies as Sudeep already pointed
>> out.
>>
>> It sounds more like we need a mask subset check in the sched_domain
>> building code, if there isn't already one?
>
> Currently no, it's a bit complex to do the check in the sched_domain building code,
> I need to think about that.
> Suggestions welcome.
>

Checking the sched_domain spans themselves should look something like the
following, completely untested, patch:

---8<---
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 6ec1e595b1d4..96128d12ec23 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1879,6 +1879,43 @@ static struct sched_domain *build_sched_domain(struct sched_domain_topology_leve
 	return sd;
 }
 
+/* Ensure topology masks are sane; non-NUMA spans shouldn't overlap */
+static int validate_topology_spans(const struct cpumask *cpu_map)
+{
+	struct sched_domain_topology_level *tl;
+	int i, j;
+
+	for_each_sd_topology(tl) {
+		/* NUMA levels are allowed to overlap */
+		if (tl->flags & SDTL_OVERLAP)
+			break;
+
+		/*
+		 * Non-NUMA levels cannot partially overlap - they must be
+		 * either equal or wholly disjoint. Otherwise we can end up
+		 * breaking the sched_group lists - i.e. a later get_group()
+		 * pass breaks the linking done for an earlier span.
+		 */
+		for_each_cpu(i, cpu_map) {
+			for_each_cpu(j, cpu_map) {
+				if (i == j)
+					continue;
+				/*
+				 * We should 'and' all those masks with 'cpu_map'
+				 * to exactly match the topology we're about to
+				 * build, but that can only remove CPUs, which
+				 * only lessens our ability to detect overlaps
+				 */
+				if (!cpumask_equal(tl->mask(i), tl->mask(j)) &&
+				    cpumask_intersects(tl->mask(i), tl->mask(j)))
+					return -1;
+			}
+		}
+	}
+
+	return 0;
+}
+
 /*
  * Find the sched_domain_topology_level where all CPU capacities are visible
  * for all CPUs.
@@ -1953,7 +1990,8 @@ build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *att
 	struct sched_domain_topology_level *tl_asym;
 	bool has_asym = false;
 
-	if (WARN_ON(cpumask_empty(cpu_map)))
+	if (WARN_ON(cpumask_empty(cpu_map)) ||
+	    WARN_ON(validate_topology_spans(cpu_map)))
 		goto error;
 
 	alloc_state = __visit_domain_allocation_hell(&d, cpu_map);
--->8---
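
For illustration only, the "equal or wholly disjoint" rule can be modelled
outside the kernel. This is a minimal userspace sketch, assuming every span
fits in a 64-bit bitmask; span_t, spans_partially_overlap() and
validate_level_spans() are made-up names for this example, not kernel APIs,
and the real check would go through tl->mask() and the cpumask helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: bit n set means CPU n is in the span. */
typedef uint64_t span_t;

/* Two spans conflict when they share at least one CPU without being identical. */
static bool spans_partially_overlap(span_t a, span_t b)
{
	return a != b && (a & b) != 0;
}

/*
 * spans[i] is the span that CPU i reports at a single (non-NUMA) topology
 * level. Return 0 if every pair of spans is either equal or disjoint, and
 * -1 as soon as a partial overlap is found.
 *
 * The condition is symmetric, so visiting each unordered pair once is enough.
 */
static int validate_level_spans(const span_t *spans, int nr_cpus)
{
	for (int i = 0; i < nr_cpus; i++) {
		for (int j = i + 1; j < nr_cpus; j++) {
			if (spans_partially_overlap(spans[i], spans[j]))
				return -1;
		}
	}

	return 0;
}
```

With four CPUs, the spans { 0x3, 0x3, 0xc, 0xc } (CPUs 0-1 and CPUs 2-3) pass,
while { 0x3, 0x6, 0xc, 0xc } fail because CPU 0 reports CPUs 0-1 and CPU 1
reports CPUs 1-2. It also shows why masking with cpu_map can only hide
problems: with CPU 1 removed from the map (0xd), those two spans become 0x1
and 0x4, which are disjoint, so the conflict is no longer visible.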

Alternatively, the assertion on the sched_group linking I suggested earlier
in the thread should suffice, since it should trigger whenever we have
partially overlapping non-NUMA sched domains.

Since you have a setup where you can reproduce the issue, could you please
give either (ideally both!) a try? Thanks.

> Thanks
> Zengtao
>
>>
>> Morten