Re: [PATCH v4 2/9] sched/topology: Extract "imb_numa_nr" calculation into a separate helper
From: K Prateek Nayak
Date: Sun Mar 15 2026 - 23:41:22 EST
Hello Dietmar,
On 3/16/2026 5:48 AM, Dietmar Eggemann wrote:
>> + /*
>> + * For a single LLC per node, allow an
>> + * imbalance up to 12.5% of the node. This is
>> + * arbitrary cutoff based on two factors -- SMT and
>> + * memory channels. For SMT-2, the intent is to
>> + * avoid premature sharing of HT resources but
>> + * SMT-4 or SMT-8 *may* benefit from a different
>> + * cutoff. For memory channels, this is a very
>> + * rough estimate of how many channels may be
>> + * active and is based on recent CPUs with
>> + * many cores.
>> + *
>> + * For multiple LLCs, allow an imbalance
>> + * until multiple tasks would share an LLC
>> + * on one node while LLCs on another node
>> + * remain idle. This assumes that there are
>> + * enough logical CPUs per LLC to avoid SMT
>> + * factors and that there is a correlation
>> + * between LLCs and memory channels.
>> + */
>> + nr_llcs = sd_llc->parent->span_weight / sd_llc->span_weight;
>> + if (nr_llcs == 1)
>> + imb = sd_llc->parent->span_weight >> 3;
>> + else
>> + imb = nr_llcs;
>> +
>> + imb = max(1U, imb);
>> + sd_llc->parent->imb_numa_nr = imb;
>
> Here you set imb_numa_nr e.g. for PKG ...
Ack! That is indeed a redundant assignment since it gets reassigned
in the bottom loop. For this commit, we kept it 1:1 with the loop
that previously existed in build_sched_domains().
>
>> +
>> + /*
>> + * Set span based on the first NUMA domain.
>> + *
>> + * NUMA systems always add a NODE domain before
>> + * iterating the NUMA domains. Since this is before
>> + * degeneration, start from sd_llc's parent's
>> + * parent which is the lowest an SD_NUMA domain can
>> + * be relative to sd_llc.
>> + */
>> + parent = sd_llc->parent->parent;
>> + while (parent && !(parent->flags & SD_NUMA))
>> + parent = parent->parent;
>> +
>> + imb_span = parent ? parent->span_weight : sd_llc->parent->span_weight;
>> +
>> + /* Update the upper remainder of the topology */
>> + parent = sd_llc->parent;
>> + while (parent) {
>> + int factor = max(1U, (parent->span_weight / imb_span));
>> +
>> + parent->imb_numa_nr = imb * factor;
>
> ... and here again.
>
> Shouldn't we only set it for 'if (parent->flags & SD_NUMA)'?
>
> Not sure if there are case in which PKG would persist in
>
> ... -> MC -> PKG -> NODE -> NUMA -> ... ?
>
> Although access to sd->imb_numa_nr seems to be guarded by sd->flags &
> SD_NUMA.
Indeed! "imb_numa_nr" only makes sense when looking at NUMA domains,
and having it assigned to 1 for the lower domains is harmless
(though wasteful, as you note). I'm 99% sure we can simply do the
following (only build tested):
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 43150591914b..e9068a809dbc 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -2623,9 +2623,6 @@ static void adjust_numa_imbalance(struct sched_domain *sd_llc)
else
imb = nr_llcs;
- imb = max(1U, imb);
- sd_llc->parent->imb_numa_nr = imb;
-
/*
* Set span based on the first NUMA domain.
*
@@ -2639,10 +2636,14 @@ static void adjust_numa_imbalance(struct sched_domain *sd_llc)
while (parent && !(parent->flags & SD_NUMA))
parent = parent->parent;
- imb_span = parent ? parent->span_weight : sd_llc->parent->span_weight;
+ /* No NUMA domain to adjust imbalance for! */
+ if (!parent)
+ return;
+
+ imb = max(1U, imb);
+ imb_span = parent->span_weight;
/* Update the upper remainder of the topology */
- parent = sd_llc->parent;
while (parent) {
int factor = max(1U, (parent->span_weight / imb_span));
---
If we have NUMA domains, we definitely have a NODE domain, and NODE sets
neither SD_SHARE_LLC nor SD_NUMA, so sd->parent is likely the PKG / NODE
domain; the walk has to start at sd->parent->parent and break at the
first SD_NUMA domain.
If no such domain exists, we don't have any NUMA domains and nothing to
worry about, and if one does, the final loop will adjust the NUMA
imbalance.
Thoughts? Again, this commit was kept 1:1 with the previous loop, but we
can always improve :-)
>
>> + parent = parent->parent;
>> + }
>> +}
>> +
> [...]
--
Thanks and Regards,
Prateek