Re: [External] Re: [PATCH 1/2] sched/numa: Stop an exhastive search if an idle core is found

From: Hao Jia
Date: Tue Oct 25 2022 - 22:34:55 EST




On 2022/10/25 Mel Gorman wrote:
On Tue, Oct 25, 2022 at 07:10:22PM +0800, Hao Jia wrote:
On 2022/10/25 Mel Gorman wrote:
On Tue, Oct 25, 2022 at 11:16:29AM +0800, Hao Jia wrote:
Remove the change in the first hunk and call break in the second hunk
after updating ns->idle_cpu.


Yes, thanks for your review.
If I understand correctly, some things might look like this.

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e4a0b8bd941c..dfcb620bfe50 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1792,7 +1792,7 @@ static void update_numa_stats(struct task_numa_env
*env,
ns->nr_running += rq->cfs.h_nr_running;
ns->compute_capacity += capacity_of(cpu);

- if (find_idle && !rq->nr_running && idle_cpu(cpu)) {
+ if (find_idle && idle_core < 0 && !rq->nr_running &&
idle_cpu(cpu)) {
if (READ_ONCE(rq->numa_migrate_on) ||
!cpumask_test_cpu(cpu, env->p->cpus_ptr))
continue;


I meant more like the below but today I wondered why did I not do this in
the first place? The answer is because it's wrong and broken in concept.

The full loop is needed to calculate approximate NUMA stats at a
point in time. For example, the src and dst nr_running is needed by
task_numa_find_cpu. The search for an idle CPU or core in update_numa_stats
is simply taking advantage of the fact we are scanning anyway to keep
track of an idle CPU or core to avoid a second search as per ff7db0bf24db
("sched/numa: Prefer using an idle CPU as a migration target instead of
comparing tasks")

The patch I had in mind is below but that said, for both your version and
my initial suggestion

Naked-by: Mel Gorman <mgorman@xxxxxxx>

For the record, this is what I was suggesting initially because it's more
efficient but it's wrong, don't do it.


Thanks for the detailed explanation, maybe my commit message misled you.


Yes, I did end up confusing myself. The title and changelog referred to
stopping a search which made me think of terms of "this whole loop can
terminate early" which it can't but it *can* stop checking for a new idle
core. If an idle core has been found, it follows that an idle CPU has also
been found. While numa_idle_core checks this explicitly, your patch avoids
an unnecessary cpumask_test_cpu so it has value.


Thank you for your review, I will change the commit message and send patch v2.


Yes, we can't stop the whole loop of scanning the CPU because we have a lot
of NUMA information to count.

But we can stop looking for the next idle core or idle cpu after finding an
idle core.

So, please review the previous code.


You're right and sorry for the noise.

Acked-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>

Thanks!


Thanks,
Hao