Re: [RFC PATCH 1/2] sched: Rate limit migrations to 1 per 2ms per task

From: Mathieu Desnoyers
Date: Wed Sep 13 2023 - 11:45:31 EST


On 9/10/23 03:03, Chen Yu wrote:
On 2023-09-06 at 09:57:04 -0400, Mathieu Desnoyers wrote:
[...]

I suspect we could try something like this then:

When a cpu enters idle state, it could grab a sched_clock() timestamp
and store it into this_rq()->enter_idle_time. Then, when it exits
idle and reenters idle again, it could save rq->enter_idle_time to
rq->prev_enter_idle_time, and sample enter_idle_time again.

When considering the CPU as a target for task migration, if it is
idle but the delta between sched_clock_cpu(cpu_of(rq)) and that
prev_enter_idle_time is below a threshold (e.g. a few ms), this means
the CPU got out of idle and went back to idle pretty quickly, which
means it's not a good target for pulling tasks for a short while.
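
A minimal user-space sketch of the bookkeeping described above, not
kernel code. The struct rq_sketch type, both helper names, and the 2 ms
threshold are assumptions made for illustration; the `now` argument
stands in for sched_clock() / sched_clock_cpu(cpu_of(rq)).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two per-runqueue fields discussed above. */
struct rq_sketch {
	uint64_t enter_idle_time;      /* most recent idle entry, in ns */
	uint64_t prev_enter_idle_time; /* the idle entry before that, in ns */
};

/* Assumed value: the proposal only says "a few ms". */
#define IDLE_REENTRY_THRESHOLD_NS (2ULL * 1000 * 1000)

/* Called each time the CPU enters idle. */
static void rq_enter_idle(struct rq_sketch *rq, uint64_t now)
{
	/* Timestamps are expected to be monotonic on a given CPU. */
	assert(now >= rq->enter_idle_time);
	rq->prev_enter_idle_time = rq->enter_idle_time;
	rq->enter_idle_time = now;
}

/*
 * For a CPU that is currently idle: true if its previous idle entry is
 * still within the threshold, i.e. it left idle and came back quickly,
 * so it is likely to be woken again soon.
 */
static bool rq_is_poor_idle_target(const struct rq_sketch *rq, uint64_t now)
{
	return now - rq->prev_enter_idle_time < IDLE_REENTRY_THRESHOLD_NS;
}
```

A migration path would only consult rq_is_poor_idle_target() for CPUs
that are idle at the time of the check. Note that a zero-initialized
rq_sketch reports "poor target" until the clock passes the threshold,
which a real implementation would need to handle.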


Do you mean inhibit the newidle balance? Currently the newidle balance
checks if the average idle duration of that rq is below the total cost
to do a load balance:
this_rq->avg_idle < sd->max_newidle_lb_cost

Not quite but..


I'll try something along these lines and see how it goes.

Anyway, this approach did not work in my testing.



Considering the sleep time looks like a good idea! What you suggest
inspires me: maybe we could consider the task's sleep duration and
decide whether or not to migrate it at the next wakeup.

Say, if a task p sleeps and is woken up quickly, can we reserve its
previous CPU as idle for a short while? Then other tasks cannot choose
p's previous CPU during their wakeup. A short while later, when p is
woken up, it finds that its previous CPU is still idle and chooses it.
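
A rough user-space sketch of that reservation idea, under stated
assumptions; it is not the draft patch mentioned below. The struct
task_sketch type, the cpu_reserved_until[] array, both helpers, and both
1 ms constants are hypothetical names and values chosen for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 4
#define SHORT_SLEEP_NS (1ULL * 1000 * 1000) /* assumed "quick wakeup" cutoff */
#define RESERVE_NS     (1ULL * 1000 * 1000) /* assumed reservation length */

/* Hypothetical per-task state. */
struct task_sketch {
	int prev_cpu;           /* CPU the task last ran on */
	uint64_t last_sleep_ns; /* duration of its previous sleep */
};

/* Per-CPU time until which the CPU is held back for its last task. */
static uint64_t cpu_reserved_until[NR_CPUS_SKETCH];

/* Called when p goes to sleep: short sleepers reserve their previous CPU. */
static void task_sleeps(const struct task_sketch *p, uint64_t now)
{
	assert(p->prev_cpu >= 0 && p->prev_cpu < NR_CPUS_SKETCH);
	if (p->last_sleep_ns < SHORT_SLEEP_NS)
		cpu_reserved_until[p->prev_cpu] = now + RESERVE_NS;
}

/* Wakeup-time check: may task p select this idle CPU? */
static bool cpu_available_for(const struct task_sketch *p, int cpu,
			      uint64_t now)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	if (cpu == p->prev_cpu)
		return true; /* simplification: never blocked on own prev CPU */
	return now >= cpu_reserved_until[cpu];
}
```

The reservation expires by time alone, so a task that ends up sleeping
longer than expected only delays other wakers by RESERVE_NS.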

I created a draft patch based on that, and it shows some improvements on
a 224-CPU system. I'll post the draft patch and Cc you.

I think your approach is very promising; let's keep digging in that direction.

Thanks,

Mathieu


thanks,
Chenyu

--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com