Re: [PATCH 10/24] sched/uclamg: Handle delayed dequeue

From: Hongyan Xia
Date: Thu Aug 29 2024 - 11:42:40 EST


On 22/08/2024 15:58, Vincent Guittot wrote:
On Thu, 22 Aug 2024 at 14:10, Vincent Guittot
<vincent.guittot@xxxxxxxxxx> wrote:

On Thu, 22 Aug 2024 at 14:08, Luis Machado <luis.machado@xxxxxxx> wrote:

Vincent,

On 8/22/24 11:28, Luis Machado wrote:
On 8/22/24 10:53, Vincent Guittot wrote:
On Thu, 22 Aug 2024 at 11:22, Luis Machado <luis.machado@xxxxxxx> wrote:

On 8/22/24 09:19, Vincent Guittot wrote:
Hi,

On Wed, 21 Aug 2024 at 15:34, Hongyan Xia <hongyan.xia2@xxxxxxx> wrote:

Hi Peter,

Sorry for bombarding this thread in the last couple of days. I'm seeing
several issues in the latest tip/sched/core after these patches landed.

What I'm now seeing seems to be an unbalanced call of util_est. First, I applied

I also see a remaining util_est for idle rq because of an unbalance
call of util_est_enqueue|dequeue


I can confirm issues with the utilization values and frequencies being driven
seemingly incorrectly, in particular for little cores.

What I'm seeing with the stock series is high utilization values for some tasks
and little cores having their frequencies maxed out for extended periods of
time. Sometimes for 5+ or 10+ seconds, which is excessive as the cores are mostly
idle. But whenever certain tasks get scheduled there, they have a very high util
level and so the frequency is kept at max.

As a consequence this drives up power usage.

I gave Hongyan's draft fix a try and observed a much more reasonable behavior for
the util numbers and frequencies being used for the little cores. With his fix,
I can also see lower energy use for my specific benchmark.

The main problem is that the util_est of a delayed dequeued task
remains on the rq and keeps the rq utilization high and as a result
the frequency higher than needed.

The below seems to works for me and keep sync the enqueue/dequeue of
uti_test with the enqueue/dequeue of the task as if de dequeue was not
delayed

Another interest is that we will not try to migrate a delayed dequeue
sleeping task that doesn't actually impact the current load of the cpu
and as a result will not help in the load balance. I haven't yet fully
checked what would happen with hotplug

Thanks. Those are good points. Let me go and try your patch.

I gave your fix a try, but it seems to make things worse. It is comparable
to the behavior we had before Peter added the uclamp imbalance fix, so I
believe there is something incorrect there.

we need to filter case where task are enqueued/dequeued several
consecutive times. That's what I'm look now

I just realize before that It's not only util_est but the h_nr_running
that keeps delayed tasks as well so all stats of the rq are biased:
h_nr_running, util_est, runnable avg and load_avg.

After staring at the code even more, I think the situation is worse.

First thing is that uclamp might also want to be part of these stats (h_nr_running, util_est, runnable_avg, load_avg) that do not follow delayed dequeue which needs to be specially handled in the same way. The current way of handling uclamp in core.c misses the frequency update, like I commented before.

Second, there is also an out-of-sync issue in update_load_avg(). We only update the task-level se in delayed dequeue and requeue, but we return early and the upper levels are completely skipped, as if the delayed task is still on rq. This de-sync is wrong.