Re: [PATCH v3 10/13] sched/fair: Compute task/cpu utilization at wake-up more correctly
From: Wanpeng Li
Date: Thu Aug 18 2016 - 21:43:10 EST
2016-08-18 21:45 GMT+08:00 Morten Rasmussen <morten.rasmussen@xxxxxxx>:
> On Thu, Aug 18, 2016 at 07:46:44PM +0800, Wanpeng Li wrote:
>> 2016-08-18 18:24 GMT+08:00 Morten Rasmussen <morten.rasmussen@xxxxxxx>:
>> > On Thu, Aug 18, 2016 at 09:40:55AM +0100, Morten Rasmussen wrote:
>> >> On Mon, Aug 15, 2016 at 04:42:37PM +0100, Morten Rasmussen wrote:
>> >> > On Mon, Aug 15, 2016 at 04:23:42PM +0200, Peter Zijlstra wrote:
>> >> > > But unlike that function, it doesn't actually use __update_load_avg().
>> >> > > Why not?
>> >> >
>> >> > Fair question :)
>> >> >
>> >> > We currently exploit the fact that the task utilization is _not_ updated
>> >> > in wake-up balancing to make sure we don't under-estimate the capacity
>> >> > requirements for tasks that have slept for a while. If we update it, we
>> >> > lose the non-decayed 'peak' utilization, but I guess we could just
>> >> > store it somewhere when we do the wake-up decay.
>> >> >
>> >> > I thought there was a better reason when I wrote the patch, but I don't
>> >> > recall right now. I will look into it again and see if we can use
>> >> > __update_load_avg() to do a proper update instead of doing things twice.
>> >>
>> >> AFAICT, we should be able to synchronize the task utilization to the
>> >> previous rq utilization using __update_load_avg() as you suggest. The
>> >> patch below should work as a replacement without any changes to
>> >> subsequent patches. It doesn't solve the under-estimation issue, but I
>> >> have another patch for that.
>> >
>> > And here is a possible solution to the under-estimation issue. The patch
>> > would have to go at the end of this set.
>> >
>> > ---8<---
>> >
>> > From 5bc918995c6c589b833ba1f189a8b92fa22202ae Mon Sep 17 00:00:00 2001
>> > From: Morten Rasmussen <morten.rasmussen@xxxxxxx>
>> > Date: Wed, 17 Aug 2016 15:30:43 +0100
>> > Subject: [PATCH] sched/fair: Track peak per-entity utilization
>> >
>> > When using PELT (per-entity load tracking) utilization to place tasks at
>> > wake-up, the decayed utilization (due to sleep) leads to
>> > under-estimation of the task's true utilization. This could mean
>> > putting the task on a cpu with less available capacity than is actually
>> > needed. This issue can be mitigated by using 'peak' utilization instead
>> > of the decayed utilization for placement decisions, e.g. at task
>> > wake-up.
>> >
>> > The 'peak' utilization metric, util_peak, tracks util_avg when the task
>> > is running and retains its previous value while the task is
>> > blocked/waiting on the rq. It is instantly updated to track util_avg
>> > again as soon as the task is running again.
>>
>> Maybe this will lead to disabling wake affine due to a spiky peak value
>> for a task with a low average load.
>
> I assume you are referring to using task_util_peak() instead of
> task_util() in wake_cap()?
Yes.
>
> The peak value should never exceed the util_avg accumulated by the task
> last time it ran. So any spike has to be caused by the task accumulating
> more utilization last time it ran. We don't know if it is a spike or a more
I see.
> permanent change in behaviour, so we have to guess. So a spike on an
> asymmetric system could cause us to disable wake affine in some
> circumstances (either prev_cpu or waker cpu has to be low compute
> capacity) for the following wake-up.
>
> SMP should be unaffected as we should bail out on the previous
> condition.
Why capacity_orig instead of capacity? The check happens on every wakeup,
and the rt class or interrupts may already have consumed a lot of the
cpu's capacity by then.
>
> The counter-example is task with a fairly long busy period and a much
> longer period (cycle). Its util_avg might have decayed away since the
> last activation so it appears very small at wake-up and we end up
> putting it on a low capacity cpu every time even though it keeps the cpu
> busy for a long time every time it wakes up.
Agreed, that's the reason for the under-estimation concern.
>
> Did that answer your question?
Yeah, thanks for the clarification.
Regards,
Wanpeng Li