Re: [PATCH] sched/fair: Make PELT signal more accurate

From: Joel Fernandes
Date: Wed Aug 09 2017 - 13:51:35 EST


Hi Vincent,

On Wed, Aug 9, 2017 at 3:23 AM, Vincent Guittot
<vincent.guittot@xxxxxxxxxx> wrote:
<snip>
>>
>> Yes this is true, however since I'm using the 'delta' instead of
>> period_contrib, its only does the update every 128us, however if
>> several updates fall within a 128us boundary then those will be rate
>> limited. So say we have a flood of updates, then the updates have to
>> be spaced every 128us to reach the maximum number of division, I don't
>> know whether this is a likely situation or would happen very often? I
>> am planning to run some benchmarks and check that there is no
>> regression as well as Peter mentioned about the performance aspect.
>>
>>>> In order to compare the signals with/without the patch I created a synthetic
>>>> test (20ms runtime, 100ms period) and analyzed the signals and created a report
>>>> on the analysis data/plots both with and without the fix:
>>>> http://www.linuxinternals.org/misc/pelt-error.pdf
>>>
>>> The glitch described in page 2 shows a decrease of the util_avg which
>>> is not linked to accuracy of the calculation but due to the use of the
>>> wrong range when computing util_avg.
>>
>> Yes, and I corrected the graphs this time to show what its like after
>> your patch and confirm that there is STILL a glitch. You are right
>> that there isn't a reduction after your patch, however in my updated
>> graphs there is a glitch and its not a downward peak but a stall in
>> the update, the error is still quite high and can be as high as the
>> absolute 2% error, in my update graphs I show an example where its ~
>> 1.8% (18 / 1024).
>>
>> Could you please take a look at my updated document? I have included
>> new graph and traces there and color coded them so its easy to
>> correlate the trace lines to the error in the graph: Here's the
>> updated new link:
>> https://github.com/joelagnel/joelagnel.github.io/blob/master/misc/pelt-error-rev2.pdf
>
> I see strange behavior in your rev2 document:
> At timestamp 9.235635, we have util_avg=199 acc_util_avg=199
> util_err=0. Everything looks fine but I don't this point on the graph
> Then, at 9.235636 (which is the red colored faulty trace), we have
> util_avg=182 acc_util_avg=200 util_err=18.
> Firstly, this means that util_avg has been updated (199 -> 182) so the
> error is not a problem of util_avg not been updated often enough :-)
> Then, util_avg decreases (199 -> 182) whereas it must increase because
> the task is still running. This should not happen and this is exactly
> what commit 625ed2bf049d should fix. So either the patch is not
> applied or it doesn't fix completely the problem.

I think you are looking at wrong trace lines. The graph is generated
with for rq util only (cfs_rq == 1), so the lines in the traces you
should look at are the ones with cfs_rq= 1. Only cfs_rq==1 lines were
used to generate the graphs.

In this you will see rq util_avg change as follows: 165 -> 182 -> 182
(missed an update causing error). This is also reflected in the graph
in the graph where you see the flat green line.

>
> That would be interesting to also display the last_update_time of sched_avg
>
>>
>>> commit 625ed2bf049d "sched/cfs: Make util/load_avg more stable" fixes
>>> this glitch.
>>> And the lower peak value in page 3 is probably linked to the inaccuracy
>>
>> This is not true. The reduction in peak in my tests which happen even
>> after your patch is because of the dequeue that happens just before
>> the period boundary is hit. Could you please take a look at the
>> updated document in the link above? In there I show in the second
>> example with a trace that corresponds the reduction in peak during the
>> dequeue and is because of the delay in update. These errors go away
>> with my patch.
>
> There is the same strange behavior there:
> When the reduction in peak happens, the util_avg is updated whereas
> your concerns is that util_avg is not update often enough.
> At timestamp 10.656683, we have util_avg=389 acc_util_avg=389 util_err=0
> At timestamp 10.657420, we have util_avg=396 acc_util_avg=396
> util_err=0. I don't see this point on the graph
> At timestamp 10.657422, we have util_avg=389 acc_util_avg=399
> util_err=10. This is the colored faulty trace but util_avg has been
> updated from 369 to 389

Yeah, same thing here, you should look at the lines with cfs_rq == 1.
The util changes as: 363 -> 376 -> 389 -> 389 (missed update).


thanks,

-Joel


>
> Regards,
> Vincent
>
>>
>>> I agree that there is an inaccuracy (the max absolute value of 22) but
>>> that's in favor of less overhead. Have you seen wrong behavior because
>>> of this inaccuracy ?
>>
>> I haven't tried to nail this to a wrong behavior however since other
>> patches have been posted to fix inaccuracy and I do see we reach the
>> theoretical maximum error on quite a few occassions, I think its
>> justifiable. Also the overhead is minimal if updates aren't happening
>> several times in a window, and at 128us interval, and the few times
>> that the update does happen, the division is performed only during
>> those times. So incases where it does fix the error, it does so with
>> minimal overhead. I do agree with the overhead point and I'm planning
>> to do more tests with hackbench to confirm overhead is minimal. I'll
>> post some updates about it soon.
>>
>> Thanks!
>>
>> -Joel
>>
>>
>>>
>>>>
>>>> With the patch, the error in the signal is significantly reduced, and is
>>>> non-existent beyond a small negligible amount.
>>>>
>>>> Cc: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
>>>> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>>>> Cc: Juri Lelli <juri.lelli@xxxxxxx>
>>>> Cc: Brendan Jackman <brendan.jackman@xxxxxxx>
>>>> Cc: Dietmar Eggeman <dietmar.eggemann@xxxxxxx>
>>>> Signed-off-by: Joel Fernandes <joelaf@xxxxxxxxxx>
>>>> ---
>>>> kernel/sched/fair.c | 8 ++++++--
>>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index 4f1825d60937..1347643737f3 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -2882,6 +2882,7 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>>> unsigned long weight, int running, struct cfs_rq *cfs_rq)
>>>> {
>>>> u64 delta;
>>>> + int periods;
>>>>
>>>> delta = now - sa->last_update_time;
>>>> /*
>>>> @@ -2908,9 +2909,12 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>>> * accrues by two steps:
>>>> *
>>>> * Step 1: accumulate *_sum since last_update_time. If we haven't
>>>> - * crossed period boundaries, finish.
>>>> + * crossed period boundaries and the time since last update is small
>>>> + * enough, we're done.
>>>> */
>>>> - if (!accumulate_sum(delta, cpu, sa, weight, running, cfs_rq))
>>>> + periods = accumulate_sum(delta, cpu, sa, weight, running, cfs_rq);
>>>> +
>>>> + if (!periods && delta < 128)
>>>> return 0;
>>>>
>>>> /*
>>>> --
>>>> 2.14.0.rc1.383.gd1ce394fe2-goog
>>>>