Re: [RFC v4 0/6] CPU reclaiming for SCHED_DEADLINE

From: Luca Abeni
Date: Wed Jan 04 2017 - 13:40:01 EST


2017-01-04 19:00 GMT+01:00, Daniel Bristot de Oliveira <bristot@xxxxxxxxxx>:
[...]
>>>>> Some tasks start to use more CPU time, while others seem to use less
>>>>> CPU than was reserved for them. See task 14926: it is using
>>>>> only 23.8% of the CPU, which is less than its 10/30 reservation.
>>>>
>>>> What happened here is that some runqueues have an active utilisation
>>>> larger than 0.95. So, GRUB is decreasing the amount of time received by
>>>> the tasks on those runqueues, so that they consume less than 95%... This
>>>> is the reason for the effect you noticed below:
>>>
>>> I see. But, AFAIK, Linux's SCHED_DEADLINE measures the load
>>> globally, not locally. So, it is not a problem to have a load > 95%
>>> in the local queue if the global load is < 95%.
>>>
>>> Am I missing something?
>>
>> The version of GRUB reclaiming implemented in my patches tracks a
>> per-runqueue "active utilization", and uses it for reclaiming.
>
> I _think_ that this might be (one of) the source(s) of the problem...
I agree that this can cause some problems, but I am not sure whether it
justifies the huge difference in utilisations you observed.

> Just as an exercise...
>
> For example, with my taskset, assuming a hypothetically perfect balance
> across the runqueues, one possible scenario is:
>
> CPU      0  1  2  3
> # TASKS  3  3  3  2
>
> In this case, CPUs 0, 1 and 2 are at 100% local utilization. Thus, the
> current tasks on these CPUs will have their runtime decreased by GRUB.
> Meanwhile, the lucky tasks on CPU 3 will use additional time that they
> "globally" do not have - because the system, globally, has a load higher
> than the 66.6% seen by the local runqueue. In practice, part of the time
> taken from the tasks on CPUs [0-2] is being used by the tasks on CPU 3,
> until the next migration of any task, which changes which tasks are the
> lucky ones... but without any guarantee that all tasks will be the lucky
> ones on every activation, causing the problem.
>
> Does it make sense?

Yes; but my impression is that gEDF will migrate tasks so that the
distribution of the reclaimed CPU bandwidth is almost uniform...
However, you saw huge differences in the utilisations (and I do not
think that "compressing" the utilisations from 100% to 95% can
decrease the utilisation of a task from 33% to 25% / 26%... :)

I suspect there is something more going on here (might be some bug in
one of my patches). I am trying to better understand what happened.

> If it does, this leads me to think that only by tracking the utilization
> globally will we achieve the correct result... but I may be missing
> something... :-).

Of course tracking the global active utilisation can be a solution,
but I also want to better understand what is wrong with the current
approach.

Thanks,
Luca