Re: [PATCH 0/7] sched/deadline: fix cpusets bandwidth accounting
From: Mathieu Poirier
Date: Fri Aug 25 2017 - 15:53:50 EST
On 25 August 2017 at 03:52, Luca Abeni <luca.abeni@xxxxxxxxxxxxxxx> wrote:
> On Fri, 25 Aug 2017 08:02:43 +0200
> luca abeni <luca.abeni@xxxxxxxxxxxxxxx> wrote:
> [...]
>> > The above demonstrates that even if we have two CPUsets, new tasks belong
>> > to the "default" CPUset and as such can use all the available CPUs.
>>
>> I still have a doubt (probably showing all my ignorance about
>> CPUsets :)... In this situation, we have 3 CPUsets: "default",
>> set1, and set2... Is every one of these CPUsets associated with a
>> root domain (so, we have 3 root domains)? Or are only set1 and set2
>> associated with a root domain?
>
> Ok, after reading (and hopefully understanding better :) the code, I
> think this question was kind of silly... There are only 2 root domains,
> corresponding to set1 and set2 (right?).
Correct - although there is a default CPUset there isn't a default root domain.
>
> [...]
>
>> > So above we'd run the acceptance test on root
>> > domain A and B before promoting the task. Of course we'd also have to
>> > add the utilisation of that task to both root domains. Although simple,
>> > it goes to the core of the DL scheduler and touches pretty much every
>> > aspect of it, something I'm reluctant to embark on.
>>
>> I see... So, the "default" CPUset does not have any root domain
>> associated with it? If it had, we could just subtract the maximum
>> utilizations of set1 and set2 from it when creating the root domains of
>> set1 and set2.
> ...
> So, this idea of mine made no sense.
>
> I think the correct solution is what you implemented in your patchset
> (if I understand it correctly).
>
> If we want to have tasks spanning multiple root domains, many more
> changes in the code are needed... I am wondering if it would make more
> sense to track utilizations per runqueue (instead of per root domain):
> - when a task tries to become SCHED_DEADLINE, we count how many CPUs are
>   in its affinity mask. Let's call this number "n"
> - then, we add u / n (where "u" is the task's utilization) to the
>   utilization of every runqueue that is in its affinity mask, and we
>   check if all the sums are below the schedulability bound
>
> For tasks spanning one single root domain, this should be equivalent to
> the current admission test. Moreover, this check should ensure that no
> root domain can ever be overloaded (even if tasks span multiple
> domains).
This is an idea worth exploring.
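For the sake of discussion, the per-runqueue test described above could be
modelled in userspace roughly as follows. This is only a sketch and not
kernel code: the name try_admit(), the rq_util dictionary and the default
bound of 1 are all hypothetical, and locking is not modelled at all.

```python
from fractions import Fraction


def try_admit(rq_util, affinity, u, bound=Fraction(1)):
    """Hypothetical per-runqueue admission test (illustration only).

    rq_util  -- dict: CPU id -> utilization already accounted on that runqueue
    affinity -- CPU ids in the task's affinity mask
    u        -- task utilization (runtime / period)
    bound    -- per-runqueue schedulability bound (1 is an assumption)

    Returns True and updates rq_util when the task is admitted; returns
    False and leaves rq_util untouched otherwise.
    """
    cpus = set(affinity)
    if not cpus:
        return False
    share = Fraction(u) / len(cpus)  # the "u / n" of the proposal
    # Reject if any runqueue in the mask would exceed the bound.
    if any(rq_util[cpu] + share > bound for cpu in cpus):
        return False
    for cpu in cpus:
        rq_util[cpu] += share
    return True


# Assumed layout: CPUs 0-1 in set1, CPUs 2-3 in set2.
rq = {cpu: Fraction(0) for cpu in range(4)}
print(try_admit(rq, [0, 1], Fraction(3, 2)))     # True: 3/4 on CPUs 0 and 1
print(try_admit(rq, [0, 1, 2, 3], Fraction(1)))  # True: CPUs 0 and 1 reach 1
print(try_admit(rq, [1, 2], Fraction(1, 2)))     # False: CPU 1 would hit 5/4
```

In this model, when every task's affinity mask covers the same m CPUs, each
runqueue ends up holding sum(u) / m, so the check reduces to
sum(u) <= m * bound, which matches the single root domain case mentioned
above.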
> But I do not know the locking implications for this idea... I suspect
> it will not scale :(
Right, scaling could be a problem - we'd have to prototype it and see
how bad things get. We _may_ be able to figure something out with RCU
trickery.
As I mentioned in a previous email, I toyed with the idea of extending
the DL code to support more than one root domain. Maybe it is time to
go back to it, finish the admission test and publish just that part...
At least we would have code to comment on.
Regardless of the avenue we choose to go with, I think we could use my
current solution as a stepping stone while we figure out what we
really want to do. At least it would be an improvement on the current
situation.
>
>
>
> Luca