Re: [PATCH 3/2] sched/deadline: Use deadline instead of period when calculating overflow
From: Steven Rostedt
Date: Tue Feb 14 2017 - 19:14:36 EST
On Tue, 14 Feb 2017 23:49:26 +0100
luca abeni <luca.abeni@xxxxxxxxxxxxxxx> wrote:
> Hi Steven,
>
> On Tue, 14 Feb 2017 14:28:48 -0500
> "Steven Rostedt (VMware)" <rostedt@xxxxxxxxxxx> wrote:
>
> > I was testing Daniel's changes with his test case, and tweaked it a
> > little. Instead of having the runtime equal to the deadline, I
> > increased the deadline ten fold.
> >
> > Daniel's test case had:
> >
> > attr.sched_runtime = 2 * 1000 * 1000; /* 2 ms
> > */ attr.sched_deadline = 2 * 1000 * 1000; /* 2 ms */
> > attr.sched_period = 2 * 1000 * 1000 * 1000; /* 2 s */
> >
> > To make it more interesting, I changed it to:
> >
> > attr.sched_runtime = 2 * 1000 * 1000; /* 2
> > ms */ attr.sched_deadline = 20 * 1000 * 1000; /* 20 ms
> > */ attr.sched_period = 2 * 1000 * 1000 * 1000; /* 2 s */
> >
> >
> > The results were rather surprising. The behavior that Daniel's patch
> > was fixing came back. The task started using much more than .1% of the
> > CPU. More like 20%.
> >
> > Looking into this I found that it was due to the dl_entity_overflow()
> > constantly returning true. That's because it uses the relative period
> > against relative runtime vs the absolute deadline against absolute
> > runtime.
> >
> > runtime / (deadline - t) > dl_runtime / dl_period
>
> I agree that there is an inconsistency here (again, using equations
> from the "period=deadline" case with a relative deadline different from
> period).
>
> I am not sure about the correct fix (wouldn't
> "runtime / (deadline - t) > dl_runtime / dl_deadline" allow the task to
> use a fraction of CPU time equal to dl_runtime / dl_deadline?)
>
> The current code is clearly wrong (as shown by Daniel), but I do not
> understand how the current check can allow the task to consume more
> than dl_runtime / dl_period... I need some more time to think about
> this issue.
>
This is in dl_entity_overflow() which is called by update_dl_entity()
which has this:
if (dl_time_before(dl_se->deadline, rq_clock(rq)) ||
dl_entity_overflow(dl_se, pi_se, rq_clock(rq))) {
dl_se->deadline = rq_clock(rq) + pi_se->dl_deadline;
dl_se->runtime = pi_se->dl_runtime;
}
The comments in this code state:
* The policy here is that we update the deadline of the entity only if:
* - the current deadline is in the past,
* - using the remaining runtime with the current deadline would make
* the entity exceed its bandwidth.
That second comment is saying that when this task woke up, if the
percentage left to run will exceed its bandwidth with the rest of the
system then reset its deadline and its runtime.
What happens in the current logic, is that overflow() check says, when
the deadline is much smaller than the period, "yeah, we're going to
exceed our percentage!" so give us more, even though it wont exceed its
percentage if we compared runtime with deadline.
The relative-runtime / relative-period is a tiny percentage, which does
not reflect the percentage that the task is allowed to have before the
deadline is hit. The tasks bandwidth should be calculated by the
relative-runtime / relative-deadline, as runtime <= deadline <= period,
and the runtime should happen within the deadline.
When the task wakes up, it currently looks at how much time is left
absolute-deadline - t, and compares it to the amount of runtime left.
The percentage allowed should still be compared with the percentage
between relative-runtime and relative-deadline. The relative-period or
even absolute-period, should have no influence in this decision.
-- Steve