Re: [PATCH RFC] sched/deadline: Use the revised wakeup rule for suspending constrained dl tasks
From: Daniel Bristot de Oliveira
Date: Thu May 11 2017 - 13:03:50 EST
On 05/04/2017 04:17 PM, Peter Zijlstra wrote:
> On Mon, Apr 24, 2017 at 05:18:35PM +0200, Daniel Bristot de Oliveira wrote:
>> We have been facing some problems with self-suspending constrained
>> deadline tasks. The main reason is that the original CBS was not
>> designed for such sort of tasks.
>>
>> One problem reported by Xunlei Pang takes place when a task
>> suspends, and then is awakened before the deadline, but so close
>> to the deadline that its remaining runtime can cause the task
>> to have an absolute density higher than allowed. In such situation,
>> the original CBS assumes that the task is facing an early activation,
>> and so it replenishes the task and set another deadline, one deadline
>> in the future. This rule works fine for implicit deadline tasks.
>> Moreover, it allows the system to adapt the period of a task in which
>> the external event source suffered from a clock drift.
>>
>> However, this opens the window for bandwidth leakage for constrained
>> deadline tasks. For instance, a task with the following parameters:
>>
>> runtime = 5 ms
>> deadline = 7 ms
>> [density] = 5 / 7 = 0.71
>> period = 1000 ms
>>
>> If the task runs for 1 ms, and then suspends for another 1ms,
>> it will be awakened with the following parameters:
>>
>> remaining runtime = 4
>> laxity = 5
>>
>> presenting a absolute density of 4 / 5 = 0.80.
>>
>> In this case, the original CBS would assume the task had an early
>> wakeup. Then, CBS will reset the runtime, and the absolute deadline will
>> be postponed by one relative deadline, allowing the task to run.
>>
>> The problem is that, if the task runs this pattern forever, it will keep
>> receiving bandwidth, being able to run 1ms every 2ms. Following this
>> behavior, the task would be able to run 500 ms in 1 sec. Thus running
>> more than the 5 ms / 1 sec the admission control allowed it to run.
>>
>> Trying to address the self-suspending case, Luca Abeni, Giuseppe
>> Lipari, and Juri Lelli [1] revisited the CBS in order to deal with
>> self-suspending tasks. In the new approach, rather than
>> replenishing/postponing the absolute deadline, the revised wakeup rule
>> adjusts the remaining runtime, reducing it to fit into the allowed
>> density.
>>
>> A resumed version of the idea is:
>>
>> At a given time t, the maximum absolute density of a task cannot be
>> higher than its relative density, that is:
>>
>> runtime / (deadline - t) <= dl_runtime / dl_deadline
>>
>> Knowing the laxity of a task (deadline - t), it is possible to move
>> it to the other side of the equality, thus enabling to define max
>> remaining runtime a task can use within the absolute deadline, without
>> over-running the allowed density:
>>
>> runtime = (dl_runtime / dl_deadline) * (deadline - t)
>>
>> For instance, in our previous example, the task could still run:
>>
>> runtime = ( 5 / 7 ) * 4
>> runtime = 2.85 ms
>>
>> Without causing damage for other deadline tasks. It is note worth that
>> the laxity cannot be negative because that would cause a negative
>> runtime. Thus, this patch depends on the patch:
>>
>> edf5835 sched/deadline: Throttle a constrained deadline task activated
>> after the deadline
>
> My git tree says that is:
>
> df8eac8cafce ("sched/deadline: Throttle a constrained deadline task activated after the deadline")
Ops, you are right, I was using my own tree, sorry.
>> Which throttles a constrained deadline task activated after the
>> deadline.
>>
>> Finally, it is also possible to use the revised wakeup rule for
>> all other tasks, but that would require some more discussions
>> about pros and cons.
>>
>> Reported-by: Xunlei Pang <xpang@xxxxxxxxxx>
>> Signed-off-by: Daniel Bristot de Oliveira <bristot@xxxxxxxxxx>
>> Cc: Xunlei Pang <xpang@xxxxxxxxxx>
>> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
>> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>> Cc: Juri Lelli <juri.lelli@xxxxxxx>
>> Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
>> Cc: Luca Abeni <luca.abeni@xxxxxxxxxxxxxxx>
>> Cc: Tommaso Cucinotta <tommaso.cucinotta@xxxxxxxx>
>> Cc: Romulo Silva de Oliveira <romulo.deoliveira@xxxxxxx>
>> Cc: linux-kernel@xxxxxxxxxxxxxxx
>> ---
>> kernel/sched/deadline.c | 67 +++++++++++++++++++++++++++++++++++++++++++------
>> 1 file changed, 60 insertions(+), 7 deletions(-)
>>
>> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
>> index a2ce590..71e5bcf 100644
>> --- a/kernel/sched/deadline.c
>> +++ b/kernel/sched/deadline.c
>> @@ -484,13 +484,63 @@ static bool dl_entity_overflow(struct sched_dl_entity *dl_se,
>> }
>>
>> /*
>> + * Revised wakeup rule [1]: For self-suspending tasks, rather then
>> + * re-initializing task's runtime and deadline, the revised wakeup
>> + * rule adjusts the task's runtime to avoid the task to overrun its
>> + * density.
>> + *
>> + * Reasoning: a task may overrun the density if:
>> + * runtime / (deadline - t) > dl_runtime / dl_deadline
>
> When reading that, I have the instant question: "why / how ?" I suspect
> the blurb below (at update_dl_entity) has the answer, if so this can use
> a reference thereto.
Yeah, I will connect the two comments.
>> + *
>> + * Therefore, runtime can be adjusted to:
>> + * runtime = (dl_runtime / dl_deadline) * (deadline - t)
>> + *
>> + * In such way that runtime will be equals to the maximum density
>> + * the task can use without breaking any rule.
>> + *
>> + * [1] Luca Abeni, Giuseppe Lipari, and Juri Lelli. 2015. Constant
>> + * bandwidth server revisited. SIGBED Rev. 11, 4 (January 2015), 19-24.
>> + */
>> +static void
>> +update_dl_revised_wakeup(struct sched_dl_entity *dl_se, struct rq *rq)
>> +{
>> + u64 density = div64_u64(dl_se->dl_runtime << 20, dl_se->dl_deadline);
>> + u64 laxity = dl_se->deadline - rq_clock(rq);
>> +
>> + BUG_ON(laxity < 0);
>
> Compiler will make that go away, by virtue of laxity being unsigned.
>
>> +
>> + dl_se->runtime = density * laxity >> 20;
>> +}
>> +
> /*
> * XXX comment that explains what constrained is ? per the definition
> * below that is any task where deadline != period?
> */
Per definition, constrained deadline tasks are those with deadline <=
period.
So, to be strictly correct, I should even rename this function to
something like: dl_is_not_implicit. (implicit deadline means deadline =
period).
(Well, we also have arbitrary deadline, with deadline less than, equals
to or higher than the period... so the name should be something like
dl_neither_implicit_nor_arbitrary but we do not support it...).
So, although slight imprecise, the current name is the more intuitive
one. Any suggestions?
I will write a comment explaining the terms.
>> +static inline bool dl_is_constrained(struct sched_dl_entity *dl_se)
>> +{
>> + return dl_se->dl_deadline < dl_se->dl_period;
>> +}
>> +
>> +/*
>> * When a -deadline entity is queued back on the runqueue, its runtime and
>> * deadline might need updating.
>> *
>> + * Currently, we are using two different CBS rules, 1) the original CBS
>> + * for implicit deadline tasks; 2) the revisited CBS for constrained
>> + * deadline ones. The reason is that the original CBS can cause a constrained
>> + * deadline task to be replenished deadline/period times in a period, in
>> + * the worst case, hence allowing it to run more than runtime/period.
>> + * In order to prevent this misbehave, the revisited CBS is used for
>> + * constrained deadline tasks. In the revisited CBS, in the case of an
>> + * overload, rather than replenishing & postponing the deadline, the
>> + * remaining runtime of a task is reduced to avoid runtime overflow.
>
> We use two difference CBS rules:
>
> 1) the original CBS rule for implicit deadline tasks;
> 2) the revised CBS rule for constrained deadline tasks.
>
> ( and here I have the question, wth is implicit / constrained )
>
> The reason is that the original CBS rul can cause a constrained deadline
> task to be replenished deadline/period times in a period (worst case),
> hence allowing it to run more than runtime/period.
>
> ( the deadline/period thing could use a few words extra; in the example
> above that divides to: 7/1000 = .007 which does not have an integer
> component. You cannot replenish fractional times etc.. )
Actually, the sched deadline uses that 'runtime << 20' to avoid such
imprecision. For example, runtime of 7 ms of a 1000 ms period would
result in:
7 << 20 / 1000
7340032 / 1000 = 7340.032 = 7340
Btw, we are currently using the density in the overflow rule, so it is
using runtime / deadline, which is more pessimistic.
> In order to prevent this, the revised CBS rule reduces the remaining
> runtime (by limiting the max density).
correct!
>
>> + *
>> + * So, for implicit deadline tasks, the policy here is that the runtime &
>> + * deadline of an entity are update if and only if., either:
>
> updated ? IFF
>
>> + * - the current deadline is in the past, or
>> * - using the remaining runtime with the current deadline would make
>> * the entity exceed its bandwidth.
>> + *
>> + * For constrained deadline tasks, the policy here is that the runtime
>> + * is reduced to avoid exceeding its bandwidth if:
>
>> + * - using the remaining runtime with the current deadline would make
>> + * the entity exceed its bandwidth.
>> */
>
> In general the comment on comments is to add whitespace. That greatly
> increases readability.
ack o>
>
>> static void update_dl_entity(struct sched_dl_entity *dl_se,
>> struct sched_dl_entity *pi_se)
>> @@ -500,6 +550,14 @@ static void update_dl_entity(struct sched_dl_entity *dl_se,
>>
>> if (dl_time_before(dl_se->deadline, rq_clock(rq)) ||
>> dl_entity_overflow(dl_se, pi_se, rq_clock(rq))) {
>> +
>> + if (unlikely(dl_is_constrained(dl_se) &&
>> + !dl_time_before(dl_se->deadline, rq_clock(rq)) &&
>
> alignment is misleading, this is still inside the unlikely(.
ack
>> + !dl_se->dl_boosted)){
>> + update_dl_revised_wakeup(dl_se, rq);
>> + return;
>> + }
>> +
>> dl_se->deadline = rq_clock(rq) + pi_se->dl_deadline;
>> dl_se->runtime = pi_se->dl_runtime;
>> }