Re: [PATCH 1/2] sched/deadline: Replenishment timer should fire in the next period

From: Steven Rostedt
Date: Fri Feb 10 2017 - 16:48:18 EST


On Fri, 10 Feb 2017 20:48:10 +0100
Daniel Bristot de Oliveira <bristot@xxxxxxxxxx> wrote:

> Currently, the replenishment timer is set to fire at the deadline
> of a task. Although that works for implicit deadline tasks because the
> deadline is equals to the begin of the next period, that is not correct
> for constrained deadline tasks (deadline < period).
>
> For instance:
>
> f.c:
> --------------- %< ---------------
> int main (void)
> {
> for(;;);
> }
> --------------- >% ---------------
>
> # gcc -o f f.c
>
> # trace-cmd record -e sched:sched_switch \
> -e syscalls:sys_exit_sched_setattr \
> chrt -d --sched-runtime 490000000 \
> --sched-deadline 500000000 \
> --sched-period 1000000000 0 ./f
>
> # trace-cmd report | grep "{pid of ./f}"

# trace-cmd report -F '.*:common_pid == 11295'

does the same ;-)

Or better yet, add "-F" (just -F, no extra paremeter. In report, -F
means "filter", in record -F means "follow"), and that will only record
the task that is created.

Also, it may be nice to use --ts-diff option to see results. I ran this
with my own spinning program, and here's the output:

# trace-cmd report --ts-diff
[...]
userspin-1540 [003] 298.725084: (+822) sys_exit_sched_setattr: 0x0
userspin-1540 [003] 299.218199: (+493115) sched_switch: userspin:1540 [4294967295] R ==> rcuc/3:38 [98]
<idle>-0 [003] 299.225111: (+6912) sched_switch: swapper/3:0 [120] R ==> userspin:1540 [4294967295]
userspin-1540 [003] 299.718192: (+493081) sched_switch: userspin:1540 [4294967295] R ==> ktimersoftd/3:39 [98]
<idle>-0 [003] 300.225125: (+506933) sched_switch: swapper/3:0 [120] R ==> userspin:1540 [4294967295]
userspin-1540 [003] 300.717185: (+492060) sched_switch: userspin:1540 [4294967295] R ==> rcuc/3:38 [98]
<idle>-0 [003] 301.225116: (+507931) sched_switch: swapper/3:0 [120] R ==> userspin:1540 [4294967295]
userspin-1540 [003] 301.718177: (+493061) sched_switch: userspin:1540 [4294967295] R ==> rcuc/3:38 [98]
<idle>-0 [003] 302.225120: (+506943) sched_switch: swapper/3:0 [120] R ==> userspin:1540 [4294967295]


>
> After setting parameters, the task is replenished and continue running
> until being throttled.
> f-11295 [003] 13322.113776: sys_exit_sched_setattr: 0x0
>
> The task is throttled after running 492318 ms, as expected.
> f-11295 [003] 13322.606094: sched_switch: f:11295 [-1] R ==> \
> watchdog/3:32 [0]
>
> But then, the task is replenished 500719 ms after the first
> replenishment:
> <idle>-0 [003] 13322.614495: sched_switch: swapper/3:0 [120] R \
> ==> f:11295 [-1]
>
> Running for 490277 ms:
> f-11295 [003] 13323.104772: sched_switch: f:11295 [-1] R ==> \
> swapper/3:0 [120]
>
> Hence, in the first period, the task runs 2 * runtime, and that is a bug.
>
> During the first replenishment, the next deadline is set one period away.
> So the runtime/period starts to be respected. However, as the second
> replenishment took place in the wrong instant, the next replenishment
> will also be held in a wrong instant of time. Rather than occurring in
> the nth period away from the first activation, it is taking place
> in the (nth period - relative deadline).

OK, after you patch, I get this:

userspin-1581 [003] 103.450129: (+716) sys_exit_sched_setattr: 0x0
userspin-1581 [003] 103.943957: (+493828) sched_switch: userspin:1581 [4294967295] R ==> rcuc/3:38 [98]
<idle>-0 [003] 104.450172: (+506215) sched_switch: swapper/3:0 [120] R ==> userspin:1581 [4294967295]
userspin-1581 [003] 104.942349: (+492177) sched_switch: userspin:1581 [4294967295] R ==> watchdog/3:36 [0]
<idle>-0 [003] 105.450164: (+507815) sched_switch: swapper/3:0 [120] R ==> userspin:1581 [4294967295]
userspin-1581 [003] 105.942354: (+492190) sched_switch: userspin:1581 [4294967295] R ==> ktimersoftd/3:39 [98]
<idle>-0 [003] 106.450159: (+507805) sched_switch: swapper/3:0 [120] R ==> userspin:1581 [4294967295]
userspin-1581 [003] 106.942366: (+492207) sched_switch: userspin:1581 [4294967295] R ==> ktimersoftd/3:39 [98]
<idle>-0 [003] 107.450164: (+507798) sched_switch: swapper/3:0 [120] R ==> userspin:1581 [4294967295]

Looks promising. But unrelated, I noticed this with and without your
patch:

userspin-1567 [002] 81.455345: (+1196) sys_exit_sched_setattr: 0x0
userspin-1567 [002] 81.457599: (+2254) sched_switch: chrt:1567 [4294967295] D ==> rcuc/2:29 [98]
<idle>-0 [002] 81.466030: (+8431) sched_switch: swapper/2:0 [120] R ==> chrt:1567 [4294967295]
userspin-1567 [002] 81.466123: (+93) sched_switch: chrt:1567 [4294967295] D ==> rcuc/2:29 [98]
<idle>-0 [002] 81.471862: (+5739) sched_switch: swapper/2:0 [120] R ==> chrt:1567 [4294967295]
userspin-1567 [002] 81.471959: (+97) sched_switch: chrt:1567 [4294967295] D ==> swapper/2:0 [120]
<idle>-0 [002] 81.478285: (+6326) sched_switch: swapper/2:0 [120] R ==> chrt:1567 [4294967295]
userspin-1567 [002] 81.478369: (+84) sched_switch: chrt:1567 [4294967295] D ==> swapper/2:0 [120]
<idle>-0 [002] 81.486664: (+8295) sched_switch: swapper/2:0 [120] R ==> chrt:1567 [4294967295]
userspin-1567 [002] 81.487291: (+627) sched_switch: chrt:1567 [4294967295] D|W ==> swapper/2:0 [120]
<idle>-0 [002] 81.494884: (+7593) sched_switch: swapper/2:0 [120] R ==> chrt:1567 [4294967295]
userspin-1567 [002] 81.988095: (+493211) sched_switch: userspin:1567 [4294967295] R ==> rcuc/2:29 [98]
<idle>-0 [002] 82.496009: (+507914) sched_switch: swapper/2:0 [120] R ==> userspin:1567 [4294967295]
userspin-1567 [002] 82.988158: (+492149) sched_switch: userspin:1567 [4294967295] R ==> rcuc/2:29 [98]
<idle>-0 [002] 83.494897: (+506739) sched_switch: swapper/2:0 [120] R ==> userspin:1567 [4294967295]
userspin-1567 [002] 83.986181: (+491284) sched_switch: userspin:1567 [4294967295] R ==> rcuc/2:29 [98]
<idle>-0 [002] 84.494895: (+508714) sched_switch: swapper/2:0 [120] R ==> userspin:1567 [4294967295]

A bit of choppiness to get started. I'll have to look more into this.

I'll take a look at this patch deeper to try to understand everything
before applying a reviewed-by.

-- Steve

>
> Signed-off-by: Daniel Bristot de Oliveira <bristot@xxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Juri Lelli <juri.lelli@xxxxxxx>
> Cc: Tommaso Cucinotta <tommaso.cucinotta@xxxxxxxx>
> Cc: Luca Abeni <luca.abeni@xxxxxxxxxxxxxxx>
> Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx
>
> ---
> kernel/sched/deadline.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 70ef2b1..3c94d85 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -505,10 +505,15 @@ static void update_dl_entity(struct sched_dl_entity *dl_se,
> }
> }
>
> +static inline u64 dl_next_period(struct sched_dl_entity *dl_se)
> +{
> + return dl_se->deadline - dl_se->dl_deadline + dl_se->dl_period;
> +}
> +
> /*
> * If the entity depleted all its runtime, and if we want it to sleep
> * while waiting for some new execution time to become available, we
> - * set the bandwidth enforcement timer to the replenishment instant
> + * set the bandwidth replenishment timer to the replenishment instant
> * and try to activate it.
> *
> * Notice that it is important for the caller to know if the timer
> @@ -530,7 +535,7 @@ static int start_dl_timer(struct task_struct *p)
> * that it is actually coming from rq->clock and not from
> * hrtimer's time base reading.
> */
> - act = ns_to_ktime(dl_se->deadline);
> + act = ns_to_ktime(dl_next_period(dl_se));
> now = hrtimer_cb_get_time(timer);
> delta = ktime_to_ns(now) - rq_clock(rq);
> act = ktime_add_ns(act, delta);