Re: [PATCH] sched/deadline: Always calculate end of period on sched_yield()

From: Juri Lelli
Date: Mon Feb 15 2016 - 05:18:34 EST


Hi,

On 12/02/16 18:10, Steven Rostedt wrote:
> I'm writing a test case for SCHED_DEADLINE, and noticed a strange
> anomaly. Every so often, a deadline is missed and when I looked into
> it, it happened because the sched_yield() had no effect (it didn't end
> the previous period and let the start of the next runtime happen on the
> end of the old period).
>
> deadline-2228 7...1 116.778420: sys_enter_sched_yield:
> deadline-2228 7d..3 116.778421: hrtimer_cancel: hrtimer=0xffff88011ebd79a0
> deadline-2228 7d..2 116.778422: rcu_utilization: Start context switch
> deadline-2228 7d..2 116.778423: rcu_utilization: End context switch
> deadline-2228 7d..4 116.778423: hrtimer_start: hrtimer=0xffff88011ebd79a0 function=hrtick/0x0 expires=116124420428 softexpires=116124420428
> deadline-2228 7...1 116.778425: sys_exit_sched_yield: 0x0
>
>
> Schedule was never called. I added some trace_printks() and discovered
> that this happens when sched_yield() is called right after a tick that
> updates its current bandwidth.
>
> When the schedule tick happens that updates the current bandwidth,
> update_curr_dl() is called, where it updates curr->se.exec_start to
> rq_clock_task(rq).
>
> The rq_clock_task(rq) gets updated by update_rq_clock_task() that gets
> updated at various points in the scheduler.
>
> Now, if the user task calls sched_yield() just after a bandwidth update
> synced curr->se.exec_start to rq_clock_task(rq), when sched_yield()
> calls into update_curr_dl() we have:
>
> delta_exec = rq_clock_task(rq) - curr->se.exec_start;
> if (unlikely((s64)delta_exec <= 0))
> return;
>
> Coming in here from a sched_yield() will have delta_exec == 0 if the
> sched_yield() was called after a DL tick and before another
> update_rq_clock_task() is called.
>
> This means that the task will not release its remaining runtime, and
> it will start off in the current period when it expected to be in the
> next period.
>
> The fix that appears to work for me is to add a test in
> update_curr_dl() to not exit if delta_exec is zero and
> dl_se->dl_yielded is true.
>
> Signed-off-by: Steven Rostedt <rostedt@xxxxxxxxxxx>
> ---
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index cd64c979d0e1..1dd180cda574 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -735,7 +735,7 @@ static void update_curr_dl(struct rq *rq)
> * approach need further study.
> */
> delta_exec = rq_clock_task(rq) - curr->se.exec_start;
> - if (unlikely((s64)delta_exec <= 0))
> + if (unlikely((s64)delta_exec <= 0 && !dl_se->dl_yielded))
> return;
>

This looks good to me. Do you think we could also skip some of the
following updates/accounting in this case? Not sure we win anything by
doing that, though.
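
To make the delta_exec == 0 case concrete, here is a small userspace
model of the check, with and without the patch. It is only a sketch:
struct model_dl_se, model_update_curr_dl() and the "patched" flag are
made up for illustration, and the throttling step is a simplified
assumption about what follows the check, not the real deadline.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the fields discussed above. */
struct model_dl_se {
	int64_t  runtime;      /* remaining budget, in ns */
	uint64_t exec_start;   /* last time the budget was charged */
	bool     dl_yielded;   /* set by the yield path */
	bool     dl_throttled; /* set when the current period is ended */
};

/*
 * Model of the check in update_curr_dl().
 * Returns true if execution got past the early return.
 */
static bool model_update_curr_dl(struct model_dl_se *dl_se,
				 uint64_t clock_task, bool patched)
{
	int64_t delta_exec = (int64_t)(clock_task - dl_se->exec_start);

	/* Unpatched: always bail out on delta_exec <= 0.
	 * Patched: only bail out if the task has not yielded. */
	if (delta_exec <= 0 && !(patched && dl_se->dl_yielded))
		return false;

	/* Model-only guard: with delta_exec <= 0 there is nothing to
	 * charge. Whether the kernel can skip its own accounting here
	 * is exactly the open question. */
	if (delta_exec > 0) {
		dl_se->runtime -= delta_exec;
		dl_se->exec_start = clock_task;
	}

	/* Assumed simplification: a yield or exhausted budget ends
	 * the current period. */
	if (dl_se->dl_yielded || dl_se->runtime <= 0)
		dl_se->dl_throttled = true;

	return true;
}
```

With exec_start equal to the clock (yield right after a tick), the
unpatched model returns early and the task is never throttled; the
patched model reaches the throttling step without charging anything.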

Thanks,

- Juri