Re: [PATCH RFC] sched/fair: fix sudden expiration of cfq quota in put_prev_task()
Date: Mon Apr 06 2015 - 18:45:42 EST
Konstantin Khlebnikov <khlebnikov@xxxxxxxxxxxxxx> writes:
> Pick_next_task_fair() must be sure that here is at least one runnable
> task before calling put_prev_task(), but put_prev_task() can expire
> last remains of cfs quota and throttle all currently runnable tasks.
> As a result pick_next_task_fair() cannot find next task and crashes.
> This patch leaves 1 in ->runtime_remaining when current assignation
> expires and tries to refill it right after that. In the worst case
> task will be scheduled once and throttled at the end of slice.

I don't think expire_cfs_rq_runtime is the problem. What I believe
happens is this:

/prev/some_task is running, calls schedule() with nr_running == 2.
pick_next's first do/while loop does update_curr(/) and picks /next, and
the next iteration just sees check_cfs_rq_runtime(/next), and thus does
goto simple. However, there is now only /prev/some_task runnable, and
pick_next hasn't checked the entire prev hierarchy for throttling, thus
leading to the crash.

This would require that check_cfs_rq_runtime(/next) return true despite
being on_rq though, which iirc is not supposed to happen (note that we
do not call update_curr(/next), and it would do nothing if we did,
because /next isn't part of the current thread's hierarchy). However,
this /can/ happen if runtime has just been (re)enabled on /next, because
tg_set_cfs_bandwidth sets runtime_remaining to 0, not 1.
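
As a rough illustration of the 0-versus-1 point, here is a toy userspace
model, not kernel code. struct toy_cfs_rq and the toy_* helpers are
hypothetical stand-ins, and the sketch assumes the throttle test reduces
to "runtime enabled and runtime_remaining <= 0":

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two cfs_rq fields that matter here. */
struct toy_cfs_rq {
	bool runtime_enabled;
	int64_t runtime_remaining;
};

/* Assumed shape of the check: true means "this rq must be throttled". */
static bool toy_check_runtime(const struct toy_cfs_rq *rq)
{
	if (!rq->runtime_enabled || rq->runtime_remaining > 0)
		return false;
	return true;
}

/* Models (re)enabling bandwidth with a chosen initial runtime value. */
static void toy_enable_bandwidth(struct toy_cfs_rq *rq, int64_t initial)
{
	rq->runtime_enabled = true;
	rq->runtime_remaining = initial;
}
```

In this model, enabling with an initial value of 0 makes the check fire
on an rq that has not run at all, while an initial value of 1 does not.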

The idea was that each rq would grab runtime when it was scheduled
(pick_next_task_fair didn't ever look at throttling info), so this was
fine with the old code, but is a problem now. I think it would be
sufficient to just initialize to 1 in tg_set_cfs_bandwidth. The arguably
more precise option would be to only check_cfs_rq_runtime if
cfs_rq->curr is set, but the code is slightly less pretty.
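
The second option could be sketched the same way. Again this is a toy
model with hypothetical names (struct toy_cfs_rq, toy_check_if_curr), and
it assumes a non-NULL curr means the rq is on the currently running
hierarchy:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the cfs_rq fields involved. */
struct toy_cfs_rq {
	bool runtime_enabled;
	int64_t runtime_remaining;
	const void *curr;	/* non-NULL only while this rq is running */
};

/* Assumed shape of the plain check: enabled and no runtime left. */
static bool toy_check_runtime(const struct toy_cfs_rq *rq)
{
	return rq->runtime_enabled && rq->runtime_remaining <= 0;
}

/* Guarded variant: consult the throttle check only if curr is set. */
static bool toy_check_if_curr(const struct toy_cfs_rq *rq)
{
	return rq->curr != NULL && toy_check_runtime(rq);
}
```

With the guard, a freshly enabled rq whose runtime_remaining is still 0
is not reported as throttled until it has actually been running.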

Could you check this patch to see if it works (or the trivial
tg_set_cfs_bandwidth runtime_remaining = 1 patch)?