Re: [PATCH] nohz_full: Make sched_should_stop_tick() more conservative

From: Peter Zijlstra
Date: Thu Apr 21 2016 - 10:42:27 EST


On Mon, Apr 18, 2016 at 10:00:42AM +0800, Wanpeng Li wrote:
> > The h is for hierarchy: h_nr_running counts the total number of
> > runnable tasks in the entire child hierarchy, while nr_running is the
> > number of sched entities (se) in the current tree.
>
> So I think we should at least change cfs_rq->nr_running to
> cfs_rq->h_nr_running. I can send a formal patch if you think it makes
> sense. :-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 1159423..79197df 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -616,7 +616,7 @@ bool sched_can_stop_tick(struct rq *rq)
>  	}
>  
>  	/* Normal multitasking need periodic preemption checks */
> -	if (rq->cfs.nr_running > 1)
> +	if (rq->cfs.h_nr_running > 1)
>  		return false;
>  
>  	return true;

So I think that is indeed the right thing here. But looking at this
function, I think there are more problems with it.

It seems to assume that if there are FIFO tasks, those will run. This is
incorrect: the FIFO task can have a lower priority than an RR task, in
which case the RR task will run.

So the whole fifo_nr_running test seems misplaced; it should go after
the rr_nr_running tests. That is, only if rr_nr_running is zero can we
use fifo_nr_running like this.