Re: [PATCH v2] sched: let __sched_period() use rq's nr_running

From: Byungchul Park
Date: Mon Jul 13 2015 - 07:10:53 EST


On Mon, Jul 13, 2015 at 12:22:17PM +0200, Mike Galbraith wrote:
> On Mon, 2015-07-13 at 17:29 +0900, Byungchul Park wrote:
> > On Mon, Jul 13, 2015 at 09:07:01AM +0200, Mike Galbraith wrote:
> > > On Mon, 2015-07-13 at 09:56 +0900, Byungchul Park wrote:
> > >
> > > > and I agree that it increases latency for non-grouped tasks.
> > >
> > > It's not only a latency hit for the root group, it's across the board.
> > >
> > > I suspect an overloaded group foo/bar/baz would prefer small slices over
> > > a large wait as well. I certainly wouldn't want my root group taking the
> > > potentially huge hits that come with stretching period to accommodate an
> > > arbitrarily overloaded /foo/bar/baz.
> >
> > Hello, Mike :)
> >
> > OK then, do you think the period has to be stretched by the number of the
> > rq's sched entities (i.e. rq->cfs.nr_running)? If it is done with
> > rq->cfs.nr_running, as you can guess, leaf sched entities (i.e. tasks) can
> > get a much smaller slice than sysctl_sched_min_granularity, and some code
> > that uses sysctl_sched_min_granularity would need to be fixed as well.
>
> The only choice is to give a small slice frequently or a large slice
> infrequently. Increasing spread for the entire world to accommodate a
> massive overload of a small share group in some hierarchy is just not a
> viable option.
>
> > Anyway, the current code looks broken, since it stretches the period with
> > the local cfs_rq's nr_running. IMHO, it should either be stretched with
> > rq->*cfs.nr_running*, even though leaf tasks can then get a very small
> > slice, or with rq->*nr_running* to ensure that any task gets a slice
> > comparable to sysctl_sched_min_granularity.
> >
> > What do you think about this concern?
>
> It seems to work fine. Just say "oh hell no" to hierarchies, and if you
> think slices are too small, widen latency_ns a bit to get what you want
> to see on your box. Computing the latency target bottom up and cumulatively
> is a very bad idea; it lets one nutty group dictate latency for all.

Hello,

This is what I missed! I see why computing the latency target bottom up is bad.
Then my first option, stretching with the number of sched entities on the rq's
root cfs_rq, i.e. rq->cfs.nr_running, should be chosen to compute the latency
target, together with a fix for the code that assumes a task's execution time
is comparable to sysctl_sched_min_granularity, which would no longer be true.

I still think stretching with the local cfs_rq's nr_running should be replaced
with stretching with the top (i.e. root) level count, roughly as in the sketch
below.
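
To make it concrete, here is roughly what I mean, sketched against my reading
of the current __sched_period()/sched_slice() in kernel/sched/fair.c. This is
only to show the idea, not a proper patch, and the replacement call in the
comment is illustrative only:

static u64 __sched_period(unsigned long nr_running)
{
        u64 period = sysctl_sched_latency;
        unsigned long nr_latency = sched_nr_latency;

        /* stretch the period once more entities are runnable than fit
         * into sysctl_sched_latency at min_granularity each */
        if (unlikely(nr_running > nr_latency)) {
                period = sysctl_sched_min_granularity;
                period *= nr_running;
        }

        return period;
}

static u64 sched_slice(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
        /* today: the period is stretched by the *local* cfs_rq's
         * runnable count only */
        u64 slice = __sched_period(cfs_rq->nr_running + !se->on_rq);

        /* what I am suggesting, roughly:
         *
         *      u64 slice = __sched_period(rq_of(cfs_rq)->cfs.nr_running
         *                                 + !se->on_rq);
         *
         * (or rq->nr_running), so the stretching is driven by the top
         * level count and a single overloaded group cannot inflate the
         * period for everybody; the per-level weight scaling later in
         * this function still shrinks the slice for low-share groups.
         */
        ...
}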

Thank you,
Byungchul

>
> Something else to keep in mind when fiddling is that FAIR_SLEEPERS by
> definition widens spread, effectively doubling our latency target, as
> the thing it is defined by is that latency target. We need short term
> fairness so sleepers can perform when facing a world full of hogs, but
> the last thing we need is short term being redefined to a week or so ;-)
>
> -Mike