Re: [RESEND PATCH] sched: consider missed ticks when updating global cpu load

From: Byungchul Park
Date: Thu Oct 01 2015 - 21:04:58 EST


On Wed, Sep 30, 2015 at 12:43:43PM +0200, Peter Zijlstra wrote:
> On Sat, Sep 26, 2015 at 03:14:45PM +0200, Frederic Weisbecker wrote:
>
> > > when the next tick occurs, update_process_times() -> scheduler_tick()
> > > -> update_cpu_load_active() is performed, assuming the distance between
> > > last tick and current tick is 1 tick! it's wrong in this case. thus,
> > > this abnormal case should be considered in update_cpu_load_active().
> > >
> > > Signed-off-by: Byungchul Park <byungchul.park@xxxxxxx>
> > > ---
> > > kernel/sched/fair.c | 7 +++++--
> > > 1 file changed, 5 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 4d5f97b..829282f 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -4356,12 +4356,15 @@ void update_cpu_load_nohz(void)
> > > */
> > > void update_cpu_load_active(struct rq *this_rq)
> > > {
> > > + unsigned long curr_jiffies = READ_ONCE(jiffies);
> > > + unsigned long pending_updates;
> > > unsigned long load = weighted_cpuload(cpu_of(this_rq));
> > > /*
> > > * See the mess around update_idle_cpu_load() / update_cpu_load_nohz().
> > > */
> > > - this_rq->last_load_update_tick = jiffies;
> > > - __update_cpu_load(this_rq, load, 1);
> > > + pending_updates = curr_jiffies - this_rq->last_load_update_tick;
> > > + this_rq->last_load_update_tick = curr_jiffies;
> > > + __update_cpu_load(this_rq, load, pending_updates);
> > > }
> >
> > That's right but __update_cpu_load() doesn't handle correctly pending updates
> > with non-zero loads. Currently, pending updates are wheeled through decay_load_missed()
> > that assume it's all about idle load.
> >
> > But in the cases you've enumerated, as well as in the nohz full case, missed pending
> > updates can be about buzy loads.
> >
> > I think we need to fix update_cpu_load() to handle that first, or your fix is
> > going to make things worse.
>
> Its worse than that, the whole call chain of update_process_times()
> fully assumes a single tick, fixing just the one function deep down to

yes update_process_times() currently assumes a single tick which is not
good. i think this assuming should be removed one by one so that full
NOHZ works completely.

> handle more than 1 tick is ass backwards.

why do you think fixing this funciton to handle more than 1 tick is
assbackwards? i don't think so because there's no dependancy between
this function and any other tick handling functions. that kind of
bugs should be fixed carefully one by one until all is done, though it
could be assbackwards if fixing something effects another tick handling.

> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/