Re: [RFC PATCH 4/4] sched: Upload nohz full CPU load on task enqueue/dequeue

From: Frederic Weisbecker
Date: Wed Jan 20 2016 - 11:40:27 EST


On Wed, Jan 20, 2016 at 03:43:35PM +0100, Thomas Gleixner wrote:
> On Wed, 20 Jan 2016, Frederic Weisbecker wrote:
> > On Wed, Jan 20, 2016 at 10:03:32AM +0100, Thomas Gleixner wrote:
> > > I have been telling you for years that you need to fix that remote accounting stuff,
> > > but no, you insist on adding more trainwrecks left and right.
> >
> > The solution you proposed to me was to do remote scheduler_tick() from
> > CPU 0 and this was nacked by peterz (and he was right).
>
> He did not nack the general approach of remote accounting, right?

He nacked the remote scheduler_tick(): https://lkml.org/lkml/2015/8/13/144
but not the general approach of remote accounting, which is the only way to
do what we want anyway. We just need to do it in a finer-grained way.

>
> > We all know that we need to fix this remote accounting stuff, but I'm the
> > only one who actually _tries_, at least through RFCs to start discussions,
> > such that I find the right direction to move forward.
>
> Well, I do not see any attempt to do remote accounting, not even in a
> minimalistic form. The current RFC is about dealing with issues which are
> caused by the lack of continuous (remote) accounting.

Well, remote accounting of cpu load is something we really need, and that's what I
planned to do, but I couldn't find a way to do it correctly without first listening
to enqueue/dequeue events. I said in the cover letter that it was still incomplete
and that we need to find a way for target_load() to return up-to-date values (suggesting
we are going to need remote accounting and we'll need to debate around that).

>
> > > > The problem with doing this remotely is that we can miss past cpu loads if
> > > > there were several enqueue/dequeue operations happening while tickless.
> > >
> > > That's complete bullshit.
> > >
> > > 1) How is remote accounting that happens every tick different from local
> > > accounting which happens every tick?
> >
> > Enqueue/dequeue don't happen on tick, unless there is a wakeup on that interrupt.
>
> And how does that matter? Tick based accounting whether remote or local does
> not account for intermediate states at all.
>
> > > 2) How do you have enqueue/dequeue operations when you are running in full
> > > nohz, i.e. one task is consuming 100% cpu time in user space?
> >
> > Well that task is going to sleep, wake up, sleep like any other task. We
>
> If that task goes to sleep, then it leaves the full nohz state.

No, it stays in dynticks mode if we go idle. This isn't "full nohz" anymore, but we
don't make much of a distinction here. We could update cpu load on this transition
though.

>
> > need to account these slices properly. If a second task wakes up and restarts
> > the tick, we must make sure that the previous tickless frame got accounted
> > properly.
>
> The previous tickless frame ends when that task goes to sleep. And that's
> where you update the accounting.

There is a continuity between full nohz and idle nohz, but surely we need to
update the cpu load in this transition. This was implied by the update on
enqueue/dequeue but if we don't take that direction yet, we'll need to do it
explicitly there.

>
> > Besides, if a SCHED_FIFO task runs (tickless) with SCHED_NORMAL tasks in the
> > runqueue, those are typically still accounted with the tick, so perhaps we
> > need to keep that behaviour without the tick as well and account those
> > SCHED_NORMAL task's load.
>
> So we agreed a long time ago that we first fix the issues with a single task
> running undisturbed in user space, i.e. tickless. Those issues have never been
> resolved fully, but now you try to add more complexity of extra runnable
> tasks, nohz tasks sleeping and whatever.

Nohz tasks do sleep, really, at least we need to handle that case now.

>
> Can we please go back to the point where this all started:
>
> ONE task running with 100% CPU in user space
>
> And get all the issues around that resolved properly, which involves remote
> accounting.

That was the plan to discuss in this RFC series.

>
> Once that works, you can add the new features, i.e. extra runnable tasks and
> whatever.

Sure I can ignore the more complicated scenario for now. I agree with that.

So what we can do is to record the load of the nohz task on full dynticks frame
entry. Then on that full dynticks frame exit, we account that recorded load.
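
To make the idea concrete, here is a rough userspace sketch in C of the
record-on-entry, account-on-exit scheme. It is not kernel code: struct nohz_rq,
account_tick(), nohz_frame_enter(), nohz_frame_exit() and the cap of 64 ticks are
all hypothetical, and the simple halving average only stands in for whatever decay
the real cpu_load code applies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU state for illustration; not the kernel's struct rq. */
struct nohz_rq {
	uint64_t cpu_load;    /* running load average */
	uint64_t frame_load;  /* load recorded at full dynticks frame entry */
	uint64_t frame_start; /* tick count at frame entry */
	bool in_frame;        /* true while a full dynticks frame is open */
};

/* One tick of accounting: average the old value with the sampled load.
 * This is a stand-in for the real decay, chosen only to keep the sketch short. */
static void account_tick(struct nohz_rq *rq, uint64_t load)
{
	rq->cpu_load = (rq->cpu_load + load) / 2;
}

/* Frame entry: remember the load of the single nohz task and when we started. */
static void nohz_frame_enter(struct nohz_rq *rq, uint64_t task_load, uint64_t now)
{
	rq->frame_load = task_load;
	rq->frame_start = now;
	rq->in_frame = true;
}

/* Frame exit: replay the recorded load once per tick that was skipped. */
static void nohz_frame_exit(struct nohz_rq *rq, uint64_t now)
{
	uint64_t missed;

	assert(rq->in_frame);
	missed = now - rq->frame_start;

	/* Arbitrary cap (assumption): the average has converged long before this. */
	if (missed > 64)
		missed = 64;

	while (missed--)
		account_tick(rq, rq->frame_load);

	rq->in_frame = false;
}
```

With a load of 1024 recorded at tick 100 and the frame closed at tick 103, the
three missed ticks take cpu_load from 0 to 512, 768, then 896. This only covers the
single-task case; it deliberately ignores any enqueue/dequeue inside the frame.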

Then I'll follow up with another series to do the remote accounting part.

How does that sound?

Thanks

> Thanks,
>
> tglx