Re: [NOHZ] Remove scheduler_tick_max_deferment
From: Frederic Weisbecker
Date: Mon Nov 10 2014 - 15:27:18 EST
On Sat, Nov 01, 2014 at 04:52:13PM -0500, Christoph Lameter wrote:
> On Sat, 1 Nov 2014, Thomas Gleixner wrote:
>
> > On Fri, 31 Oct 2014, Christoph Lameter wrote:
> > > The reasoning behind this function is not clear to me and removal seems
> >
> > The comment above the function is clear enough.
>
> I looked around into the functions called by the timer interrupt for
> accounting etc. They have measures to compensate if the HZ is not
> occurring for some time.
Not very well. They handle dynticks idle correctly, but not full dynticks.
Check out update_cpu_load_active() -> __update_cpu_load() for example.
There is a pending_updates argument that takes care of the tickless delta, but
decay_load_missed() catches up with the missed cpu load by assuming it was all 0
(idle) the whole time.
Generally speaking, the scheduler assumes that dynticks means idle dynticks. That
applies to the above example and probably to a lot of other accounting.
Now the issue with update_cpu_load_active() is there whether we keep the 1 Hz tick
or not: any stretch of busy full dynticks time makes it buggy, because it's
accounted as idle load.
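
A toy model may make the effect concrete. This is a simplified, hypothetical C sketch of the idea, not the kernel's __update_cpu_load() or decay_load_missed() code: the names, the five-entry history and the integer decay formula below are assumptions chosen for illustration. Skipped ticks are replayed with a load of 0, so a CPU that was busy the whole time ends up accounted as mostly idle.

```c
#include <assert.h>

#define LOAD_IDX_MAX 5

/* Toy per-CPU load history: index 0 is the instantaneous load, index i
 * decays by a factor of (2^i - 1) / 2^i per tick. Hypothetical model. */
static unsigned long cpu_load[LOAD_IDX_MAX];

/* Put every index at the same steady-state load. */
static void reset_load(unsigned long load)
{
    for (int i = 0; i < LOAD_IDX_MAX; i++)
        cpu_load[i] = load;
}

/* One regular tick: fold the current load into every index. */
static void tick_update(unsigned long this_load)
{
    cpu_load[0] = this_load;
    for (int i = 1; i < LOAD_IDX_MAX; i++) {
        unsigned long scale = 1UL << i;
        cpu_load[i] = (cpu_load[i] * (scale - 1) + this_load) >> i;
    }
}

/* Catch-up after a tickless stretch: the pending_updates - 1 skipped ticks
 * are replayed with a load of 0 (as if the CPU had been idle) and only the
 * last tick sees the real load. */
static void catch_up(unsigned long this_load, unsigned long pending_updates)
{
    assert(pending_updates >= 1);
    for (unsigned long n = 1; n < pending_updates; n++)
        tick_update(0);
    tick_update(this_load);
}
```

Starting from a steady busy load of 1024, catch_up(1024, 1000) leaves cpu_load[4] at 64, whereas 1000 real ticks at load 1024 would have kept it at 1024.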
But removing the 1 Hz residual tick is dangerous because much of the accounting in
the scheduler tick assumes regular updates. That's mostly OK as long as a stat is
updated and read exclusively locally. But some stats are updated locally and read
remotely. So if CPU 0 is full dynticks and runs for 1 hour in userspace and CPU 1
reads its stats, those will be wrong because of the missing updates. At best, in
this scenario, CPU 1 may consider that CPU 0 has been idle for 1 hour; at worst,
the stats can be junk and there can be crashes. Also, a lot of scheduler decisions
are based on this accounting: load balancing, at the very least.
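
To illustrate the remote-read problem, here is a hypothetical toy model in C, not kernel code; struct cpu_stats, naive_busy_pct() and stats_stale() are made-up names. A record that only its owner's tick refreshes looks like idle time to a remote reader once the ticks stop. A reader can at least detect that the record is older than some bound, which is the kind of check the first option below would require everywhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU stats record, updated only by the owning CPU's tick. */
struct cpu_stats {
    uint64_t last_update_ns; /* time of the last local tick update */
    uint64_t busy_ns;        /* busy time accumulated up to last_update_ns */
};

/* Naive remote reader: busy percentage over [0, now_ns]. Everything after
 * last_update_ns implicitly counts as idle time. */
static unsigned int naive_busy_pct(const struct cpu_stats *s, uint64_t now_ns)
{
    if (now_ns == 0)
        return 0;
    return (unsigned int)(s->busy_ns * 100 / now_ns);
}

/* Staleness check a remote reader could apply before trusting the record. */
static bool stats_stale(const struct cpu_stats *s, uint64_t now_ns,
                        uint64_t max_age_ns)
{
    assert(now_ns >= s->last_update_ns); /* monotonic clock assumed */
    return now_ns - s->last_update_ns > max_age_ns;
}
```

If CPU 0 is really busy the whole time but its record was last updated at t = 1 s (busy_ns = 1 s), a reader at t = 3601 s computes a busy ratio of 0%, while stats_stale() with a 1 s bound flags the record as out of date. The 1 s bound is an assumption mirroring the residual 1 Hz tick.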
So we have two possible solutions:
1) Make the scheduler more full-dynticks aware, which means that any remote
stat read must handle out-of-date results. That's going to be tricky: if you
look at scheduler_tick() and sched_class::task_tick(), even just sorting out
which stats are updated there, which can handle a busy dynticks load, which are
only read locally and which can be read remotely, which handle overflow, etc.
is enough work for an army of ants.
2) Offload scheduler_tick() to the housekeeping CPU(s). It looks like many of the
updaters there can easily take a remote rq argument; there don't seem to be many
assumptions that the rq is local. So that's the easiest solution.
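
Option 2 can be sketched as a hypothetical toy model in C rather than the kernel's scheduler_tick(): struct toy_rq, toy_tick_rq(), toy_local_tick() and toy_housekeeping_tick() are invented names, and the rq locking that a real remote update would need is deliberately omitted. The point is only that once the tick work takes an explicit rq pointer, a housekeeping CPU can run it on behalf of the nohz_full CPUs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_TOY 4

/* Hypothetical, heavily simplified run queue. */
struct toy_rq {
    int cpu;
    bool nohz_full;          /* CPU runs without a local periodic tick */
    uint64_t clock_ns;       /* last time this rq's stats were updated */
    unsigned long nr_updates;
};

static struct toy_rq rqs[NR_CPUS_TOY];

/* Tick work written against an explicit rq: nothing here assumes it runs
 * on rq->cpu, so any CPU may call it (real code would lock the rq). */
static void toy_tick_rq(struct toy_rq *rq, uint64_t now_ns)
{
    rq->clock_ns = now_ns;
    rq->nr_updates++;
}

/* Local tick: a CPU updates its own rq unless it is nohz_full. */
static void toy_local_tick(int cpu, uint64_t now_ns)
{
    assert(cpu >= 0 && cpu < NR_CPUS_TOY);
    if (!rqs[cpu].nohz_full)
        toy_tick_rq(&rqs[cpu], now_ns);
}

/* Housekeeping: one CPU performs the tick work for every nohz_full CPU. */
static void toy_housekeeping_tick(uint64_t now_ns)
{
    for (int cpu = 0; cpu < NR_CPUS_TOY; cpu++)
        if (rqs[cpu].nohz_full)
            toy_tick_rq(&rqs[cpu], now_ns);
}
```

With CPU 0 as the only non-nohz_full CPU, ten rounds of local ticks followed by a housekeeping tick leave every rq with 10 updates, so the nohz_full CPUs' stats keep advancing without any local tick.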
But we can't just remove scheduler_tick_max_deferment() without fixing what depends
on it. The result would be unpredictable, insane and dangerous. The only predictable
outcome of doing that is that nobody would ever fix it properly.