[PATCH RFC nohz_full 0/8] Provide infrastructure for full-system idle

From: Paul E. McKenney
Date: Tue Jun 25 2013 - 17:37:32 EST


Whenever there is at least one non-idle CPU, it is necessary to
periodically update timekeeping information. Before NO_HZ_FULL, this
updating was carried out by the scheduling-clock tick, which ran on
every non-idle CPU. With the advent of NO_HZ_FULL, it is possible
to have non-idle CPUs that are not receiving scheduling-clock ticks.
This possibility is handled by assigning a timekeeping CPU that continues
taking scheduling-clock ticks.

Unfortunately, the timekeeping CPU continues taking scheduling-clock
interrupts even when all other CPUs are completely idle, which is
not so good for energy efficiency and battery lifetime. Clearly, it
would be good to turn off the timekeeping CPU's scheduling-clock tick
when all CPUs are completely idle. This is conceptually simple, but
we also need good performance and scalability on large systems, which
rules out implementations based on frequently updated global counts of
non-idle CPUs as well as implementations that frequently scan all CPUs.
Nevertheless, we need a single global indicator in order to keep the
overhead of checking acceptably low.

The chosen approach is to enforce hysteresis on the non-idle to
full-system-idle transition, with the amount of hysteresis increasing
linearly with the number of CPUs, thus keeping contention acceptably low.
This approach piggybacks on RCU's existing force-quiescent-state scanning
of idle CPUs, which has the advantage of avoiding the scan entirely on
busy systems that have high levels of multiprogramming. This scan
takes per-CPU idleness information and feeds it into a state machine
that applies the level of hysteresis required to arrive at a single
full-system-idle indicator.
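To make the hysteresis idea concrete, here is a minimal single-threaded
user-space sketch in C. It is not the code in this series. The names
(struct sysidle, sysidle_scan(), sysidle_exit_idle(), and so on), the
four-state machine, and the delay formula (number of CPUs times a
hypothetical ticks_per_cpu parameter) are all assumptions made for
illustration. A real kernel version would also need atomic updates and
memory barriers, which are omitted here. The sketch shows three properties
described above: the scan gives up at the first non-idle CPU, the state
advances toward full-system idle only after every CPU has been idle for a
delay that grows linearly with the CPU count, and the result is a single
global indicator that the timekeeping CPU can check cheaply.

```c
/*
 * Hypothetical, single-threaded user-space sketch of a hysteresis state
 * machine for full-system idle. Names, states, and the delay formula are
 * illustrative assumptions, not the identifiers or logic of the patch.
 */
#include <assert.h>
#include <stdbool.h>

enum sysidle_state {
	SYSIDLE_NOT,	/* Some CPU has recently been non-idle. */
	SYSIDLE_SHORT,	/* One scan saw all CPUs idle for at least the delay. */
	SYSIDLE_LONG,	/* A second such scan; CPUs leaving idle must reset us. */
	SYSIDLE_FULL,	/* Full-system idle: timekeeping tick may be stopped. */
};

#define SKETCH_MAX_CPUS 64

/* Per-CPU idleness record, written only by its own CPU. */
struct cpu_idle_info {
	bool idle;
	unsigned long idle_since;	/* Tick of the most recent idle entry. */
};

struct sysidle {
	enum sysidle_state state;	/* The single global indicator. */
	unsigned long delay;		/* Hysteresis, in ticks. */
	int nr_cpus;
	struct cpu_idle_info cpu[SKETCH_MAX_CPUS];
};

/* Assumed formula: hysteresis grows linearly with the number of CPUs. */
static void sysidle_init(struct sysidle *s, int nr_cpus,
			 unsigned long ticks_per_cpu)
{
	assert(nr_cpus > 0 && nr_cpus <= SKETCH_MAX_CPUS);
	s->state = SYSIDLE_NOT;
	s->delay = (unsigned long)nr_cpus * ticks_per_cpu;
	s->nr_cpus = nr_cpus;
	for (int i = 0; i < nr_cpus; i++) {
		s->cpu[i].idle = false;
		s->cpu[i].idle_since = 0;
	}
}

static void sysidle_enter_idle(struct sysidle *s, int cpu, unsigned long now)
{
	s->cpu[cpu].idle = true;
	s->cpu[cpu].idle_since = now;
}

/*
 * Leaving idle writes the global indicator only once the machine has
 * reached LONG or beyond; earlier states are cancelled by the next scan.
 * On a busy system the state stays NOT, so this is only a read.
 */
static void sysidle_exit_idle(struct sysidle *s, int cpu)
{
	s->cpu[cpu].idle = false;
	if (s->state >= SYSIDLE_LONG)
		s->state = SYSIDLE_NOT;
}

/* Stand-in for the periodic scan of idle CPUs. */
static void sysidle_scan(struct sysidle *s, unsigned long now)
{
	unsigned long latest = 0;

	for (int i = 0; i < s->nr_cpus; i++) {
		if (!s->cpu[i].idle) {
			if (s->state != SYSIDLE_NOT)
				s->state = SYSIDLE_NOT;
			return;		/* Stop at the first non-idle CPU. */
		}
		if (s->cpu[i].idle_since > latest)
			latest = s->cpu[i].idle_since;
	}
	if (now - latest < s->delay)
		return;			/* Not all idle for long enough yet. */
	if (s->state < SYSIDLE_FULL)	/* Advance one step per qualifying scan. */
		s->state = (enum sysidle_state)(s->state + 1);
}

/* What the timekeeping CPU would check before stopping its own tick. */
static bool sysidle_is_full(const struct sysidle *s)
{
	return s->state == SYSIDLE_FULL;
}
```

The design choice worth noting is in sysidle_exit_idle(): on a busy
system the indicator stays at SYSIDLE_NOT, so a CPU leaving idle only
reads the shared variable, and it writes only once the machine has
reached SYSIDLE_LONG. Earlier states are left for the next scan to
cancel, which is safe in this sketch because advancing always requires
a fresh observation that every CPU has been idle for the full delay.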

Note that this version pays attention to CPUs that have taken an NMI
from idle. It is not clear to me that NMI handlers can safely access
the time on a system that is long-term idle. Unless someone tells me
that it is somehow safe to access time from an NMI from idle, I will
remove NMI support in the next version.

Thanx, Paul

------------------------------------------------------------------------

b/include/linux/rcupdate.h | 18 +
b/kernel/rcutree.c | 56 ++++-
b/kernel/rcutree.h | 20 ++
b/kernel/rcutree_plugin.h | 427 ++++++++++++++++++++++++++++++++++++++++++++-
b/kernel/time/Kconfig | 23 ++
5 files changed, 527 insertions(+), 17 deletions(-)