[RFC][PATCH 00/32] Nohz cpusets v2 (adaptive tickless kernel)

From: Frederic Weisbecker
Date: Wed Mar 21 2012 - 09:58:50 EST


Hi all,

A summary of what this is about can be found here:
https://lkml.org/lkml/2011/8/15/245

There are still a lot of things to handle, especially regarding what is
done by scheduler_tick(), but we also need to:

- completely handle cputime accounting (need to find every "reader"
of cputime and flush cputimes for all of them).
- handle perf
- handle fine-grained irqtime accounting
- handle ilb load balancing
- etc...
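The cputime item boils down to this. Once the tick is stopped, user and system time are no longer sampled periodically. Time spent since the last user/kernel boundary stays pending until something flushes it, so every reader has to flush first. The patch titles below name getrusage(), times() and the procfs stat file as such readers. Below is a minimal single-task userspace model that assumes a caller-supplied nanosecond clock. The struct and function names are hypothetical and are not the kernel's API:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-task accounting state (a model, not kernel code). */
struct model_task {
	uint64_t utime;         /* flushed user time, in ns */
	uint64_t stime;         /* flushed system time, in ns */
	uint64_t last_boundary; /* time of the last user/kernel switch or flush */
	bool in_user;           /* mode the pending time belongs to */
};

/* Charge the time elapsed since the last boundary to the current mode. */
static void model_flush_cputime(struct model_task *t, uint64_t now)
{
	uint64_t delta = now - t->last_boundary;

	if (t->in_user)
		t->utime += delta;
	else
		t->stime += delta;
	t->last_boundary = now;
}

/* Syscall entry/exit hook: account the finished slice, then switch mode. */
static void model_switch_mode(struct model_task *t, bool to_user, uint64_t now)
{
	model_flush_cputime(t, now);
	t->in_user = to_user;
}

/* A reader must flush first, otherwise it misses the pending slice. */
static uint64_t model_read_utime(struct model_task *t, uint64_t now)
{
	model_flush_cputime(t, now);
	return t->utime;
}
```

In the series the reader may run on a different CPU than the tickless task. That is why flushing involves the flush-times IPI mentioned in the changelog, rather than a simple local call as in this model.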

Nonetheless, it is time to post a new iteration of the patchset:
the design has changed a bit, some bugs have been fixed, and there is
more simplification, more unification with the dynticks-idle code,
namespace fixes and various improvements here and there.

The git branch can be fetched from:

git://github.com/fweisbec/linux-dynticks.git
nohz/cpuset-v2

Changelog since v1:

- Rebase against 3.3-rc7 + tip:timers/core branch targeted
for 3.4-rc1

- Refine some changelogs

- Adapt against latest rcu changes: introduce new APIs
rcu_user_enter(), rcu_user_exit(), rcu_user_enter_irq()
and rcu_user_exit_irq()

- Handle RCU idle mode in the do_notify_resume() path

- Fix a deadlock caused by taking the rq lock twice in schedule():
schedule() -> rq_lock -> next is idle task ->
tick_nohz_restart_sched_tick() -> wake up softirq ->
rq lock

- Fix a lockup while issuing the flush-times IPI on the exit path:

        CPU 0                           CPU 1

        read_lock(tasklist_lock)
                                        write_lock_irq(tasklist_lock)
        smp_call_function(CPU 1)
                        * deadlock *

- Many namespace renames (cpuset_* to tick_nohz_*) and code migration
from sched.c to tick-sched.c

- Separate the code that determines whether we can stop the idle tick,
and don't use it for adaptive tickless mode.

- Fix adaptive tickless mode being incidentally set on idle. TIF_NOHZ was
then missing on the next task that ran tickless, resulting in some
illegal uses of RCU.

- Restart the tick whenever more than one task is on the runqueue. We were
previously only covering wakeups; now we also handle migration and any other
source of task enqueuing.

- Handle use of RCU in schedule() when called right before resuming userspace
(new schedule_user() API)

- Take the decision to stop the tick from irq exit instead of from the middle
of the timer interrupt. This gives more opportunities to stop it and is one
more step toward unifying idle and adaptive tickless.

- Unify tickless idle and tickless user/system CPU time accounting infrastructures.

- If the tick is stopped adaptively and we are going to schedule the idle
task, don't restart the tick.

- Remove task_nohz_mode per cpu var and use ts->tick_stopped instead. This
leads to more unification between idle tickless and adaptive tickless.
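The schedule() deadlock in the changelog is a self-deadlock on a non-recursive lock. schedule() holds the rq lock, picks the idle task and restarts the tick. The softirq wakeup on that path then wants the same rq lock. Below is a minimal userspace model in which a default POSIX mutex stands in for the rq lock. pthread_mutex_trylock() is used only so the demonstration reports the conflict instead of hanging. The function names are hypothetical:

```c
#include <errno.h>
#include <pthread.h>

/* Stand-in for the runqueue lock: a default, non-recursive mutex. */
static pthread_mutex_t model_rq_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Stand-in for the softirq wakeup reached from tick_nohz_restart_sched_tick():
 * the wakeup needs the rq lock. trylock returns EBUSY where a kernel
 * spin_lock() would spin forever.
 */
static int model_wakeup_softirqd(void)
{
	int ret = pthread_mutex_trylock(&model_rq_lock);

	if (ret == 0)
		pthread_mutex_unlock(&model_rq_lock);
	return ret;
}

/* Stand-in for schedule() switching to the idle task with the rq lock held. */
static int model_schedule_to_idle(void)
{
	int ret;

	pthread_mutex_lock(&model_rq_lock);
	/* next is idle -> restart the tick -> wake up softirqd -> rq lock again */
	ret = model_wakeup_softirqd();
	pthread_mutex_unlock(&model_rq_lock);
	return ret; /* EBUSY: the nested acquisition would have deadlocked */
}
```

The same wakeup succeeds when it is called without the lock held. The bug is only the nesting.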

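The exit-path lockup in the changelog is a two-CPU wait-for cycle. CPU 0 holds tasklist_lock for reading and waits in smp_call_function() for CPU 1 to run the IPI handler. This assumes the call waits for completion, as the lockup implies. CPU 1 spins in write_lock_irq() with interrupts disabled, so it can never take that IPI, and it is itself waiting for CPU 0's read lock. Below is a minimal sketch that encodes only those two waits and checks the graph for a cycle. The model and its names are hypothetical:

```c
#include <stdbool.h>

/* Which CPU each CPU is blocked on; -1 means its wait will be satisfied. */
struct model_cpu {
	int waits_for;
};

static void model_build(struct model_cpu cpu[2], bool cpu1_irqs_disabled)
{
	/* CPU 1: write_lock(tasklist_lock) waits for CPU 0's read lock. */
	cpu[1].waits_for = 0;
	/*
	 * CPU 0: smp_call_function() waits for CPU 1 to run the IPI handler.
	 * With IRQs enabled, CPU 1 services the IPI while spinning, so that
	 * wait completes and is not a blocking edge.
	 */
	cpu[0].waits_for = cpu1_irqs_disabled ? 1 : -1;
}

/* A cycle in the wait-for graph means neither CPU can make progress. */
static bool model_deadlocked(const struct model_cpu cpu[2])
{
	for (int start = 0; start < 2; start++) {
		int cur = cpu[start].waits_for;

		for (int steps = 0; cur != -1 && steps < 2; steps++) {
			if (cur == start)
				return true;
			cur = cpu[cur].waits_for;
		}
	}
	return false;
}
```

In this model, removing either edge breaks the cycle. The IRQ state on CPU 1 is what turns a slow path into a lockup.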

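The runqueue rule in the changelog can be expressed as a single check on the enqueue path rather than on each caller. Whatever adds a task (wakeup, migration or anything else), the tick comes back once nr_running exceeds one, because the scheduler then relies on the tick to preempt between tasks. Below is a minimal model. struct model_rq and its helpers are hypothetical. Only nr_running and tick_stopped borrow kernel field names, the latter from ts->tick_stopped:

```c
#include <stdbool.h>

enum model_enqueue_source { MODEL_WAKEUP, MODEL_MIGRATION, MODEL_OTHER };

/* Hypothetical runqueue state (a model, not kernel code). */
struct model_rq {
	unsigned int nr_running; /* runnable tasks on this CPU */
	bool tick_stopped;       /* mirrors ts->tick_stopped */
	unsigned int restarts;   /* how often the tick was restarted */
};

static void model_restart_tick(struct model_rq *rq)
{
	if (rq->tick_stopped) {
		rq->tick_stopped = false;
		rq->restarts++;
	}
}

/* One hook shared by every enqueue path, so no source can be missed. */
static void model_enqueue_task(struct model_rq *rq, enum model_enqueue_source src)
{
	(void)src; /* the source no longer matters: every path is covered */
	rq->nr_running++;
	if (rq->nr_running > 1)
		model_restart_tick(rq);
}
```

Here a migration restarts the tick exactly as a wakeup would, without needing a separate check in the migration code.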

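Taking the stop decision at irq exit means it is re-evaluated after any interrupt, not only in the middle of the timer interrupt. Below is a minimal sketch of such a check. It assumes the conditions named in this series' patch titles: RCU, POSIX CPU timers and printk, plus the single-runnable-task rule. The struct and function names are hypothetical, and the real tick-sched.c code may check more than this:

```c
#include <stdbool.h>

/* Hypothetical snapshot of the per-CPU state consulted at irq exit. */
struct model_nohz_state {
	bool nohz_cpuset;        /* CPU is in a cpuset with the nohz flag set */
	unsigned int nr_running; /* runnable tasks on this CPU */
	bool rcu_needs_cpu;      /* RCU still needs this CPU's tick */
	bool posix_cpu_timers;   /* the running task has POSIX CPU timers armed */
	bool printk_pending;     /* printk has pending work relying on the tick */
	bool tick_stopped;       /* mirrors ts->tick_stopped */
};

static bool model_can_stop_tick(const struct model_nohz_state *s)
{
	return s->nohz_cpuset &&
	       s->nr_running <= 1 &&
	       !s->rcu_needs_cpu &&
	       !s->posix_cpu_timers &&
	       !s->printk_pending;
}

/* Run on every interrupt exit: any interrupt may be the one that stops it. */
static void model_irq_exit(struct model_nohz_state *s)
{
	if (!s->tick_stopped && model_can_stop_tick(s))
		s->tick_stopped = true;
}
```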
Frederic Weisbecker (32):
nohz: Separate idle sleeping time accounting from nohz logic
nohz: Make nohz API agnostic against idle ticks cputime accounting
nohz: Rename ts->idle_tick to ts->last_tick
nohz: Move nohz load balancer selection into idle logic
nohz: Move ts->idle_calls incrementation into strict idle logic
nohz: Move next idle expiry time record into idle logic area
cpuset: Set up interface for nohz flag
nohz: Try not to give the timekeeping duty to an adaptive tickless cpu
x86: New cpuset nohz irq vector
nohz: Adaptive tick stop and restart on nohz cpuset
nohz/cpuset: Don't turn off the tick if rcu needs it
nohz/cpuset: Wake up adaptive nohz CPU when a timer gets enqueued
nohz/cpuset: Don't stop the tick if posix cpu timers are running
nohz/cpuset: Restart tick when nohz flag is cleared on cpuset
nohz/cpuset: Restart the tick if printk needs it
rcu: Restart the tick on non-responding adaptive nohz CPUs
rcu: Restart tick if we enqueue a callback in a nohz/cpuset CPU
nohz: Generalize tickless cpu time accounting
nohz/cpuset: Account user and system times in adaptive nohz mode
nohz/cpuset: New API to flush cputimes on nohz cpusets
nohz/cpuset: Flush cputime on threads in nohz cpusets when waiting leader
nohz/cpuset: Flush cputimes on procfs stat file read
nohz/cpuset: Flush cputimes for getrusage() and times() syscalls
x86: Syscall hooks for nohz cpusets
x86: Exception hooks for nohz cpusets
x86: Add adaptive tickless hooks on do_notify_resume()
nohz: Don't restart the tick before scheduling to idle
rcu: New rcu_user_enter() and rcu_user_exit() APIs
rcu: New rcu_user_enter_irq() and rcu_user_exit_irq() APIs
rcu: Switch to extended quiescent state in userspace from nohz cpuset
nohz: Exit RCU idle mode when we schedule before resuming userspace
nohz/cpuset: Disable under some configs
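rcu_user_enter() and rcu_user_exit() let a tickless CPU running in userspace count as being in an RCU extended quiescent state, as dynticks-idle already does. Grace periods then need not wait for that CPU's tick. Below is a minimal single-threaded model of the even/odd counter idea RCU uses to track dynticks-idle. An even value means the CPU is in an extended quiescent state. The names are hypothetical, and the real rcutree code also handles nesting, irqs and memory ordering:

```c
#include <stdbool.h>

/* Hypothetical per-CPU counter: odd = in the kernel, even = in an EQS. */
struct model_rcu_cpu {
	unsigned long dynticks;
};

static void model_rcu_user_enter(struct model_rcu_cpu *c)
{
	c->dynticks++; /* odd -> even: entering the userspace EQS */
}

static void model_rcu_user_exit(struct model_rcu_cpu *c)
{
	c->dynticks++; /* even -> odd: back in the kernel */
}

/* Grace-period side: has this CPU been quiescent since @snap was taken? */
static bool model_rcu_cpu_quiescent_since(const struct model_rcu_cpu *c,
					  unsigned long snap)
{
	unsigned long curr = c->dynticks;

	/* Either in an EQS right now, or it went through one (moved by >= 2). */
	return (curr & 1) == 0 || curr - snap >= 2;
}
```

With this scheme the grace-period side never has to interrupt the tickless CPU. It only compares counter snapshots.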

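schedule_user() exists because a task about to resume userspace may already have told RCU it is in user mode, yet schedule() itself uses RCU. Below is a minimal sketch of the wrapping idea. It assumes the wrapper simply brackets schedule() with an exit from RCU user mode and a re-entry, following the patch title "Exit RCU idle mode when we schedule before resuming userspace". All names are hypothetical:

```c
#include <stdbool.h>

/* Hypothetical flag: true while RCU treats this CPU as running userspace. */
static bool model_rcu_in_user;
/* Counts RCU uses from schedule() while RCU was not watching this CPU. */
static unsigned int model_illegal_rcu_uses;

static void model_rcu_user_enter(void)
{
	model_rcu_in_user = true;
}

static void model_rcu_user_exit(void)
{
	model_rcu_in_user = false;
}

/* Stand-in for schedule(): it reads RCU-protected data. */
static void model_schedule(void)
{
	if (model_rcu_in_user)
		model_illegal_rcu_uses++;
}

/*
 * Stand-in for schedule_user(): called on the way back to userspace, possibly
 * after RCU user mode was already entered. Leave it around schedule().
 */
static void model_schedule_user(void)
{
	model_rcu_user_exit();
	model_schedule();
	model_rcu_user_enter();
}
```

Calling model_schedule() directly in user mode is counted as an illegal use, while model_schedule_user() is not, and it leaves the CPU back in RCU user mode for the return to userspace.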
arch/Kconfig | 3 +
arch/x86/Kconfig | 1 +
arch/x86/include/asm/entry_arch.h | 3 +
arch/x86/include/asm/hw_irq.h | 7 +
arch/x86/include/asm/irq_vectors.h | 2 +
arch/x86/include/asm/smp.h | 11 +
arch/x86/include/asm/thread_info.h | 10 +-
arch/x86/kernel/entry_64.S | 12 +-
arch/x86/kernel/irqinit.c | 4 +
arch/x86/kernel/ptrace.c | 10 +
arch/x86/kernel/signal.c | 3 +
arch/x86/kernel/smp.c | 26 ++
arch/x86/kernel/traps.c | 20 +-
arch/x86/mm/fault.c | 13 +-
fs/proc/array.c | 2 +
include/linux/cpuset.h | 29 ++
include/linux/kernel_stat.h | 2 +
include/linux/posix-timers.h | 1 +
include/linux/rcupdate.h | 8 +
include/linux/sched.h | 10 +-
include/linux/tick.h | 75 ++++--
init/Kconfig | 8 +
kernel/cpuset.c | 107 +++++++
kernel/exit.c | 8 +
kernel/posix-cpu-timers.c | 12 +
kernel/printk.c | 15 +-
kernel/rcutree.c | 150 ++++++++--
kernel/sched/core.c | 83 ++++++-
kernel/sched/sched.h | 23 ++
kernel/softirq.c | 6 +-
kernel/sys.c | 6 +
kernel/time/tick-sched.c | 540 +++++++++++++++++++++++++++++-------
kernel/time/timer_list.c | 7 +-
kernel/timer.c | 2 +-
34 files changed, 1042 insertions(+), 177 deletions(-)

