[patch] Real-Time Preemption, -RT-2.6.10-rc2-mm2-V0.7.32-0

From: Ingo Molnar
Date: Fri Dec 03 2004 - 16:01:29 EST



i have released the -V0.7.32-0 Real-Time Preemption patch, which can be
downloaded from the usual place:

http://redhat.com/~mingo/realtime-preempt/

this is a fixes-mostly release with one new feature:

implemented global RT-task balancing on SMP systems, which improves the
latency of RT tasks on SMP systems. The basic problem was that the 2.6
kernel has per-CPU runqueues. In the current design there is no
guarantee that if an RT task starves another, lower-prio (or same-prio)
RT task in a given local runqueue, that the starved task will ever be
migrated to another CPU: it has to wait for the higher-prio task to
finish. In short, task migration on SMP is fundamentally non-RT and
priority-insensitive. In particular the workloads and latencies reported
by Mark H. Johnson reflect such SMP scheduling artifacts.

the new global RT-task balancing feature solves this problem by tracking
the 'RT overload' situation (when there is more than one RT tasks on a
CPU) and makes other CPUs 'pull' RT tasks (and only RT tasks)
immediately when such a situation occurs.

To give an example, in the following scheduling scenario:

CPU#0 CPU#1

task-A SCHED_FIFO prio 30 task-C SCHED_FIFO prio 30 [curr]
task-B SCHED_FIFO prio 40 [curr]

task-B is the currently executing task on CPU#0, task-C is the currently
executing task on CPU#1. Now on the vanilla 2.6 kernel, if task-C
blocks, there's no guarantee that task-A will be run there - if there's
a SCHED_NORMAL task on CPU#1's runqueue then it will run indefinitely.
With global RT-balancing task-A will be scheduled on CPU#1 immediately
after task-C leaves it.

furthermore, if in the same scenario, if e.g. a RT-prio 35 task-D is
woken up on CPU#0, the vanilla 2.6 scheduler will not move it to CPU#1,
even though it could preempt the prio 30 task-C there. With global
RT-balancing this will happen and task-C will be preempted immediately
and task-D runs.

on low RT load (the common case) the scheduler behaves like the stock
scheduler - the new logic only kicks in if a CPU runqueue has 2 or more
RT tasks running at once.

anyway, while the feature is stable on my SMP testboxes, this is still a
nontrivial ~200-lines delta in the scheduler so there might be problems.
UP should not be affected by this.

other changes since -V0.7.32-20:

- local-APIC shutdown fix: this should solve some of the 'reboot hangs'
reports.

- more tracing fixes - might fix the 'truncated traces' problems.

- reduce the NMI watchdog frequency from 10 KHz to 1000 Hz.

- dont report futex reschedules as atomicity violations

to create a -V0.7.32-0 tree from scratch, the patching order is:

http://kernel.org/pub/linux/kernel/v2.6/linux-2.6.9.tar.bz2
http://kernel.org/pub/linux/kernel/v2.6/testing/patch-2.6.10-rc2.bz2
http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.10-rc2/2.6.10-rc2-mm2/2.6.10-rc2-mm2.bz2
http://redhat.com/~mingo/realtime-preempt/realtime-preempt-2.6.10-rc2-mm2-V0.7.32-0

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/