SD_SHARE_CPUPOWER breaks scheduler fairness

From: Steve Rotolo
Date: Tue May 31 2005 - 12:49:12 EST


The SD_SHARE_CPUPOWER flag in SMT scheduling domains (hyperthreaded
systems) can starve sched_other tasks and even hang the system. A
long-running (or runaway) sched_fifo task causes sched_other tasks to
get stuck on the sibling cpu's runqueue without any chance to run. The
sibling cpu simply stays idle with tasks on its runqueue for as long as
the sched_fifo task runs on the other sibling cpu. The culprit is
dependent_sleeper() in sched.c.

I guess SD_SHARE_CPUPOWER is supposed to cause the scheduler to
prohibit non-real-time tasks from running on a cpu while a real-time
task is running on the sibling cpu. The problem is that the sched_other
tasks are not migrated to a different runqueue and essentially get stuck
on a dead runqueue until either the sched_fifo task yields or the
load balancer moves them. Unfortunately, the load balancer will never
migrate them if the runqueue lengths are not sufficiently out of
balance. Even more unfortunate, the load balancer will actually move
tasks *to* the dead runqueue if it is less busy. And still worse, since
SD_WAKE_IDLE is also set in the scheduling domain, the dead cpu will
actually attract waking tasks to it because it is idle! The cpu becomes
a sort of black hole, sucking in innocent tasks so they can no longer
run.
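
To make the black-hole effect concrete, here is a toy model in C. It is
only a sketch of the behavior described above, not the code in sched.c:
every name in it (toy_cpu, toy_wake_other, toy_schedule) is made up for
illustration, and it assumes that "idle" means "executing nothing",
regardless of how many tasks are queued.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
enum toy_state { TOY_IDLE, TOY_RUNS_OTHER, TOY_RUNS_FIFO };

struct toy_cpu {
    enum toy_state state;  /* what the cpu is executing right now */
    int queued_other;      /* sched_other tasks waiting on its runqueue */
};

/*
 * Wake-idle placement: put a waking sched_other task on the first cpu
 * that is executing nothing, even if tasks are already queued there.
 * Falls back to cpu 0 when no cpu is idle. Returns the chosen cpu.
 */
static int toy_wake_other(struct toy_cpu cpus[], int ncpus)
{
    int target = 0;

    assert(ncpus > 0);
    for (int i = 0; i < ncpus; i++) {
        if (cpus[i].state == TOY_IDLE) {
            target = i;
            break;
        }
    }
    cpus[target].queued_other++;
    return target;
}

/*
 * Sibling rule: an idle cpu starts a queued sched_other task only if
 * its sibling is not running a sched_fifo task. Returns true if a task
 * got to run.
 */
static bool toy_schedule(struct toy_cpu *self, const struct toy_cpu *sibling)
{
    if (self->state != TOY_IDLE || self->queued_other == 0)
        return false;
    if (sibling->state == TOY_RUNS_FIFO)
        return false;   /* stay idle with work queued: the "dead" runqueue */
    self->queued_other--;
    self->state = TOY_RUNS_OTHER;
    return true;
}
```

With cpu 0 in TOY_RUNS_FIFO and cpu 1 in TOY_IDLE, every call to
toy_wake_other() lands on cpu 1, and toy_schedule() for cpu 1 keeps
returning false until cpu 0 stops running its sched_fifo task.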

The worst-case scenario is when there are N spinning sched_fifo tasks on
an N-way hyperthreaded system (N physical cpus, 2N virtual cpus). This
hangs the system since nothing else can run on any of the virtual cpus.
If you turn off the SD_SHARE_CPUPOWER flag, the system stays fully
functional until you have N*2 spinners hogging all the virtual cpus.
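
The arithmetic behind those two thresholds can be sketched as a small
C helper. This is a rough count under stated assumptions, not anything
taken from the kernel: toy_usable_cpus is a hypothetical name, and it
assumes the spinners land one per physical cpu first and only then fill
the remaining siblings.

```c
#include <assert.h>

/*
 * Virtual cpus still able to run sched_other tasks on a machine with
 * ncores physical cpus (2 * ncores virtual cpus) and `spinners`
 * never-yielding sched_fifo tasks. Assumes spinners land one per
 * physical cpu first, then fill the remaining siblings.
 */
static int toy_usable_cpus(int ncores, int spinners, int share_cpupower)
{
    int total = 2 * ncores;

    assert(ncores > 0 && spinners >= 0);
    if (spinners > total)
        spinners = total;
    if (!share_cpupower)
        return total - spinners;    /* each spinner costs one virtual cpu */
    if (spinners >= ncores)
        return 0;                   /* every physical cpu has a spinner */
    return total - 2 * spinners;    /* a spinner blocks its sibling too */
}
```

For a 4-way box, toy_usable_cpus(4, 4, 1) gives 0 while
toy_usable_cpus(4, 4, 0) still leaves 4 virtual cpus; without the flag
the count only reaches 0 at 8 spinners.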

I get the same behavior from 2.6.9 to 2.6.12-rc5. So is this a bug or a
feature?

--
Steve Rotolo
Concurrent Computer Corporation
