Re: SD_SHARE_CPUPOWER breaks scheduler fairness

From: Steve Rotolo
Date: Wed Jun 01 2005 - 14:06:50 EST


On Wed, 2005-06-01 at 10:47, Con Kolivas wrote:

> I didn't miss the point, but I guess I should have made that clear too.
>
> The number of tasks seen running on that sibling is still the same even if the
> queue is forced to be idle (witness by top thinking the load is 1 on that
> sibling even if it also shows quite a lot of idle time). It should therefore
> not attract any more tasks to itself.
> The task that is there will be trapped based on the fact that there is only
> one task _only_ if the other sibling is indefinitely running real time tasks,
> and _if_ there are other physical cpus we can use we should try to schedule
> the trapped task away. If we have N physical cpus (and N*2 logical), and we
> are running N real time threads I don't think we should expect to run
> SCHED_NORMAL tasks as well. If we have <N real time tasks (where N > 1) then
> we should still be able to run SCHED_NORMAL tasks, I agree. I'm a little
> reluctant to tackle this at this stage with the number of SMP balancing
> things already queued for -mm, but making a sibling appear more heavily laden
> when "pegged" (nr_running + 1) should suffice.
>

Consider what happens if:
- you have 2 physical cpus, 4 logical cpus
- you have 40 running SCHED_NORMAL tasks on a well balanced system --
roughly 10 on each runqueue
- start up a spinning SCHED_FIFO task on cpu 0

Assuming that cpu 1 is the sibling of 0, cpu 1 now has 10 SCHED_NORMAL
tasks that are totally screwed -- they will never, ever, run anywhere,
period.

Now consider what happens if I start up 40 more SCHED_NORMAL tasks. The
load-balancer will kindly place 10 of them on cpu 1's runqueue so they
too can be screwed for all eternity. Nice.

One more thing: I *think* wake_idle() tends to wake tasks to idle cpus
regardless of the idle cpu's runqueue length. This is why I say the
idle cpu becomes a magnet for even more tasks, until the balancer
straightens things out again.

I guess the bottom-line is: given N logical cpus, 1/N of all
SCHED_NORMAL tasks may get stuck on a sibling cpu with no chance to
run. All it takes is one spinning SCHED_FIFO task. Sounds like a bug.

--
Steve

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/