Re: 1 RT task blocks 4-core machine ?

From: Peter Zijlstra
Date: Mon Oct 11 2010 - 03:54:00 EST


On Sat, 2010-10-09 at 19:42 +0200, Tommaso Cucinotta wrote:
> Peter wrote:
> > On Tue, 2010-10-05 at 00:26 +0200, Tommaso Cucinotta wrote:
> > > A possible explanation might be that the CFS load balancing logic sees
> > > my only active task (e.g., the ssh server or shell etc.) as running
> > > alone on its core, and does not detect that it is inhibited to actually
> > > run due to RT tasks on the same core. Therefore, it will not migrate
> > > the task to the free cores. Does this explanation make sense
> > > or is it completely wrong ?
> >
> > Possibly, it's got some logic to detect this but maybe it still gets
> > confused; in particular, look at the adaptive cpu_power in
> > update_cpu_power() and calling functions.
>
> Ok, I'll have a look (when I have some time :-( ), thanks.
>
> > > Also, I'd like to hear whether this is considered the "normal/desired"
> > > behavior of intermixing RT and non-RT tasks.
> >
> > Pegging a cpu using sched_fifo/rr pretty much means you get to keep the
> > pieces. If it works, nice; if you can make it work better, kudos; but
> > no, polling from sched_fifo/rr is not something that is considered sane
> > for the general health of your system.
>
> Sure, I was not thinking to push/pull across heterogeneous scheduling
> classes, but rather to simply account for the proper per-CPU tasks count
> and load (including all the tasks comprising RT ones) when load-balancing
> in CFS.

Right, so we do that. Part of the problem is that RR/FIFO tasks have no
weight/load (not even a worst-case weight like sporadic tasks have). So
what we do is (per-cpu) take an average measure of the time spent on
!CFS tasks (sched_rt_avg_update() and friends) and use that to lower that
CPU's total throughput, which is reflected in the mentioned ->cpu_power
variable.
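
To make the idea concrete, here is a minimal userspace sketch of that
scaling, not the kernel code itself. The names POWER_SCALE and
scaled_cpu_power() and the nanosecond parameters are made up for
illustration; the real logic lives in update_cpu_power() and its helpers
and differs in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed nominal power of a CPU that is fully available to CFS. */
#define POWER_SCALE 1024u

/*
 * Hypothetical helper: scale the nominal power by the fraction of the
 * averaging period that was NOT consumed by !CFS (e.g. RT) work.
 * period_ns  - length of the averaging window
 * rt_avg_ns  - averaged time spent on !CFS tasks within that window
 */
static uint32_t scaled_cpu_power(uint64_t period_ns, uint64_t rt_avg_ns)
{
	uint64_t available, power;

	assert(period_ns > 0);

	/* The average can overshoot the window; treat that as fully busy. */
	if (rt_avg_ns > period_ns)
		rt_avg_ns = period_ns;

	available = period_ns - rt_avg_ns;
	power = (POWER_SCALE * available) / period_ns;

	/* Never report 0: the value is later used as a divisor. */
	return power ? (uint32_t)power : 1;
}
```

With these assumptions, a CPU that spends half of the window on RT work
ends up with half the nominal power, and a fully pegged CPU bottoms out
at 1.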

> Perhaps, you mean, e.g., if a RT task ends, the CPU would go idle
> and it would be supposed to pull ? Just we don't do that, and at the next
> load-balancing decision things would be fixed up (please, consider I don't
> know the CFS load balancer so well).

No, what I meant was that if a particular CPU is very busy with !CFS
work, its ->cpu_power variable will decrease to 1 (0 would get us
division-by-zero issues). Somehow we need to keep the load-balancer
from thinking it's a good idea to place tasks there.

The natural balance is to move tasks away from weak CPUs, but clearly
it's not good enough.

Also, there is housekeeping that needs to be done on a per-cpu basis.
CPU-affine tasks like workqueue threads need to run in order to keep the
system functional; pegging a CPU with an RT task starves these, causing
general system dysfunction.

> So, for example, in addition to fix the reported issue, we'd get also that,
> when pinning a heavy RT workload on a CPU, CFS tasks would migrate to other
> CPUs, if available. Again, that doesn't need to be instantaneous (push), but
> it could happen later when the CFS load-balancer is invoked (is it invoked
> periodically, as of now ?).

That should basically work: we normalize the cpu load (sum of all cfs
task weights) by the ->cpu_power, so a weak cpu will tend to get all its
tasks migrated away to stronger CPUs. Again, there's probably some
corner case that doesn't quite work as expected.
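
As a rough illustration of that normalization (a userspace sketch under
assumed names, not what the balancer literally does): struct cpu_stat,
normalized_load() and busiest_cpu() are hypothetical, and POWER_SCALE is
an assumed nominal power of 1024.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed nominal power of a CPU that is fully available to CFS. */
#define POWER_SCALE 1024u

/* Hypothetical per-cpu snapshot used only for this sketch. */
struct cpu_stat {
	uint64_t cfs_load;   /* sum of the weights of runnable CFS tasks */
	uint32_t cpu_power;  /* >= 1, lowered by time spent on !CFS work */
};

/* Load relative to the capacity left for CFS; weak CPUs look busier. */
static uint64_t normalized_load(const struct cpu_stat *c)
{
	assert(c->cpu_power > 0);
	return c->cfs_load * POWER_SCALE / c->cpu_power;
}

/* Return the index of the CPU with the highest normalized load. */
static size_t busiest_cpu(const struct cpu_stat *cpus, size_t n)
{
	size_t i, busiest = 0;

	assert(n > 0);
	for (i = 1; i < n; i++)
		if (normalized_load(&cpus[i]) > normalized_load(&cpus[busiest]))
			busiest = i;
	return busiest;
}
```

In this sketch a CPU with a single nice-0 task (weight 1024) and a
cpu_power of 1 gets a normalized load of 1024 * 1024, far above a
full-power CPU carrying two such tasks, so it is the one picked as the
source for migration.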


