On Fri, 2005-01-14 at 13:08 +1100, Con Kolivas wrote:
utz lehmann wrote:
Just an idea. What about throttling runaway RT tasks?
If the system spends more than 98% of its time in RT tasks for 5s, consider
this a _fatal error_. Print an error message and throttle RT tasks by inserting
ticks where only SCHED_OTHER tasks are allowed. For a limit of 98% this
means one SCHED_OTHER-only tick every 50 ticks.
The limit and timeout should be configurable and of course it can be
disabled.
I know this goes against the rule that RT tasks preempt all SCHED_OTHER
tasks, but this is only for a fatal system state, to be able to recover
sanely. A locked-up machine is the worse alternative.
There is a patch currently in -mm that adds a sysrq key combination to convert all real-time tasks to SCHED_NORMAL, so you can rescue the machine in a lockup situation if you wish. We do want to preserve RT scheduling behaviour at all times, without caveats, for privileged users.
The sysrq is already in 2.6.10. I had to use it a few times over the
last few days. But it does not help if you have no access to the console.
The RT throttling idea is not to change the behavior in normal
conditions. It's only for a fatal system state. If you have a runaway RT
task you can't guarantee the system is working properly anyway. It's
blocking vital kernel threads, filesystems, swap, keyboard, ...
It's a bit like out of memory. You can do nothing and panic, or try
something bad (killing processes), which is hopefully better than the
former.
btw: Are RT tasks excluded by the oom killer?