Re: [patch, 2.6.11-rc2] sched: RLIMIT_RT_CPU_RATIO feature

From: Peter Williams
Date: Tue Jan 25 2005 - 17:33:26 EST


Peter Williams wrote:
Ingo Molnar wrote:

pretty much the only criticism of the RT-CPU patch was that the global
sysctl is too rigid and that it doesn't allow privileged tasks to ignore
the limit. I've uploaded a new RT-CPU-limit patch that solves this
problem:

http://redhat.com/~mingo/rt-limit-patches/

i've removed the global sysctl and implemented a new rlimit,
RT_CPU_RATIO: the maximum amount of CPU time RT tasks may use, in
percent. For testing purposes it defaults to 80%.

the RT-limit being an rlimit makes it much more configurable: root tasks
can have an unlimited CPU time limit, while users could have a more
conservative setting of say 30%. This also makes it per-process and
runtime configurable. The scheduler will instantly act upon any
new RT_CPU_RATIO rlimit.

(this approach is fundamentally different from the previous patch that
made the "maximum RT-priority available to an unprivileged task" value
an rlimit - with priorities being an rlimit we still haven't made RT
priorities safe against deadlocks.)

multiple tasks can have different rlimits as well, and the scheduler
interprets it the following way: it maintains a per-CPU "RT CPU use"
load-average value and compares it against the per-task rlimit. If e.g.
the task says "i'm in the 60% range" and the current average is 70%,
then the scheduler delays this RT task - if the next task has an 80%
rlimit then it will be allowed to run. This logic is straightforward and
can be used as a further control mechanism against runaway highprio RT
tasks.
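
To make the comparison above concrete, here is a minimal sketch in C of the decision as that paragraph describes it. It is not code from the patch: the function name rt_task_may_run, the use of whole percentages, and the strict "less than" comparison are all assumptions, and the zero-limit case is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the per-task check (not taken from the patch).
 * cpu_rt_use_pct: the per-CPU "RT CPU use" average, in percent.
 * task_limit_pct: the task's RT_CPU_RATIO rlimit, in percent (nonzero).
 * Returns true if the RT task would be allowed to run now.
 */
static bool rt_task_may_run(unsigned int cpu_rt_use_pct,
                            unsigned int task_limit_pct)
{
    /* This sketch only covers nonzero limits expressed as 1..100. */
    assert(cpu_rt_use_pct <= 100);
    assert(task_limit_pct >= 1 && task_limit_pct <= 100);

    /* Assumption: a task runs only while the average is below its limit. */
    return cpu_rt_use_pct < task_limit_pct;
}
```

With the numbers from the example, rt_task_may_run(70, 60) is false (the 60% task is delayed) and rt_task_may_run(70, 80) is true (the 80% task runs).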

other properties of the RT_CPU_RATIO rlimit:

- if there's idle time in the system then RT tasks will be
allowed to use more than the limit.

- if an RT task goes above the limit all the time then there
is no guarantee that exactly the limit will be allowed for
it. (i.e. you should set the limit to somewhat above the real
needs of the RT task in question.)

- a zero RLIMIT_RT_CPU_RATIO value means unlimited CPU time to that
RT task. If the task is not an RT task then it may not change to RT
priority. (i.e. a zero value makes it fully compatible with previous
RT scheduling semantics.)

- a nonzero RLIMIT_RT_CPU_RATIO value also has the effect of allowing
the use of RT priorities to nonprivileged users.

- on SMP the limit is measured and enforced per-CPU.

- runtime overhead is minimal, especially if the limit is set to 0.

- the CPU-use measurement code has a 'memory' of roughly 300 msecs.
I.e. if an RT task runs 100 msecs nonstop then it will increase
its CPU use by about 30%. This should be fast enough for users for
the limit to be human-imperceptible, but slow enough to allow
occasional longer timeslices to RT tasks.
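
The "memory of roughly 300 msecs" can be illustrated with a decaying average. The sketch below assumes an exponential moving average updated once per 1 ms tick with a 300 ms time constant. The message does not say how the patch actually computes the value, so the formula and the names RT_USE_MEMORY_MS, rt_use_tick and rt_use_after_burst are hypothetical.

```c
#include <assert.h>

/* Assumed time constant, taken from the "roughly 300 msecs" figure. */
#define RT_USE_MEMORY_MS 300.0

/*
 * Advance the average by one 1 ms tick.
 * rt_running is nonzero if an RT task occupied the CPU during the tick.
 */
static double rt_use_tick(double avg_pct, int rt_running)
{
    double sample = rt_running ? 100.0 : 0.0;

    /* Move the average 1/300th of the way toward the new sample. */
    return avg_pct + (sample - avg_pct) / RT_USE_MEMORY_MS;
}

/* Average after an RT task runs nonstop for burst_ms milliseconds. */
static double rt_use_after_burst(double start_pct, int burst_ms)
{
    assert(burst_ms >= 0);

    double avg = start_pct;
    for (int i = 0; i < burst_ms; i++)
        avg = rt_use_tick(avg, 1);
    return avg;
}
```

Under these assumptions, rt_use_after_burst(0.0, 100) comes out a little over 28%, which is in line with the "about 30%" figure quoted above.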


As I understand this (and I may be wrong), the intention is that if a task has its RT_CPU_RATIO rlimit set to a value greater than zero then setting its scheduling policy to SCHED_RR or SCHED_FIFO is allowed. This causes me to ask the following questions:

1. Why is current->signal->rlim[RLIMIT_RT_CPU_RATIO].rlim_cur being used in setscheduler() instead of p->signal->rlim[RLIMIT_RT_CPU_RATIO].rlim_cur?

2. What stops a task that had a nonzero RT_CPU_RATIO rlimit and changed its policy to SCHED_RR or SCHED_FIFO from then setting RT_CPU_RATIO rlimit back to zero and escaping the controls? As far as I can see (and, once again, I may be wrong) the mechanism for setting rlimits only requires CAP_SYS_RESOURCE privileges in order to increase the value.
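
To spell out why question 1 matters: setscheduler() can be applied to a task other than the caller, so checking current's rlimit and checking p's rlimit can give different answers. The following is only a toy model in C, not the patch's code. struct task_model and both function names are hypothetical, and the "privileged or nonzero limit" rule is my reading of the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; this is not the kernel's task_struct. */
struct task_model {
    unsigned long rt_cpu_ratio_cur; /* models rlim[RLIMIT_RT_CPU_RATIO].rlim_cur */
    bool privileged;                /* models "may use RT policies anyway" */
};

/* Variant A: decide using the caller's (current's) rlimit. */
static bool may_set_rt_using_caller(const struct task_model *caller,
                                    const struct task_model *target)
{
    assert(caller != NULL && target != NULL);
    (void)target; /* the target's limit is not consulted in this variant */
    return caller->privileged || caller->rt_cpu_ratio_cur != 0;
}

/* Variant B: decide using the target task's (p's) rlimit. */
static bool may_set_rt_using_target(const struct task_model *caller,
                                    const struct task_model *target)
{
    assert(caller != NULL && target != NULL);
    return caller->privileged || target->rt_cpu_ratio_cur != 0;
}
```

If an unprivileged caller with a 30% limit targets a task whose limit is zero, variant A allows the policy change and variant B refuses it. That difference is what question 1 is asking about.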

Oops, after rereading the patch, a task that set its RT_CPU_RATIO rlimit to zero wouldn't be escaping the mechanism at all. It would be suffering maximum throttling. Sorry for the silly question.
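
For reference, the rlimit behaviour that question 2 relied on (a process may lower its own soft limit without any special privilege) can be checked from userspace with the standard getrlimit()/setrlimit() calls. RLIMIT_RT_CPU_RATIO exists only in the patch under discussion, so this sketch takes an existing resource such as RLIMIT_NOFILE as a stand-in. The helper name lower_soft_limit is made up for illustration.

```c
#include <assert.h>
#include <sys/resource.h>

/*
 * Lower the soft limit of `resource` by one where that is possible,
 * leaving the hard limit untouched. An unlimited or very small soft
 * limit is written back unchanged.
 * Returns 0 on success and -1 on failure (errno is set by the failing call).
 */
static int lower_soft_limit(int resource)
{
    struct rlimit rl;

    if (getrlimit(resource, &rl) != 0)
        return -1;

    /* The kernel keeps the soft limit at or below the hard limit. */
    assert(rl.rlim_cur <= rl.rlim_max);

    if (rl.rlim_cur != RLIM_INFINITY && rl.rlim_cur > 1)
        rl.rlim_cur -= 1;

    /* Lowering (or keeping) the soft limit needs no capability. */
    return setrlimit(resource, &rl);
}
```

Calling lower_soft_limit(RLIMIT_NOFILE) as an ordinary user should return 0. Raising the hard limit is the operation that needs CAP_SYS_RESOURCE.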

Peter
--
Peter Williams pwil3058@xxxxxxxxxxxxxx

"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce