Re: [PATCH] scheduling priorities with rlimit
From: utz lehmann
Date: Mon Jan 10 2005 - 13:09:58 EST
On Sun, 2005-01-09 at 12:34 -0800, Chris Wright wrote:
> * Arjan van de Ven (arjan@xxxxxxxxxxxxx) wrote:
> > I would much rather have the rlimit match the exact nice values we communicate
> > to userspace elsewhere, both to be consistent and to not expose
> > scheduler internals to userspace.
>
> The problem is the numbers are inconsistent between user interfaces already.
> RT priorities are [0, 99], nice values are [-20, 19]. Perhaps it'd be
> simpler to break it down to just three values for the rlimit.
>
> 0: Same as now, raise nice value only.
> 1: Can lower nice value.
> 2: Can set RT policy (this includes any priority [1, 99], or optionally
> max out at something lower than 99, reserving full CAP_SYS_NICE to 99).
>
> Each level inherits the permissions of the lower level, and none of them
> allow the CAP_SYS_NICE ability to affect processes other than your own.
I don't like this. I don't want to give users the ability to renice their
jobs to -20. I need numeric limits.
But I think presenting user-friendly values is mainly a userspace
problem. pam_limits and ulimit already convert rlimit values.
What about this: separate the rlimit into RLIMIT_NICE and RLIMIT_RT.
Putting both into one value is not a good idea; it is confusing and error
prone. Setting RLIMIT_NICE to unlimited by mistake is not as risky as
doing the same with the old RLIMIT_PRIO.
RLIMIT_RT uses the same values as the RT priorities, 0-99.
That is not possible for RLIMIT_NICE because of the negative nice levels,
so it uses 0-39 for the nice levels 19 .. -20. This has the advantage of
having the same meaning as the other rlimits: a greater value means more
resources.
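As a rough userspace sketch of the mapping and the two limit checks
described above (not the kernel code itself): the helper names
nice_allowed() and rt_prio_allowed() are made up for illustration, and the
CAP_SYS_NICE override and the "only when lowering nice" condition from the
patch are left out to keep it short.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed mapping from the proposal: nice 19 .. -20 becomes 0 .. 39. */
static unsigned long nice_to_rlimit_nice(int nice)
{
    assert(nice >= -20 && nice <= 19);
    return (unsigned long)(19 - nice);
}

/*
 * Hypothetical helper: may a task without CAP_SYS_NICE move to `nice`
 * given its current RLIMIT_NICE soft limit?
 */
static bool nice_allowed(int nice, unsigned long rlim_nice)
{
    return nice_to_rlimit_nice(nice) <= rlim_nice;
}

/* Hypothetical helper: the same check for a realtime priority (0 .. 99). */
static bool rt_prio_allowed(int prio, unsigned long rlim_rt)
{
    assert(prio >= 0 && prio <= 99);
    return (unsigned long)prio <= rlim_rt;
}
```

With a limit of 29, nice -10 would be allowed and nice -11 refused. With
the default limit of 0, no realtime priority above 0 is allowed.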
With a patched PAM you can simply do this in /etc/security/limits.conf:
@student hard nice 5
@stuff hard nice 0
@stuff soft nice 5
@admin hard nice -10
@admin soft nice -10
@admin hard realtime 10
@admin soft realtime 10
The nice values are converted by pam_limits to 0-39.
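The conversion on the PAM side could look roughly like the sketch below.
This is not the actual pam_limits code; parse_nice_limit() is a
hypothetical helper that only shows how a nice level written in
limits.conf might be validated and turned into the 0-39 value.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not taken from pam_limits: parse a nice level as
 * written in limits.conf and convert it to the proposed 0 .. 39 scale.
 * Returns 0 on success, -1 if the text is not a number in -20 .. 19.
 */
static int parse_nice_limit(const char *text, unsigned long *out)
{
    char *end;
    long nice;

    assert(text != NULL && out != NULL);

    errno = 0;
    nice = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0')
        return -1;              /* empty, non-numeric or trailing junk */
    if (nice < -20 || nice > 19)
        return -1;              /* outside the valid nice range */

    *out = (unsigned long)(19 - nice);
    return 0;
}
```

Under this mapping "@student hard nice 5" would become a limit of 14 and
"@admin hard nice -10" a limit of 29.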
diff -Nrup linux-2.6.10/include/linux/sched.h linux-2.6.10-prio4/include/linux/sched.h
--- linux-2.6.10/include/linux/sched.h 2004-12-24 22:33:59.000000000 +0100
+++ linux-2.6.10-prio4/include/linux/sched.h 2005-01-10 17:28:51.699861886 +0100
@@ -738,6 +738,7 @@ extern void sched_idle_next(void);
extern void set_user_nice(task_t *p, long nice);
extern int task_prio(const task_t *p);
extern int task_nice(const task_t *p);
+extern unsigned long nice_to_rlimit_nice(const int nice);
extern int task_curr(const task_t *p);
extern int idle_cpu(int cpu);
diff -Nrup linux-2.6.10/kernel/sched.c linux-2.6.10-prio4/kernel/sched.c
--- linux-2.6.10/kernel/sched.c 2004-12-24 22:35:24.000000000 +0100
+++ linux-2.6.10-prio4/kernel/sched.c 2005-01-10 17:25:28.079188450 +0100
@@ -73,6 +73,12 @@
#define MAX_USER_PRIO (USER_PRIO(MAX_PRIO))
/*
+ * convert nice to RLIMIT_NICE values ([ 19 ... -20 ] to [ 0 ... 39 ])
+ */
+
+#define NICE_TO_RLIMIT_NICE(nice) (19 - (nice))
+
+/*
* Some helpers for converting nanosecond timing to jiffy resolution
*/
#define NS_TO_JIFFIES(TIME) ((TIME) / (1000000000 / HZ))
@@ -3008,12 +3014,8 @@ asmlinkage long sys_nice(int increment)
* We don't have to worry. Conceptually one call occurs first
* and we have a single winner.
*/
- if (increment < 0) {
- if (!capable(CAP_SYS_NICE))
- return -EPERM;
- if (increment < -40)
- increment = -40;
- }
+ if (increment < -40)
+ increment = -40;
if (increment > 40)
increment = 40;
@@ -3023,6 +3025,12 @@ asmlinkage long sys_nice(int increment)
if (nice > 19)
nice = 19;
+ if (increment < 0 &&
+ NICE_TO_RLIMIT_NICE(nice) >
+ current->signal->rlim[RLIMIT_NICE].rlim_cur &&
+ !capable(CAP_SYS_NICE))
+ return -EPERM;
+
retval = security_task_setnice(current, nice);
if (retval)
return retval;
@@ -3056,6 +3064,15 @@ int task_nice(const task_t *p)
}
/**
+ * nice_to_rlimit_nice - return rlimit_nice priority of given nice value
+ * @nice: nice value
+ */
+unsigned long nice_to_rlimit_nice(const int nice)
+{
+ return NICE_TO_RLIMIT_NICE(nice);
+}
+
+/**
* idle_cpu - is a given cpu idle currently?
* @cpu: the processor in question.
*/
@@ -3139,6 +3156,7 @@ recheck:
retval = -EPERM;
if ((policy == SCHED_FIFO || policy == SCHED_RR) &&
+ lp.sched_priority > p->signal->rlim[RLIMIT_RT].rlim_cur &&
!capable(CAP_SYS_NICE))
goto out_unlock;
if ((current->euid != p->euid) && (current->euid != p->uid) &&
diff -Nrup linux-2.6.10/kernel/sys.c linux-2.6.10-prio4/kernel/sys.c
--- linux-2.6.10/kernel/sys.c 2004-12-24 22:33:59.000000000 +0100
+++ linux-2.6.10-prio4/kernel/sys.c 2005-01-10 17:29:50.378989385 +0100
@@ -224,7 +224,10 @@ static int set_one_prio(struct task_stru
error = -EPERM;
goto out;
}
- if (niceval < task_nice(p) && !capable(CAP_SYS_NICE)) {
+ if (niceval < task_nice(p) &&
+ nice_to_rlimit_nice(niceval) >
+ p->signal->rlim[RLIMIT_NICE].rlim_cur &&
+ !capable(CAP_SYS_NICE)) {
error = -EACCES;
goto out;
}
diff -Nrup linux-2.6.10/include/asm-i386/resource.h linux-2.6.10-prio4/include/asm-i386/resource.h
--- linux-2.6.10/include/asm-i386/resource.h 2004-12-24 22:35:50.000000000 +0100
+++ linux-2.6.10-prio4/include/asm-i386/resource.h 2005-01-10 16:55:43.480164770 +0100
@@ -18,8 +18,11 @@
#define RLIMIT_LOCKS 10 /* maximum file locks held */
#define RLIMIT_SIGPENDING 11 /* max number of pending signals */
#define RLIMIT_MSGQUEUE 12 /* maximum bytes in POSIX mqueues */
+#define RLIMIT_NICE 13 /* max nice prio allowed to raise to
+ 0-39 for nice level 19 .. -20 */
+#define RLIMIT_RT 14 /* maximum realtime priority */
-#define RLIM_NLIMITS 13
+#define RLIM_NLIMITS 15
/*
@@ -45,6 +48,8 @@
{ RLIM_INFINITY, RLIM_INFINITY }, \
{ MAX_SIGPENDING, MAX_SIGPENDING }, \
{ MQ_BYTES_MAX, MQ_BYTES_MAX }, \
+ { 0, 0 }, \
+ { 0, 0 }, \
}
#endif /* __KERNEL__ */