Re: [PATCH] pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode

From: Eric W. Biederman
Date: Fri Sep 25 2015 - 01:42:55 EST


bsegall@xxxxxxxxxx writes:

> ebiederm@xxxxxxxxxxxx (Eric W. Biederman) writes:
>
>> Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> writes:
>>
>>> On Wed, 16 Sep 2015 12:58:04 -0700 bsegall@xxxxxxxxxx wrote:
>>>
>>>> setpriority(PRIO_USER, 0, x) will change the priority of tasks outside
>>>> of the current pid namespace. This is in contrast to both the other
>>>> modes of setpriority and the example of kill(-1). Fix this. getpriority
>>>> and ioprio have the same failure mode, fix them too.
>>>
>>> (cc Eric)
>> (cc Containers)
>>
>> Interesting. Strictly speaking the current behavior is not wrong.
>> Searching for all threads with a given uid has nothing to do with pids
>> so the pid namespace not limiting them is natural.
>>
>> In practice I don't think anyone cares either way (except people with
>> one color or another of security hat on) so this might be a change we
>> can actually make.
>>
>> In general it is probably better not to share uids and gids between
>> containers.
>>
>> Ben do you have a use case where this actually matters? Or was this a
>> case of "That looks wrong..."?
>>
>> Eric
>
> I believe we generally want this for isolation of a process, without
> requiring root initially (and a non-trivial uid_map, not to mention
> creating the extra users, requires root). There are probably other holes
> in using namespaces like this, but are they intended?

After some more thinking about it this patch sounds justifiable.

My goal with namespaces is not to build perfect isolation mechanisms
as that can get into ill defined territory, but to build well defined
mechanisms. And to handle the corner cases so you can use only
a single namespace with well defined results.

In this case you have found the two interfaces I am aware of that
identify processes by uid instead of by pid. Which quite frankly is
weird. Unfortunately the weird unexpected cases are hard to handle
in the usual way.

I was hoping for a little more information. Changes like this one we
have to be careful of because someone might be depending on the current
behavior. I don't think they are and I do think this make sense as part
of the pid namespace.

Acked-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>

> (Cc the relevant team member at google)
>
>>
>>>> Signed-off-by: Ben Segall <bsegall@xxxxxxxxxx>
>>>> Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
>>>> Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
>>>> ---
>>>> block/ioprio.c | 6 ++++--
>>>> kernel/sys.c | 4 ++--
>>>> 2 files changed, 6 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/block/ioprio.c b/block/ioprio.c
>>>> index 31666c9..cc7800e 100644
>>>> --- a/block/ioprio.c
>>>> +++ b/block/ioprio.c
>>>> @@ -123,7 +123,8 @@ SYSCALL_DEFINE3(ioprio_set, int, which, int, who, int, ioprio)
>>>> break;
>>>>
>>>> do_each_thread(g, p) {
>>>> - if (!uid_eq(task_uid(p), uid))
>>>> + if (!uid_eq(task_uid(p), uid) ||
>>>> + !task_pid_vnr(p))
>>>> continue;
>>>> ret = set_task_ioprio(p, ioprio);
>>>> if (ret)
>>>> @@ -220,7 +221,8 @@ SYSCALL_DEFINE2(ioprio_get, int, which, int, who)
>>>> break;
>>>>
>>>> do_each_thread(g, p) {
>>>> - if (!uid_eq(task_uid(p), user->uid))
>>>> + if (!uid_eq(task_uid(p), user->uid) ||
>>>> + !task_pid_vnr(p))
>>>> continue;
>>>> tmpio = get_task_ioprio(p);
>>>> if (tmpio < 0)
>>>> diff --git a/kernel/sys.c b/kernel/sys.c
>>>> index fa2f2f6..6af9212 100644
>>>> --- a/kernel/sys.c
>>>> +++ b/kernel/sys.c
>>>> @@ -222,7 +222,7 @@ SYSCALL_DEFINE3(setpriority, int, which, int, who, int, niceval)
>>>> goto out_unlock; /* No processes for this user */
>>>> }
>>>> do_each_thread(g, p) {
>>>> - if (uid_eq(task_uid(p), uid))
>>>> + if (uid_eq(task_uid(p), uid) && task_pid_vnr(p))
>>>> error = set_one_prio(p, niceval, error);
>>>> } while_each_thread(g, p);
>>>> if (!uid_eq(uid, cred->uid))
>>>> @@ -290,7 +290,7 @@ SYSCALL_DEFINE2(getpriority, int, which, int, who)
>>>> goto out_unlock; /* No processes for this user */
>>>> }
>>>> do_each_thread(g, p) {
>>>> - if (uid_eq(task_uid(p), uid)) {
>>>> + if (uid_eq(task_uid(p), uid) && task_pid_vnr(p)) {
>>>> niceval = nice_to_rlimit(task_nice(p));
>>>> if (niceval > retval)
>>>> retval = niceval;
>>>> --
>>>> 2.6.0.rc0.131.gf624c3d
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>> Please read the FAQ at http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/