Re: [PATCH -mm 0/7] execns syscall and user namespace

From: Eric W. Biederman
Date: Tue Jul 11 2006 - 23:24:37 EST


Cedric Le Goater <clg@xxxxxxxxxx> writes:

> Hello eric,
>
> Eric W. Biederman wrote:
>
>>> The following patchset adds the user namespace and a new syscall execns.
>>>
>>> The user namespace will allow a process to unshare its user_struct table,
>>> resetting at the same time its own user_struct and all the associated
>>> accounting.
>>>
>>> The purpose of execns is to make sure that a process unsharing a namespace
>>> is free from any reference in the previous namespace. The execve() semantic
>>> seems to be the best candidate as it already flushes the previous process
>>> context.
>>>
>>> Thanks for reviewing, sharing, flaming !
>>
>>
>> I haven't had a chance to do a thorough review yet but why is
>> this needed?
>>
>> What can be left shared by switching to a new namespace and then
>> execing an executable?
>>
>> Is it not possible to ensure what you are trying to ensure with
>> a good user space executable?
>
> unshare() is unsafe for some namespaces because namespaces can reference
> each other. For the ipc namespace, examples are shm ids vs. vma, sem ids vs.
> semundos, msq vs. netlink sockets. For the user namespace, open files. So
> it seems reasonable to provide a way to unshare namespaces from a clean
> process context.

It is perfectly legitimate to have a shared memory region memory mapped
from another namespace. Yes, sem ids versus semundos is an issue, but it
just requires you to unshare one at the same time you unshare the other,
or to simply clone a new namespace. I'm not familiar with the msq vs netlink
socket issue. As for the user namespace vs open files: if we have
any issues with open files in any namespace, that sounds like an implementation
bug to me.

I'm not convinced the problems you are seeing are not implementation bugs.
For some things clone is still more general than unshare, and clone should
be considered the primary user interface, not unshare.
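
A minimal sketch of the clone-first approach, under stated assumptions: the
helper name spawn_in_new_namespaces and the stack size are made up for
illustration, and which CLONE_NEW* flags are accepted depends on the kernel
and on the caller's privileges. The child is created directly inside the
requested namespaces and then execs, so there is no separate unshare() step
in the parent's context.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (256 * 1024)   /* arbitrary size for this sketch */

/* Runs in the child, which is already inside the new namespaces. */
static int child_exec(void *arg)
{
    char *const *argv = arg;

    execvp(argv[0], argv);
    _exit(127);                         /* reached only if execvp() failed */
}

/*
 * Hypothetical helper: clone() a child with ns_flags (for example
 * CLONE_NEWNS; 0 requests no new namespace), exec argv in it, and
 * return the child's exit status, or -1 on any error.
 */
int spawn_in_new_namespaces(int ns_flags, char *const argv[])
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows down on most architectures, so pass its top. */
    pid_t pid = clone(child_exec, stack + CHILD_STACK_SIZE,
                      ns_flags | SIGCHLD, (void *)argv);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status;
    pid_t waited = waitpid(pid, &status, 0);
    free(stack);
    if (waited == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling it with ns_flags set to 0 behaves like an ordinary fork plus exec;
passing a namespace flag normally requires privilege, and the call returns -1
if clone() refuses it.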

> Now, if you try to do that from user space, you will call unshare() then
> execve(), which leaves plenty of room and time for nasty things to happen
> in between the 2 calls.

I will look more closely, but I think there is an important point being missed
somewhere. Pieces of the kernel interact in all sorts of weird and unexpected
ways. If we rely on ourselves always being in the right magic namespace for
things to work correctly, we are setting ourselves up for trouble. If we know
a namespace implementation will work even when a process has access to entities
in multiple instances of that namespace, we are in much better shape.

Eric
