Re: [PATCH -mm 0/7] execns syscall and user namespace

From: Cedric Le Goater
Date: Wed Jul 12 2006 - 09:06:00 EST


Hello !

Hopefully, we will soon see each other at OLS. We need some synchronous
interaction !

Eric W. Biederman wrote:

>>> Is it not possible to ensure what you are trying to ensure with
>>> a good user space executable?
>> unshare() is unsafe for some namespaces because namespaces can reference
>> each other. For the ipc namespace, example are shm ids vs. vma, sem ids vs.
>> semundos, msq vs. netlink sockets. for the user namespace, open files. So
>> it seems reasonable to provide a way to unshare namespaces from a clean
>> process context.
>
> It is perfectly legitimate to have a shared memory region memory mapped
> from another namespace.

then after unshare, a process can be in ipc namespace B with a shared
memory segment from ipc namespace A without any id for this segment. this
is not very consistent. the same process will also be able to modify the
ipc namespace B without being in this namespace. ugly. It looks like an
issue that should be solved.

I think namespace should enforce strict isolation. nop ?

> Yes sem ids versus semunds is an issue but it just requires you to unshare
> one at the same time you unshare the other, or to simply clone a new
> namespace.

hmm, semids the from ipc namespace are stored in task->sysv_sem. i would
forbid the unshare/clone in that case or flush the semundos like in
exit_sem(). but it's easier not to have any, like in a clean process image.

> I'm not familiar with the msq vs netlink socket issue.

mq_notify can use a netlink socket to send an event back user space.

> As for the user namespace vs open files. If we have any issues with open
> files in any namespace that sounds like an implementation bug to me.

user_struct does accounting on process, open files, locked memory, signals,
etc. if you unshare such an object, you will need to unshare all others
namespaces to be consistent. again having a clean process image is easier ...

> I'm not convinced the problems you are seeing are not implementation bugs.
> For some things clone is still more general then unshare, and clone should
> be considered the primary user interface, not unshare.

agree on that, i might be focusing a bit too much on the unshare syscall.
we should work on clone to make sure it has the required restrictions. The
system is really interlinked and not all namespaces can be unshared standalone.

>> Now, if you try to do that from user space, you will call unshare() then
>> execve(), which leaves plenty of room and time for nasty things to happen
>> in between the 2 calls.
>
> I will look more closely but I think there is an important point being missed
> somewhere. Pieces of the kernel interact in all sorts of weird and unexpected
> ways. If we rely on ourselves always being in the right magic namespace for
> things to work correctly we are setting ourselves up for trouble. If we know
> a namespace implementation will work even when a process has access to entities
> in multiple instances of that namespace we are in much better shape.

having a clean process image is IMHO required for some namespaces :

http://marc.theaimsgroup.com/?l=linux-kernel&m=113881171017330&w=2

thanks,

C.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/