Re: Controlling devices and device namespaces

From: Eric W. Biederman
Date: Sun Sep 16 2012 - 10:24:01 EST


Serge Hallyn <serge@xxxxxxxxxx> writes:

> On 09/16/2012 07:17 AM, Eric W. Biederman wrote:
>> ebiederm@xxxxxxxxxxxx (Eric W. Biederman) writes:
>>
>>> Alan Cox <alan@xxxxxxxxxxxxxxxxxxx> writes:
>>>
>>>>> One piece of the puzzle is that we should be able to allow unprivileged
>>>>> device node creation and access for any device on any filesystem
>>>>> for which unprivileged access is safe.
>>>>
>>>> Which devices are "safe" is policy for all interesting and useful cases,
>>>> as are file permissions, security tags, chroot considerations and the
>>>> like.
>>>>
>>>> It's a complete non starter.
>>
>> Come to think of it mknod is completely unnecessary.
>>
>> Without mknod. Without being able to mount filesystems containing
>> device nodes.
>
> Hm? That sounds like it will really upset init/udev/upgrades in the
> container.

udev does not create device nodes. For an older udev the worst
I can see happening is mknod failing with EEXIST because
the device node already exists.
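
The EEXIST case can be sketched from user space. This is only a minimal
illustration, assuming a hypothetical helper named ensure_node (it is not
taken from udev) that treats an already-present node as success. Creating
real character or block nodes still needs CAP_MKNOD; a FIFO is enough to
exercise the same code path without privilege on Linux.

```python
import os
import stat


def ensure_node(path, mode, device=0):
    """Create a filesystem node, tolerating one that already exists.

    Returns True if the node was created, False if mknod failed with
    EEXIST. Any other error is re-raised to the caller.
    ensure_node is a hypothetical helper used for illustration only.
    """
    try:
        os.mknod(path, mode, device)
        return True
    except FileExistsError:
        # EEXIST: whoever set up the container already provided the node.
        return False
```

A caller could try ensure_node(path, stat.S_IFIFO | 0o600) twice on the
same path: the second call reports that the node was already there
instead of failing.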

We should be able to make it look to init like a ramdisk mounted the
filesystems.

Why should upgrades care? Package installation shouldn't be calling
mknod.

At least with a recent distro I can't imagine this being an
issue. I expect we could have a kernel build option that removes the
mknod system call, and a modern distro wouldn't notice.

> Are you saying all filesystems containing device nodes will need to be
> mounted in advance by the process setting up the container?

As a general rule.

I think in practice there is wiggle room for special cases
like mounting a fresh devpts. devpts, at least in its "always create a
new instance on mount" mode, seems safe, as it cannot give you access
to any existing devices.
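
As a rough sketch of the devpts case, the glue code could mount a private
instance roughly as follows. The helper names devpts_options and
mount_fresh_devpts are hypothetical, the permission defaults are
assumptions, and the mount itself needs CAP_SYS_ADMIN. The "newinstance",
"ptmxmode" and "mode" options are the ones devpts documents.

```python
import ctypes
import ctypes.util
import os


def devpts_options(ptmxmode=0o666, mode=0o620):
    """Build the mount option string for a private devpts instance."""
    return "newinstance,ptmxmode=%04o,mode=%04o" % (ptmxmode, mode)


def mount_fresh_devpts(target):
    """Mount a new devpts instance on target (hypothetical helper).

    Calls mount(2) through libc; raises OSError on failure.
    """
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    rc = libc.mount(b"devpts", os.fsencode(target), b"devpts",
                    ctypes.c_ulong(0), devpts_options().encode())
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), target)
```

The ptys allocated from such an instance are only visible through that
mount, which is what makes this case safe.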

You can also do a lot of what would normally be done with mknod
with bind mounts to the original device's location.
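
A minimal sketch of the bind mount approach, assuming the process setting
up the container runs with CAP_SYS_ADMIN and that each target file already
exists under the container's root. plan_bind_mounts and bind_device are
hypothetical names, and the container path in the usage note is made up.

```python
import ctypes
import ctypes.util
import os

MS_BIND = 4096  # value of MS_BIND in <sys/mount.h> on Linux


def plan_bind_mounts(names, rootfs, host_dev="/dev"):
    """Return (source, target) pairs for the device nodes to expose."""
    return [(os.path.join(host_dev, name), os.path.join(rootfs, "dev", name))
            for name in names]


def bind_device(source, target):
    """Bind mount one host device node onto an existing target file."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    rc = libc.mount(os.fsencode(source), os.fsencode(target), None,
                    ctypes.c_ulong(MS_BIND), None)
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), target)
```

The setup process would call bind_device on each pair returned by
plan_bind_mounts(["null", "zero", "tty"], "/srv/c1"), so no mknod call is
needed inside the container.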

>> The mount namespace is sufficient to prevent all of the
>> cases that the device control group prevents (open and mknod on device
>> nodes).
>>
>> So I honestly think the device control group is superfluous, and it is
>> probably wise to deprecate it and move to a model where it does not
>> exist.
>>
>> Eric
>>
>
> That's what I said a few emails ago :) The device cgroup was meant as
> a short-term workaround for lack of user (and device) namespaces.

I am saying something stronger. The device cgroup doesn't seem to have
a practical function now. For the general case we don't need any kernel
support: all of this should be a matter of some user space glue code,
plus the tiniest bit of sorting out how hotplug events are sent.
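
To illustrate what the hotplug part of that glue might look like, here is
a sketch that parses a kernel uevent payload and decides whether to pass
it on to a container. It assumes the kernel's netlink layout of an
"action@devpath" header followed by NUL-separated KEY=VALUE pairs. The
subsystem allow-list policy and the names parse_uevent and should_forward
are hypothetical, not an existing interface.

```python
def parse_uevent(payload):
    """Split a kernel uevent payload into (header, properties)."""
    fields = [f for f in payload.split(b"\0") if f]
    if not fields:
        raise ValueError("empty uevent payload")
    header = fields[0].decode()
    props = {}
    for field in fields[1:]:
        key, _, value = field.decode().partition("=")
        props[key] = value
    return header, props


def should_forward(props, allowed_subsystems):
    """Hypothetical policy: forward only events for allowed subsystems."""
    return props.get("SUBSYSTEM") in allowed_subsystems
```

A forwarding daemon would read such payloads from a
NETLINK_KOBJECT_UEVENT socket on the host and re-send only the ones the
container is meant to see.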

The only thing I can think we would need a device namespace for is
for migration.

For migration with direct access to real hardware devices we must treat
it as hardware hotunplug. There is nothing else we can do.

If there is any other case where we need to preserve device numbers
etc., we have the example of devpts.

So at this point I really don't think we need a device namespace or a
device control group. (Just emulate devtmpfs, sysfs and uevents).

Eric
