Re: [COMMERCIAL] Re: [PATCH 0/3] kobject: support namespace aware udev

From: Greg KH
Date: Thu Sep 10 2015 - 01:22:09 EST


On Wed, Sep 09, 2015 at 04:55:22PM -0400, Michael J Coss wrote:
> On 9/9/2015 4:28 PM, Greg KH wrote:
> > On Wed, Sep 09, 2015 at 04:16:49PM -0400, Michael J Coss wrote:
> >> On 9/9/2015 4:09 PM, Greg KH wrote:
> >>> On Wed, Sep 09, 2015 at 03:05:29PM -0400, Michael J Coss wrote:
> >>>> On 9/8/2015 11:54 PM, Greg KH wrote:
> >>>>> On Tue, Sep 08, 2015 at 10:10:27PM -0400, Michael J. Coss wrote:
> >>>>>> Currently when a uevent occurs, the event is replicated and sent to every
> >>>>>> listener on the kernel netlink socket, ignoring network namespace boundaries,
> >>>>>> forwarding events to every listener in every network namespace.
> >>>>>>
> >>>>>> With the expanded use of containers, it would be useful to be able to
> >>>>>> regulate this flow of events to specific containers. By restricting
> >>>>>> the events to only the host network namespace, it allows for a userspace
> >>>>>> program to provide a system wide policy on which events are routed where.
> >>>>> Interesting, but why do you need a container to get a uevent at all?
> >>>>> What uevents does a container care about?
> >>>>>
> >>>>> thanks,
> >>>>>
> >>>>> greg k-h
> >>>>>
> >>>> In our use case, we run a full desktop inside the container, including
> >>>> X.
> >>> Ugh, I was worried you were going to say that :(
> >>>
> >>>> We run the Xserver in headless mode, and forward a uevent to the
> >>>> container to allow binding/unbinding of remote keyboard, mice, and
> >>>> displays. So I want the add/del keyboard events, add/del mouse events,
> >>>> and add/del display events. This is just one use case, I could imagine
> >>>> others. The bottom line is that the current behavior is to broadcast to
> >>>> everyone all uevents, and I don't see that as correct as it crosses the
> >>>> network namespace boundaries. It seems to me that you would want to
> >>>> provide controls as to where you want to forward those uevents, and
> >>>> that is not a policy that I believe should be in the kernel but rather
> >>>> in user space.
> >>> devices are not in namespaces, which is why we don't partition them off
> >>> at all. And that's why I really don't want to add this type of
> >>> filtering either. It's up to the "master" container/process/whatever to
> >>> send uevents to child containers if it really wants to. If we were to
> >>> ever have devices bound only to namespaces, then it would make sense to
> >>> only send the uevents for those devices to that namespace.
> >>>
> >>> But as that's never going to happen, I don't want to give people a false
> >>> sense of "separation" here that isn't really there at all.
> >>>
> >>> sorry,
> >>>
> >>> greg k-h
> >>>
> >> Agreed that devices are not in namespaces, but the events are, or at
> >> least could be.
> > No, there's no way to tell which event for which device goes to which
> > namespace, as devices are not in a namespace.
> Why? The host certainly can have a policy for what devices go to which
> container.

But that's a userspace thing, if at all, not a kernel thing.

> And as such knows which events go to which container.

Userspace might know this, sure, so implement a version of udevd that
does this all in userspace.

> The container *is* a set of namespaces and control groups.

But devices are not. They are global for the whole kernel.

> So a user program reads the events on the master, looks in a database
> and forwards it to that container. The uevents represent the device
> add/del so it seems natural that it should be the mechanism by which
> that communication happens. I just want to see it controlled by a
> policy on the host.

Then do so all in userspace, don't try to force namespaces on devices in
the kernel that do not have them at all. You are adding code that is
"pretending" that devices really are in namespaces, which is not true at
all.
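
A minimal sketch of what such a userspace forwarder could look like,
assuming a kernel uevent datagram of the form "action@devpath" followed
by NUL-separated KEY=VALUE pairs. The names parse_uevent, route, listen
and the POLICY table (including the "desktop-1" container) are
hypothetical and only illustrate the idea; how an event is actually
delivered into a container is left out.

```python
import socket

# Netlink protocol number for kobject uevents (from <linux/netlink.h>).
# Defined here by hand in case the socket module does not export it.
NETLINK_KOBJECT_UEVENT = 15
KERNEL_UEVENT_GROUP = 1  # multicast group carrying raw kernel uevents

# Hypothetical policy: (SUBSYSTEM, DEVPATH prefix) -> container names.
POLICY = {
    ("input", "/devices/virtual/input/"): ["desktop-1"],
}


def parse_uevent(payload: bytes) -> dict:
    """Turn b'add@/devpath\\0KEY=VALUE\\0...' into a dict of the KEY=VALUE pairs."""
    fields = [f for f in payload.split(b"\0") if f]
    event = {}
    for field in fields[1:]:  # fields[0] is the "action@devpath" header
        key, sep, value = field.decode("utf-8", "replace").partition("=")
        if sep:
            event[key] = value
    return event


def route(event: dict, policy: dict = POLICY) -> list:
    """Return the containers that the policy says should see this event."""
    subsystem = event.get("SUBSYSTEM", "")
    devpath = event.get("DEVPATH", "")
    targets = []
    for (subsys, prefix), containers in policy.items():
        if subsystem == subsys and devpath.startswith(prefix):
            targets.extend(containers)
    return targets


def listen() -> None:
    """Read kernel uevents on the host and print where each would be sent."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_DGRAM,
                         NETLINK_KOBJECT_UEVENT)
    sock.bind((0, KERNEL_UEVENT_GROUP))  # (pid, groups); pid 0 = kernel assigns
    while True:
        event = parse_uevent(sock.recv(65536))
        for container in route(event):
            # Delivery into the container is deliberately not shown.
            print(container, event.get("ACTION"), event.get("DEVPATH"))
```

Calling listen() on the host would print one line per matching event.
All of the policy lives in the POLICY table in userspace, and nothing
here requires the kernel to know which container a device "belongs" to.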

> >> That master is the host, and to do that I want to
> >> forward events that the host receives to those individual containers.
> >> But since the kernel is broadcasting them, I can't have that policy on
> >> the host, and would have to filter on each container. Or I can do as
> >> you say and have the master forward events. I don't see this as putting
> >> the devices into a namespace, but rather managing devices from the
> >> outside and notifying the container of the event. Just like plugging in
> >> a monitor to the container.
> > But you can't "plug a monitor into a container". Nor can you "add a
> > keyboard to a container". Or a tty device. Or anything else (except
> > for network devices). Don't try to fake things out as that's not what
> > is happening here. The kernel shouldn't be allowing things to be sent
> > only to specific namespaces, as that's a lie, the devices are "global"
> > and not in a namespace at all.
> Again why? Why are network devices *different*?

They just are :)

Really, the layers behind a network device are set up for namespaces,
and multiple processes accessing the same device, and lots of other
things that no other device supports (hint, how do you access a network
device from userspace, and why does that look totally different from how
you access a disk or a tty device?)

If you want to do this type of work for all different device subsystems
in the kernel, great, please do so, creating some way to "share" the
hardware in virtual ways (hint, it's almost impossible to do, again
network devices are special, that's just the way that Unix treats
them...)

> They are a resource that is bound to the container, not to a
> namespace per se, but the container is a construct. A collection of
> namespaces, and cgroups. Again, I don't see why you can't add a
> keyboard to the container.

Because a keyboard is not a device that can work that way. It can't be
"shared". Yes, you can do multi-head systems, running with multiple
keyboards and mice and bind them all to different users, and that works
just fine, maybe you should look into that model for your container
work. But note, you need to do this with each device type, and how
would you handle /dev/mice? :)

Again, network devices are special, don't get hung up on trying to make
all other devices in the system work like a network device.

greg k-h