Re: [GIT PULL] kdbus for 4.1-rc1
From: James Bottomley
Date: Fri Apr 17 2015 - 15:27:48 EST
On Thu, 2015-04-16 at 14:13 +0200, David Herrmann wrote:
> Hi
>
> On Wed, Apr 15, 2015 at 8:12 PM, James Bottomley
> <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
> > For me the biggest issue is the container problem: it's really hard to
> > containerise kdbus because of the stateful nature of the protocol and
> > the fact that it has a well known system bus. Separation into domains
> > works for OS containers, but application containers need more fluidity.
> > It's not unlike the same problem on Windows: Windows application
> > containers are very difficult to do because the global registry means
> > that OLE handlers all have to run inside your container as well
> > (effectively making it an OS container). I'm sure, since we already
> > have a lot of container people going to Plumbers, that we can get them
> > to turn up for the discussion.
>
> kdbus actually works very well in OS containers that mount a new
> kdbusfs inside the container. This new instance of kdbus will be
> entirely separated from any other on the system. We've designed it
> that way especially with OS containers in mind. This is explained in
> kdbus.fs(7). It's very similar to devpts' container support, where you
> mount a new instance of devpts into each container instance you run.
>
> For Docker-style (i.e. app-focused) containers, it's a more complex
> story.
Well, no, Docker-style is just one flavour of application containers.
I'm actually much more interested in something very different:
applications that use container features (like Docker, Rocket and
systemd). Facilitating them is an interesting exercise.
Also, applications inside containers were around long before Docker, in
the PaaS space at least.
> kdbus will not solve this for you, but at least one thing
> deserves being mentioned: for this kind of sandboxing kdbus certainly
> makes things *easier*, compared to dbus1.
So slightly better than really difficult isn't terribly useful.
> Why? because the kernel
> gains a notion of individual messages and method call transactions,
> something that is completely unavailable if you stick to dbus1 where
> all the kernel sees is a raw stream of AF_UNIX/SOCK_STREAM bytes. In
> fact, kdbus as it is right now even contains minimal but explicit
> support for sandboxing, by allowing creation of multiple bus endpoints
> to the same bus that carry additional, more restrictive policy.
Sandboxing is a minor (albeit very useful) use of containers.
You nicely ignored the actual problem I listed, which is the system bus,
and the specific example of what happens. Let me try again. Just to
provide the context, Virtuozzo has long supported containers on both
Windows and Linux. We have been doing application containers on Linux
for a long time, but we've been having issues doing the same thing on
Windows (in spite of the fact that our Windows container system is very
similar to the Linux one).
In Windows, OLE + the global registry is dbus on steroids. The idea
seems simple and elegant: remote system elements are provided to you via
an IPC interaction instead of being directly dynamically linked into
your virtual address space. It allows Windows applications to deal with
arbitrary objects of unknown type because the type handlers are provided
by the system via OLE. It's really elegant in a single-user desktop
environment because the system's job is to serve and protect only that
user. In a multi-user environment (as MS found with VDI) it's a lot
more problematic because now either the type handlers are global
(meaning local users can't modify them, unlike in the single-user case)
or they're all local, meaning we're back to OS containers again. If you
think abstractly of containers as a way to bring multi-user features to
single-user environments (essentially that's what OS virtualization is)
you can see immediately why we're having such issues with non-OS
containers on Windows: the single bus/global namespace idea
doesn't play well with multi-user.
This is why I think kdbus is a bad idea: it solidifies as a Linux kernel
API something which runs counter to granular OS virtualization (and
something which caused Windows to fall behind Linux in the container
space). Splitting out the acceleration problem and leaving the rest to
user space currently looks fine because the ideas Al and Andy are
kicking around don't cause problems with OS virtualization.
James