Re: [GIT PULL] kdbus for 4.1-rc1

From: Havoc Pennington
Date: Wed Apr 15 2015 - 12:27:57 EST


Hi,

I'm temporarily joining the list in case anyone has questions about why
dbus was originally the way it is. Questions about its latest usage,
systemd, or the kernel implementation are best answered by others.

I "led" the original design but I was hardly the only person involved.
I was sort of synthesizing previous efforts, lots of ideas from other
people, and mediating the politics of the time.

What I'd like to see in this conversation is: understanding what
exists, and why it exists.

If people understand that, then I think they can make good decisions,
using whatever process or timeline they like. I don't pretend to know
much about kdbus, but I see a lot of confusion here about the use-case
and design of dbus itself.

No one should take the design on faith. To improve and maintain
something it must be understood.

Why should you bother to understand dbus as it exists? It's pretty
successful, and I think for a reason. Hundreds of programs are using
dbus, it's become (over a decade) foundational to the most-used Linux
userspaces, there are many different implementations of it, and it's
been quite a stable design over that time without any major changes. I
don't think that's because it's perfect; I do think it's because some
things are right, in ways that previous designs were not. The Linux
userspace community went through a lot of alternatives before dbus,
and dbus was the one that lasted.

The worst-case scenario in my mind would be for the kernel to merge
something dbus-like, but with ill-informed changes that render it
worse. Then you would have a new ABI that nobody wants to use. We have
a design in the wild that's been very successful. People using it for
its intended use-case seem to like it. Step 1 is to try to understand
why that is.

I will try to give my take on some of the reasons.

I can't emphasize enough that the success of dbus was *because of*
many of the choices that draw "obvious" criticisms. Why? Tradeoffs. Given
infinite time and resources, many of those tradeoffs can be mitigated
or avoided - and I see kdbus as part of an effort to do so.

The first and most important tradeoff: the central daemon (the hub in
the wheel). A central daemon has several disadvantages. The success of
dbus happened because those disadvantages, in this context, are not as
important as the advantages.

The advantages include:

* ability to send a broadcast message to all interested processes
* tracking/discovering well-known and unique names
* crossing security domains (system-daemon-to-per-user-UIs, in
particular) in an orderly fashion
* reducing the number of file descriptors needed for N apps to all
talk to each other
* relatively simple model for application developers to get right

The disadvantages include:

* performance (extra context switches, copies, and validations)
* it's difficult to handle killing/restarting the central daemon;
dbus actually gives clients all the tools to do this, but in practice
if you restart the daemon you are gambling that a hundred clients
connected to it have implemented bug-free restart handling.
* not a distributed cluster (it's a single bottleneck and point of
failure running on a single machine - the daemon is a source of truth,
which is also its virtue of course)

For dbus to be as useful as it has been, these disadvantages, while
not desirable, were acceptable tradeoffs. So it would be a mistake to
solve any of these disadvantages by breaking the advantages.

Message passing or IPC isn't really the most important part of dbus.
Process lifecycle tracking and discovery are more important. However,
by integrating the IPC system with the lifecycle tracking you can
simplify the overall system and avoid race conditions. For example,
you can have processes that auto-launch race-free when you send them a
message, or more generally you can have an ordering between lifecycle
events and other messages. Similarly, if I send out a broadcast
message and then disconnect, other clients will see the broadcast
first and then the disconnect, and won't have to handle the
out-of-order case.
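The ordering argument is easiest to see in a toy model. In the sketch below, a hub pushes ordinary messages and lifecycle events through one FIFO queue, so a client's disconnect can never overtake a broadcast it sent earlier. This is a hypothetical illustration of the idea, not how dbus-daemon is implemented. In real dbus, a disconnect shows up to other clients as a NameOwnerChanged signal. The class and client names here are made up.

```python
from collections import deque


class ToyBus:
    """Hypothetical toy hub: ordinary messages and lifecycle events share
    one FIFO queue, so every client sees them in the same order."""

    def __init__(self):
        self.clients = {}      # client name -> list of events delivered to it
        self.queue = deque()   # events waiting to be dispatched, in arrival order

    def connect(self, name):
        self.clients[name] = []

    def broadcast(self, sender, body):
        self.queue.append(("message", sender, body))

    def disconnect(self, name):
        # The disconnect is queued behind anything this client already sent,
        # so it can never overtake that client's earlier broadcast.
        del self.clients[name]
        self.queue.append(("disconnected", name, None))

    def dispatch(self):
        while self.queue:
            event = self.queue.popleft()
            sender = event[1]
            for name, inbox in self.clients.items():
                if name != sender:  # don't echo a broadcast back to its sender
                    inbox.append(event)


bus = ToyBus()
bus.connect("tray-icon")
bus.connect("network-daemon")
bus.broadcast("network-daemon", "connection lost")
bus.disconnect("network-daemon")
bus.dispatch()
print(bus.clients["tray-icon"])
# [('message', 'network-daemon', 'connection lost'), ('disconnected', 'network-daemon', None)]
```

Because one process serializes both kinds of events, the tray icon needs no code for the case where it hears about the disconnect before the last message.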

dbus has a lot of semantic guarantees, such as message ordering, that
reduce application complexity and therefore reduce both the amount of
code and the number of bugs.

When implementing a Linux workstation userspace, ideally you have lots
of little processes that do one thing each; but the tradeoff is that
multi-process adds complexity. If your model for a multi-process
program is that it has to solve a lot of hard distributed system
problems, then it adds a LOT of complexity. But when everyone's on a
single machine, it is not necessary to solve (all of) those problems,
and in fact trying to solve non-problems creates bugs by adding
tricky, rarely-touched codepaths. It is overengineering to treat "tray
icon talking to NetworkManager" the same way you would treat IPC and
shared state within a distributed cluster.

Multi-process is valuable though; an alternative userspace design
could be like Eclipse or Emacs, i.e. one enormous process with
plugins, which would be a mess.

There was some debate over my X11 analogy. One of the "thought
experiments" while figuring out dbus was "why does CORBA seem to be at
the root of endless bug reports, while X11 isn't?"

Here are some things I think dbus has in common with X11:

* it's a hub-and-spoke design (a central server that all apps connect
to) rather than a design where every process talks directly to every
other process
* dbus names are directly modeled on X selections (see ICCCM)
* designed to allow race-free asynchronous usage and minimize the
need for round trips (though apps can certainly design bad APIs, see
http://dbus.freedesktop.org/doc/dbus-api-design.html for advice on
avoiding that)
* binary protocol rather than text
* generally assumes a reliable network - assumes all messages will
arrive, as long as the connection is live
* similar model for discovering and authenticating to the server
* allows clients to track each other's lifecycle
* it is stateful; clients connect, fetch the current state, then
track changes to the state via events.
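The naming and lifecycle points above can be sketched together in a small model. Every connection gets a unique name. A well-known name has one owner at a time, with a queue of waiting candidates, much like an X selection. Every change of ownership is recorded as an (name, old_owner, new_owner) event that other clients could watch. This is a hypothetical toy that is only loosely modeled on dbus's default queuing behavior and its NameOwnerChanged signal. It ignores the RequestName flags and everything else in the real protocol. `ToyNameRegistry` and `com.example.Network` are made-up names.

```python
import itertools
from collections import deque


class ToyNameRegistry:
    """Hypothetical sketch: unique names per connection plus queued ownership
    of well-known names, with an owner-changed event on every transfer."""

    def __init__(self):
        self._serial = itertools.count(1)
        self.queues = {}   # well-known name -> deque of unique names; head owns it
        self.events = []   # (name, old_owner, new_owner); "" means no owner

    def connect(self):
        unique = f":1.{next(self._serial)}"
        self.events.append((unique, "", unique))
        return unique

    def request_name(self, unique, name):
        q = self.queues.setdefault(name, deque())
        if unique not in q:
            q.append(unique)
            if len(q) == 1:  # first requester becomes the owner immediately
                self.events.append((name, "", unique))
        return q[0] == unique  # True if the caller is now the owner

    def owner(self, name):
        q = self.queues.get(name)
        return q[0] if q else None

    def disconnect(self, unique):
        for name, q in list(self.queues.items()):
            if unique in q:
                was_owner = q[0] == unique
                q.remove(unique)
                if was_owner:  # ownership passes to the next queued candidate
                    self.events.append((name, unique, q[0] if q else ""))
                if not q:
                    del self.queues[name]
        self.events.append((unique, unique, ""))  # the unique name goes away


reg = ToyNameRegistry()
first = reg.connect()    # ":1.1"
second = reg.connect()   # ":1.2"
reg.request_name(first, "com.example.Network")   # True: owner
reg.request_name(second, "com.example.Network")  # False: queued
reg.disconnect(first)
print(reg.owner("com.example.Network"))  # :1.2
```

Since the bus records the disconnect and the handover as one ordered sequence of events, a watching client sees the old owner leave and the new owner take over without having to poll or guess.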

Some differences from X11 of course:

* X11 is a domain-specific server (about sharing the graphics and
input hardware among multiple clients), while with dbus the
domain-specific API will be in some client and the bus is only an
intermediary.
* X11 therefore has a bunch more server state than dbus; dbus only
has to track clients, not track the state of the window system.
* IPC on X11 is sort of bolted on in an ugly way (client messages)
while dbus cleanly maps to the OO model people are used to in the rest
of their code.

Havoc