Re: [GIT PULL] kdbus for 4.1-rc1

From: Andy Lutomirski
Date: Tue Apr 28 2015 - 16:42:38 EST

On Tue, Apr 28, 2015 at 1:34 PM, David Lang <david@xxxxxxx> wrote:
> On Tue, 28 Apr 2015, Havoc Pennington wrote:
>> On Tue, Apr 28, 2015 at 1:19 PM, David Lang <david@xxxxxxx> wrote:
>>> If the examples that are being used to show the performance advantage of
>>> kdbus vs normal dbus are doing the wrong thing, then we need to get some
>>> other examples available to people who don't live and breathe dbus that
>>> 'do things right' so that the kernel developers can see what you think is the
>>> real problem and how kdbus addresses it.
>>> So far, this 'wrong' example is the only thing that's been posted to show
>>> the performance advantage of kdbus.
>> I'm hopeful someone will do that.
>> fwiw, I would be suspicious of a broken benchmark if it didn't show:
>> * the bus daemon means an extra read/parse and marshal/write per
>> message, so 4 vs. 2
>> * the existence of the bus daemon therefore makes a message
>> send/receive take roughly twice as long
>> has a bit more elaboration about
>> number of copies, validations, and context switches in each case.
>> From what I can tell, the core performance claim for kdbus is that for
>> a userspace daemon to be a routing intermediary, it has to receive and
>> re-send messages. If the baseline performance of IPC is the cost to
>> send once and receive once, adding the daemon means there's twice as
>> much to do (1 more receive, 1 more send). However fast you make
>> send/receive, the daemon always means there are twice as many
>> send/receives as there would be with no daemon.
> there are twice as many context switches, nobody disputes that, the question
> is if it matters.
> It doesn't matter if the message router is in kernel space or user space, it
> still needs to read/parse, marshal/write the data, so you aren't saving that
> time due to it being in the kernel.
>> If that isn't what a benchmark shows, then there's a mystery to
>> explain... (one disruption to the ratio of course could be if the
>> clients use a much faster or slower dbus lib than the daemon)
>> As noted many times, of course this 2x penalty for the daemon was a
>> conscious tradeoff - kdbus is trying to escape the tradeoff in order
>> to extend usage of dbus to more use cases. Given the tradeoff,
>> _existing_ uses of dbus seem to prefer the performance hit to the loss
>> of useful semantics, but potential new users would like to or need to
>> have both.
> If there is a 2x performance improvement for being in the kernel, but a 100x
> performance improvement from fixing the userspace code, the effort should be
> spent on the userspace code, not on moving things to kernel space.

I would guess that, if we compared a highly optimized userspace
implementation to a kernel implementation, we'd see less than 2x
difference. After all, a userspace daemon doesn't really need to
unmarshal and re-marshal anything except headers. For large messages,
we could use splice and avoid a couple of copies, too.
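
As a rough illustration of the header-only point, here is a minimal
sketch of a userspace router that decodes a small fixed header and
relays the body as opaque bytes. The 8-byte wire format and the names
HEADER, route_one and demo are made up for this example; this is not
the D-Bus wire format or how dbus-daemon is written.

```python
import socket
import struct

# Hypothetical wire format, for illustration only (not the D-Bus format):
# 4-byte destination id + 4-byte body length, followed by an opaque body.
HEADER = struct.Struct("!II")


def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def send_message(sock, dest, body):
    """Client side: marshal the header and write header + body."""
    sock.sendall(HEADER.pack(dest, len(body)) + body)


def recv_message(sock):
    """Client side: read and decode one message."""
    dest, length = HEADER.unpack(recv_exact(sock, HEADER.size))
    return dest, recv_exact(sock, length)


def route_one(inbound, outbound_by_dest):
    """Forward one message, decoding only the header.

    The body is relayed as opaque bytes: it is never parsed or
    re-marshalled, and the original header bytes are reused as-is.
    """
    raw_header = recv_exact(inbound, HEADER.size)
    dest, length = HEADER.unpack(raw_header)
    body = recv_exact(inbound, length)
    outbound_by_dest[dest].sendall(raw_header + body)
    return dest


def demo():
    # sender <-> daemon on one socket pair, daemon <-> receiver on another.
    sender, daemon_in = socket.socketpair()
    daemon_out, receiver = socket.socketpair()
    try:
        send_message(sender, 7, b"payload that the daemon never inspects")
        route_one(daemon_in, {7: daemon_out})
        return recv_message(receiver)
    finally:
        for s in (sender, daemon_in, daemon_out, receiver):
            s.close()
```

The routed path still costs one extra receive and one extra send, as
Havoc says, and this sketch still copies the body through the daemon's
address space. splice would be the way to attack that copy for large
messages.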

If the scheduler became a bottleneck, it could be interesting to add
something like a send-and-poll primitive. I suspect that some
workloads currently do unnecessary context switches with only standard
POSIX primitives. If A sends a message to B, then there's a brief
window in which both A and B are runnable. Ideally we wouldn't
context switch until A calls poll or epoll_wait, but I don't know how
well that works in practice.
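
For reference, the pattern a send-and-poll primitive would replace
looks roughly like the sketch below: one system call to send, then a
second to wait for the reply. The function name send_then_poll is made
up for this example, and no combined primitive exists today; this only
shows the existing two-step sequence.

```python
import select
import socket


def send_then_poll(sock, payload, timeout_ms):
    """Send a request, then wait until a reply is readable or we time out.

    Returns True if the socket became readable, False on timeout.
    A combined send-and-poll call is hypothetical; this is the existing
    two-step pattern built from standard primitives.
    """
    # Step 1: send. For a small payload this is normally a single send
    # call, and the receiver becomes runnable as soon as it completes.
    sock.sendall(payload)

    # Step 2: only now does the sender block. Between step 1 and this
    # call, both the sender and the receiver are runnable.
    poller = select.poll()
    poller.register(sock, select.POLLIN)
    events = poller.poll(timeout_ms)
    return bool(events)
```

A quick way to try it is with socket.socketpair(): queue a reply on one
end, call send_then_poll on the other, and it should report the socket
as readable without waiting for the timeout.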

There's more room for generic improvements than just that. At LSF/MM
we were talking about more scalable epoll variants that would allow a
multithreaded daemon to be woken up on the core that received incoming
data. That would allow an efficient multi-queue dbus with fewer
migrations and IPIs.

At some point, I'd like to implement PCID on x86 (if no one beats me
to it, and this is a low priority for me), which will allow us to skip
expensive TLB flushes while context switching. I have no idea whether
ARM can do something similar.
