Re: [PATCH 01/13] kdbus: add documentation

From: David Herrmann
Date: Wed Feb 04 2015 - 19:16:43 EST


Hi

On Thu, Feb 5, 2015 at 12:03 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> I see "latencies" of around 20 microseconds with lockdep and context
> tracking off. For example:

With neither metadata nor memfd transmission, I get 2.5us for kdbus and
1.5us for UDS (8k payload). With 8-byte payloads, I get 2.2us and 1.2us.
I suspect you enabled metadata transmission, which I think is not a fair
comparison.
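
For reference, a single-process UDS roundtrip measurement of the kind
quoted above could be sketched roughly as follows. This is not the code
from test-benchmark.c: the names now_ns(), transfer() and
uds_roundtrip_ns(), the choice of SOCK_SEQPACKET and the 8k buffer limit
are all assumptions made for illustration, and the absolute numbers will
differ from machine to machine.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Monotonic timestamp in nanoseconds. */
static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Send 'size' bytes on 'from' and receive them on 'to'; 0 on success. */
static int transfer(int from, int to, char *buf, size_t size)
{
	if (send(from, buf, size, MSG_NOSIGNAL) != (ssize_t)size)
		return -1;
	if (recv(to, buf, size, 0) != (ssize_t)size)
		return -1;
	return 0;
}

/*
 * Hypothetical helper: average roundtrip time in nanoseconds for
 * 'rounds' request/reply pairs of 'size' bytes over an AF_UNIX
 * socketpair, with both ends in the same process (no context switch).
 * Returns -1 on error.
 */
long long uds_roundtrip_ns(size_t size, int rounds)
{
	static char buf[8192];
	long long start, elapsed;
	int sv[2], i, ret = 0;

	if (size == 0 || size > sizeof(buf) || rounds <= 0)
		return -1;
	if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) < 0)
		return -1;

	memset(buf, 0xab, size);
	start = now_ns();
	for (i = 0; i < rounds && ret == 0; i++) {
		ret = transfer(sv[0], sv[1], buf, size);	/* request */
		if (ret == 0)
			ret = transfer(sv[1], sv[0], buf, size);	/* reply */
	}
	elapsed = now_ns() - start;

	close(sv[0]);
	close(sv[1]);
	return ret < 0 ? -1 : elapsed / rounds;
}
```

Calling uds_roundtrip_ns(8192, 100000) and uds_roundtrip_ns(8, 100000)
would correspond to the 8k and 8-byte payload cases, with the result
divided by 1000 to get microseconds.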

A few notes on that:

* kdbus is a bus layer. We don't intend to replace UDS, but improve
dbus. Comparing roundtrip times with UDS is tempting, but in no way
fair. At the very least, a bus layer has to perform peer-lookup, which
UDS does not have to do. Imo, 2.5us vs. 1.5us is already pretty nice.
Compare this to ~77us for dbus1 without marshaling.

* We have not optimized kdbus code-paths for speed, yet. Our main
concerns are algorithmic challenges, and we believe they've been
improved considerably with kdbus. I have constantly measured kdbus
performance with 'perf' and flame-graphs, and there are a lot of
possible optimizations (especially on locking). However, I think this
can be done afterwards just fine. Neither API nor ioctl overhead has
shown up in my measurements. If anyone has counter-evidence, please
let us know. But I'm a bit reluctant to change our API solely based on
performance guesses.

* We're about 50% slower than UDS on 1-byte transmissions. With 32k
we're on-par. How can a lightweight user-space daemon even get close
to that?

* Broadcast performance is a completely different story. SEND gets
around 30% faster compared to kdbus unicasts (as most of the
control-paths are only taken once per message, instead of once per
destination).

* test-benchmark.c does performance tests in a single process. If the
bus-layer is implemented in user-space, you need to account for
context-switches and task wakeups. My UDS and pipe round-trip latency
tests got around 3x slower when run across processes (3.7us instead of
1.2us). With a user-space daemon, those slow-downs are incurred twice
as often per roundtrip.

* Process time is accounted on the sender, instead of a shared process
(dbus-daemon). Broadcasts will thus no longer consume time-slices of
dbus-daemon, but only the sender's.
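
To illustrate the cross-process point above, here is a minimal sketch
of a UDS ping-pong where the peer lives in a forked child, so every
roundtrip includes task wakeups and context switches. It is not taken
from test-benchmark.c: the names now_ns() and uds_roundtrip_fork_ns(),
the use of SOCK_SEQPACKET and the 8k buffer limit are assumptions for
illustration only, and results depend heavily on the machine and on
CPU placement of the two tasks.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Monotonic timestamp in nanoseconds. */
static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Hypothetical helper: average roundtrip time in nanoseconds for
 * 'rounds' request/reply pairs of 'size' bytes, where the replies
 * come from an echo loop in a forked child. Returns -1 on error.
 */
long long uds_roundtrip_fork_ns(size_t size, int rounds)
{
	static char buf[8192];
	long long start, elapsed;
	int sv[2], i, ret = 0;
	ssize_t n;
	pid_t pid;

	if (size == 0 || size > sizeof(buf) || rounds <= 0)
		return -1;
	if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	if (pid == 0) {
		/* Child: echo each message until the parent closes its end. */
		close(sv[0]);
		while ((n = recv(sv[1], buf, sizeof(buf), 0)) > 0)
			if (send(sv[1], buf, (size_t)n, MSG_NOSIGNAL) != n)
				break;
		_exit(0);
	}

	/* Parent: time 'rounds' request/reply pairs. */
	close(sv[1]);
	memset(buf, 0xab, size);
	start = now_ns();
	for (i = 0; i < rounds; i++) {
		if (send(sv[0], buf, size, MSG_NOSIGNAL) != (ssize_t)size ||
		    recv(sv[0], buf, size, 0) != (ssize_t)size) {
			ret = -1;
			break;
		}
	}
	elapsed = now_ns() - start;

	close(sv[0]);		/* child sees EOF and exits */
	waitpid(pid, NULL, 0);
	return ret < 0 ? -1 : elapsed / rounds;
}
```

A user-space bus daemon would sit between the two peers, so each
request and each reply would cross a process boundary twice rather
than once.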


With kdbus, we implement a bus-layer. This is our only target! If your
target environment does not require a bus, then don't use kdbus. We
don't intend to replace UDS. On a bus-layer, we need peer-discovery,
policy-handling, destination-lookups, broadcast-management and more.
Pipes/UDS do not provide any of this.
I cannot see how any other existing bus-implementation comes even
close to kdbus, performance-wise. If someone does, please let us know!

Thanks
David