Re: kdbus: to merge or not to merge?
From: David Herrmann
Date: Wed Jul 01 2015 - 12:51:55 EST
Hi
On Wed, Jul 1, 2015 at 2:03 AM, Kalle A. Sandstrom <ksandstr@xxxxxx> wrote:
> For the first, compare unix domain sockets (i.e. point-to-point mode, access
> control through filesystem [or fork() parentage], read/write/select) to the
> kdbus message-sending ioctl. In the main data-exchanging portion, the former
> requires only a connection identifier, a pointer to a buffer, and the length
> of data in that buffer. To contrast, kdbus takes a complex message-sending
> command structure with 0..n items of m kinds that the ioctl must parse in a
> m-way switching loop, and then another complex message-describing structure
> which has its own 1..n items of another m kinds describing its contents,
> destination-lookup options, negotiation of supported options, and so forth.
sendmsg(2) uses a very similar payload to kdbus. send(2) is a shortcut
to simplify the most common use-case. I'd be more than glad to accept
patches adding such shortcuts to kdbus, if accompanied by benchmark
numbers and reasoning why this is a common path for dbus/etc. clients.
The kdbus API is kept generic and extensible, while trying to keep
runtime overhead minimal. If this overhead turns out to cause a
significant runtime slowdown (which none of my benchmarks showed), we
should consider adding shortcuts. Until then, I prefer an API that is
consistent, flexible, and easy to extend.
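(To make the comparison concrete: send(2) really is just sendmsg(2)
with a single iovec and no ancillary data. A minimal sketch using only
standard POSIX calls, nothing kdbus-specific:)

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* send(fd, buf, len, 0) expressed via the generic sendmsg(2)
     * entry point; kdbus's item list plays the same role that
     * msghdr plays here. */
    static ssize_t send_via_sendmsg(int fd, const void *buf, size_t len)
    {
            struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
            struct msghdr msg;

            memset(&msg, 0, sizeof(msg));
            msg.msg_iov = &iov;
            msg.msg_iovlen = 1;

            return sendmsg(fd, &msg, 0);
    }

A kdbus shortcut would collapse the item parsing in the same way
send(2) bypasses most of the msghdr handling.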
> Consequently, a carefully optimized implementation of unix domain sockets (and
> by extension all the data-carrying SysV etc. IPC primitives, optimized
> similarly) will always be superior to kdbus for both message throughput and
> latency, [...]
Yes, that's due to the point-to-point nature of UDS.
> [...] For long messages (> L1 cache size per Stetson-Harrison[0]) the
> only performance benefit from kdbus is its claimed single-copy mode of
> operation-- an equivalent to which could be had with ye olde sockets by copying
> data from the writer directly into the reader while one of them blocks[1] in
> the appropriate syscall. That the current Linux pipes, SysV queues, unix domain
> sockets, etc. don't do this doesn't really factor in.
Parts of the network subsystem have supported single-copy (mmap'ed IO)
for quite some time. kdbus mandates it, but otherwise is not special
in that regard.
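(For a concrete example of a single-copy path that exists today:
splice(2) moves pipe contents into a socket without bouncing the data
through userspace. A minimal sketch, standard syscalls only:)

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <sys/types.h>

    /* Forward up to `len' bytes from a pipe's read end into a socket;
     * the kernel moves page references instead of copying payload. */
    static ssize_t forward_single_copy(int pipe_rd, int sock_fd, size_t len)
    {
            return splice(pipe_rd, NULL, sock_fd, NULL, len,
                          SPLICE_F_MOVE | SPLICE_F_MORE);
    }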
> A consequence of this buffering is that whenever a client sends a message with
> kdbus, it must be prepared to handle an out-of-space non-delivery status.
> [...] There's no option to e.g. overwrite a previous
> message, or to discard queued messages in an oldest-first order, instead of
> rebuffing the sender.
Correct.
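(For illustration, the send path then has to look something like the
sketch below. The uapi header and command names are taken from the
current series; whether the out-of-space condition surfaces as EXFULL
is my reading of the docs, so treat the errno as an assumption:)

    #include <errno.h>
    #include <time.h>
    #include <sys/ioctl.h>
    #include <linux/kdbus.h>

    /* Try a send a few times; the receiver's pool being full is a
     * normal, expected outcome the caller must handle, not a bug. */
    static int send_with_retry(int conn_fd, struct kdbus_cmd_send *cmd)
    {
            struct timespec ts = { .tv_nsec = 1000 * 1000 }; /* 1ms */
            int i;

            for (i = 0; i < 10; i++) {
                    if (ioctl(conn_fd, KDBUS_CMD_SEND, cmd) == 0)
                            return 0;
                    if (errno != EXFULL)  /* assumed out-of-space errno */
                            return -errno;
                    nanosleep(&ts, NULL); /* back off, then retry */
            }
            return -EXFULL; /* give up; surface it to the application */
    }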
> For broadcast messaging, a recipient may observe that messages were dropped by
> looking at a `dropped_msgs' field delivered (and then reset) as part of the
> message reception ioctl. Its value is the number of messages dropped since last
> read, so arguably a client could achieve the equivalent of the condition's
> absence by resynchronizing explicitly with all signal-senders on its current
> bus wrt which it knows the protocol, when the value is >0. This method could in
> principle apply to 1-to-1 unidirectional messaging as well[2].
Correct.
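(A sketch of what that looks like on the receive side; the field and
flag names are taken from the current series, so consider them
illustrative rather than normative:)

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/kdbus.h>

    static int recv_one(int conn_fd)
    {
            struct kdbus_cmd_recv recv;

            memset(&recv, 0, sizeof(recv));
            recv.size = sizeof(recv);

            if (ioctl(conn_fd, KDBUS_CMD_RECV, &recv) < 0)
                    return -errno;

            if (recv.return_flags & KDBUS_RECV_RETURN_DROPPED_MSGS) {
                    /* Broadcasts were lost since the last read; this
                     * is where a client would resynchronize with the
                     * signal-senders it knows, as described above. */
                    fprintf(stderr, "lost %llu message(s)\n",
                            (unsigned long long)recv.dropped_msgs);
            }

            /* The message itself is described by a pool offset
             * returned in the same struct (omitted here). */
            return 0;
    }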
> Looking at the kdbus "send message, wait for tagged reply" feature in
> conjunction with these details appears to reveal two holes in its state graph.
> The first is that if replies are delivered through the requestor's buffer,
> concurrent sends into that same buffer may cause it to become full (or the
> queue to grow too long, w/e) before the service gets a chance to reply. If this
> condition causes a reply to fall out of the IPC flow, the requestor will hang
> until either its specified timeout happens or it gets interrupted by a signal.
If sending a reply fails, the kdbus_reply state is destroyed and the
caller must be woken up. We do that for sync-calls just fine, but the
async case does indeed lack a wake-up in the error path. I've noted
this down and will fix it.
> If replies are delivered outside the shm pool, the requestor must be prepared
> to pick them up using a different means from the "in your pool w/ offset X,
> length Y" format the main-line kdbus interface provides. [...]
Replies are never delivered outside the shm pool.
> The second problem is that given how there can be a timeout or interrupt on the
> receive side of a "method call" transaction, it's possible for the requestor to
> bow out of the IPC flow _while the service is processing its request_. This
> results either in the reply message being lost, or its ending up in the
> requestor's buffer to appear in a loop where it may not be expected. Either
(for completeness: we properly support resuming interrupted sync-calls)
> way, the client must at that point resynchronize wrt all objects related to the
> request's side effects, or abandon the IPC flow entirely and start over.
> (services need only confirm their replies before effecting e.g. a chardev-like
> "destructively read N bytes from buffer" operation's outcome, which is slightly
> less ugly.)
Correct. If you time out on a sync-call, or refuse to resume it, you
have to treat the transaction as failed.
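(The caller side of that, as a sketch: this assumes, per my reading of
the docs, that an interrupted synchronous KDBUS_CMD_SEND is resumed by
simply re-issuing the ioctl, while a timeout ends the transaction for
good:)

    #include <errno.h>
    #include <sys/ioctl.h>
    #include <linux/kdbus.h>

    static int sync_call(int conn_fd, struct kdbus_cmd_send *cmd)
    {
            int r;

            /* cmd->flags is assumed to carry KDBUS_SEND_SYNC_REPLY. */
            do {
                    r = ioctl(conn_fd, KDBUS_CMD_SEND, cmd);
            } while (r < 0 && errno == EINTR); /* resume after signal */

            /* On ETIMEDOUT, or if we had chosen not to resume, the
             * whole transaction must be treated as failed. */
            return r < 0 ? -errno : 0;
    }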
> Tying this back into the first point: to prevent this type of denial-of-service
> against sanguinely-written software it's necessary for kdbus to invoke the
> policy engine to determine that an unrelated participant isn't allowed to
> consume a peer's buffer space.
It's the quota handling rather than the policy engine, but otherwise
correct.
Thanks
David