Re: [GIT PULL] kdbus for 4.1-rc1

From: Daniel Mack
Date: Fri Apr 17 2015 - 09:24:10 EST


Hi Havoc,

On 04/16/2015 09:01 PM, Havoc Pennington wrote:
> On Thu, Apr 16, 2015 at 9:13 AM, Tom Gundersen <teg@xxxxxxx> wrote:
>> All types of messages (unicast and broadcast) are directly stored into
>> a pool slice of the receiving connection, and this slice is not reused
>> by the kernel until userspace is finished with it and frees it. Hence,
>> a client which doesn't process its incoming messages will, at some
>> point, run out of pool space. If that happens for unicast messages,
>> the sender will get an EXFULL error. If it happens for a multicast
>> message, all we can do is drop the message, and tell the receiver how
>> many messages have been lost when it issues KDBUS_CMD_RECV the next
>> time. There's more on that in kdbus.message(7).
>
> Have you guys already grappled with what libraries/apps should do with
> this information?
>
> To handle the knowledge that "N messages have been lost," it seems
> like the client must answer "are there any messages that, if lost,
> would put any code using this connection into a confused state" and
> then the client has to recover from said confused state.

This can only happen with user-originated D-Bus signal messages. For
unicast messages such as method calls, the sender will actually see
-EXFULL, and no part of the message is transmitted, so neither side
ends up in a confused state. For broadcast signals, however, we can't
reject the sender just because a single peer is out of buffer space,
and we can't allow unbounded allocations on the receiver side either,
so informing the receiver is the best we can do.
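
From the application's point of view, a failed unicast send is
therefore straightforward to handle. A minimal sketch, where
kdbus_send_message() is just a hypothetical wrapper around the send
ioctl and not an actual library call:

/* Minimal sketch, not real library code: kdbus_send_message() is a
 * hypothetical wrapper around the kdbus send ioctl. */
#include <errno.h>

int kdbus_send_message(int conn_fd, const void *msg); /* hypothetical */

int send_unicast(int conn_fd, const void *msg)
{
        if (kdbus_send_message(conn_fd, msg) < 0) {
                if (errno == EXFULL) {
                        /* The receiver's pool is full. Nothing was
                         * queued, so both sides stay consistent; the
                         * caller can retry later or propagate the
                         * error. */
                        return -EXFULL;
                }
                return -errno;
        }
        return 0;
}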

Note that dbus-daemon simply drops such signals silently, so for now
this counter merely adds a debugging aid on top of that behavior.
There is no consensus yet on how applications should react to such
losses. The easiest approach is obviously to re-sync all of your state
with the peer (which can be as simple as calling
ObjectManager.GetManagedObjects() or Properties.GetAll()).
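
As a rough illustration of that re-sync, using sd-bus purely as an
example client library (the service name and object path below are
placeholders, and error handling is trimmed):

/* Sketch only: re-fetch all remote state after broadcasts were lost.
 * "org.example.Service" and "/org/example" are placeholder names. */
#include <systemd/sd-bus.h>

static int resync(sd_bus *bus)
{
        sd_bus_error error = SD_BUS_ERROR_NULL;
        sd_bus_message *reply = NULL;
        int r;

        r = sd_bus_call_method(bus,
                               "org.example.Service",  /* destination */
                               "/org/example",         /* object path */
                               "org.freedesktop.DBus.ObjectManager",
                               "GetManagedObjects",
                               &error, &reply, NULL);
        if (r >= 0) {
                /* ... walk the a{oa{sa{sv}}} reply and rebuild the
                 * local caches from scratch ... */
        }

        sd_bus_error_free(&error);
        sd_bus_message_unref(reply);
        return r;
}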

> A library probably can't do this - it doesn't know what state matters
> or how to recover it - so each app would have to... and are
> connections ever shared between modules of an app? (for example: could
> a library such as GTK+ or pulseaudio be using the connection, and then
> application code is also using the connection, so none of those code
> modules has the whole picture... at that point, none of the modules
> knows what to do about lost messages... to try to handle lost messages
> in a module, you'd need a private connection(?)... which might be fine
> as long as each app having a number of connections isn't too bloated.)
>
> How to handle a send error depends a lot on what's being sent... but
> if I were writing a general-purpose library wrapper, I'd be very
> tempted to hide EXFULL behind an unbounded (or very-high-bounded)
> userspace send buffer, which of course is what you were trying to
> avoid, but I am skeptical that the average app will handle this error
> sensibly.

Actually, we see no real difference between constraining the outgoing
and the incoming buffers. Even with a very-high-bounded send buffer,
you still need to deal with it running full.

> The traditional userspace bus isn't any better than what you've
> described here, of course - it's even worse - and it works well
> enough. The limits are simply set high enough that they won't be hit
> unless someone's broken or evil. Which is also the traditional
> approach to say file descriptor limits or swap space: set the limit
> high and hope you won't reach it. For the case of the X server, the
> limit on message buffers appears to be "until malloc fails," so they
> have the limit quite high, higher than userspace dbus does. "set high
> limits and don't hit them" is a tried-and-true approach.
>
> With either the existing userspace bus or kdbus, I bet you could come
> up with ways to use limit exhaustion to get various services and apps
> into confused states as they miss messages they were relying on,
> simply because this is too hard for apps to reliably get right. The
> lower the limits, the easier it would be to cause trouble by forcing
> them to be hit.
>
> In a perfect world we could figure out which client is "at fault" for
> filling a buffer - the slow receiver or the overzealous sender - so we
> could throttle or disconnect the guilty party instead of throwing
> errors that won't be handled well ... but not sure that's practical.

Exactly, you need heuristics for that. It's non-trivial to figure out
whether the receiver or sender is to blame.

We've thought about how to address that for a while and came up with
quota logic similar to what dbus-daemon implements, which prevents a
single connection from overflowing a receiver's pool. The limits that
apply there are currently hard-coded, and they work well on our
systems. In the future, they could easily become a bus-wide property
that is configured at bus creation time.
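
To make the idea concrete, here is a purely illustrative sketch of
such a per-sender quota check; this is not the in-kernel kdbus code,
and the names and limits are made up:

/* Illustrative only -- not the actual kdbus implementation. Each
 * sender may use at most a fixed share of the receiver's pool, so one
 * busy peer cannot starve everyone else. Limits are arbitrary. */
#include <errno.h>
#include <stddef.h>

#define POOL_SHARE_MAX  (1024 * 1024)   /* bytes per sender */
#define POOL_MSGS_MAX   256             /* messages per sender */

struct quota {
        size_t bytes;           /* bytes currently queued by this sender */
        unsigned int msgs;      /* messages currently queued */
};

static int quota_charge(struct quota *q, size_t msg_size)
{
        if (q->bytes + msg_size > POOL_SHARE_MAX ||
            q->msgs + 1 > POOL_MSGS_MAX)
                return -EXFULL;         /* or drop, for broadcasts */

        q->bytes += msg_size;
        q->msgs++;
        return 0;
}

static void quota_release(struct quota *q, size_t msg_size)
{
        q->bytes -= msg_size;
        q->msgs--;
}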


Thanks,
Daniel
