Re: [PATCH v4 00/14] Add kdbus implementation

From: Andy Lutomirski
Date: Wed Mar 25 2015 - 14:12:35 EST


On Wed, Mar 25, 2015 at 10:29 AM, David Herrmann <dh.herrmann@xxxxxxxxx> wrote:
> Hi
>
> On Tue, Mar 24, 2015 at 12:24 AM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>> On Mon, Mar 23, 2015 at 8:28 AM, David Herrmann <dh.herrmann@xxxxxxxxx> wrote:
>>> On Thu, Mar 19, 2015 at 4:48 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>>>> On Thu, Mar 19, 2015 at 4:26 AM, David Herrmann <dh.herrmann@xxxxxxxxx> wrote:
>>>>> metadata handling is local to the connection that sends the message.
>>>>> It does not affect the overall performance of other bus operations in
>>>>> parallel.
>>>>
>>>> Sure it does if it writes to shared cachelines. Given that you're
>>>> incrementing refcounts, I'm reasonable sure that you're touching lots
>>>> of shared cachelines.
>>>
>>> Ok, sure, but it's still mostly local to the sending task. We take
>>> locks and ref-counts on the task-struct and mm, which is for most
>>> parts local to the CPU the task runs on. But this is inherent to
>>> accessing this kind of data, which is the fundamental difference in
>>> our views here, as seen below..
>>
>> You're also refcounting the struct cred
>
> No?
>
> We do ref-count the group-info, but that is actually redundant as we
> just copy the IDs. We should drop this, since group-info of 'current'
> can be accessed right away. I noted it down.

OK

>
>> and there's no good reason
>> for that to be local. (It might be a bit more local than intended
>> because of the absurd things that the key subsystem does to struct
>> cred, but IMO users should turn that off or the kernel should fix it.)
>>
>> Even more globally, I think you're touching init_user_ns's refcount in
>> most scenarios. That's about as global as it gets.
>
> get_user_ns() in metadata.c is a workaround (as the comment there
> explains). With better export-helpers for caps, we can simply drop it.
> It's conditional on KDBUS_ATTACH_CAPS, anyway.

Fair enough.

>
>> (Also, is there an easy benchmark to see how much time it takes to
>> send and receive metadata? I tried to get the kdbus test to do this,
>> and I failed. I probably did it wrong.)
>
> patch for out-of-tree kdbus:
> https://gist.github.com/dvdhrm/3ac4339bf94fadc13b98
>
> Update it to pass _KDBUS_ATTACH_ALL for both arguments of
> kdbus_conn_update_attach_flags().
>
>>>>> Furthermore, it's way faster than collecting the "same" data
>>>>> via /proc, so I don't think it slows down the overall transaction at
>>>>> all. If a receiver doesn't want metadata, it should not request it (by
>>>>> setting the receiver-metadata-mask). If a sender doesn't like the
>>>>> overhead, it should not send the metadata (by setting the
>>>>> sender-metadata-mask). Only if both peers set the metadata mask, it
>>>>> will be transmitted.
>>>>
>>>> But you're comparing to the wrong thing, IMO. Of course it's much
>>>> faster than /proc hackery, but it's probably much slower to do the
>>>> metadata operation once per message than to do it when you connect to
>>>> the endpoint. (Gah! It's a "bus" that could easily have tons of
>>>> users but a single "endpoint". I'm still not used to it.)
>>>
>>> Yes, of course your assumption is right if you compare against
>>> per-connection caches, instead of per-message metadata. But we do
>>> support _both_ use-cases, so we don't impose any policy.
>>> We still believe "live"-metadata is a crucial feature of kdbus,
>>> despite the known performance penalties.
> [...]
>> This is even more true if this feature
>> is *inconsistent* with legacy userspace (i.e. userspace dbus).
>
> Live metadata is already supported on UDS via SCM_CREDENTIALS, we just
> extend it to other metadata items. It's not a new invention by us.
> Debian code-search on SO_PASSCRED and SCM_CREDENTIALS gives lots of
> results.
>
> Netlink, as a major example of an existing bus API, already uses
> SCM_CREDENTIALS as primary way to transmit metadata.
>
>> I could be wrong about the lack of use cases. If so, please enlighten me.
>
> We have several dbus APIs that allow clients to register as a special
> handler/controller/etc. (eg., see systemd-logind TakeControl()). The
> API provider checks the privileges of a client on registration and
> then just tracks the client ID. This way, the client can be privileged
> when asking for special access, then drop privileges and still use the
> interface. You cannot re-connect in between, as the API provider
> tracks your bus ID. Without message-metadata, all your (other) calls
> on this bus would always be treated as privileged. We *really* want to
> avoid this.

Connect twice?

You *already* have to reconnect or connect twice because you have
per-connection metadata. That's part of my problem with this scheme
-- you support *both styles*, which seems like it'll give you most of
the downsides of both without the upsides.

>
> Another example is logging, where we want exact data at the time a
> message is logged. Otherwise, the data is useless.

Why?

No, really, why is exact data at the time of logging so important? It
sounds nice, but I really don't see it.

> With
> message-metadata, you can figure out the exact situation a process was
> in when a specific message was logged. Furthermore, it is impossible
> to read such data from /proc, as the process might already be dead.
> Which is a _real_ problem right now!
> Similarly, system monitoring wants message-metadata for the same
> reasons. And it needs to be reliable, you don't want malicious
> sandboxes to mess with your logs.

Huh? A "malicious sandbox" can always impersonate itself, whether by
connecting and handing off a connection or simply by relaying mesages.

>
> kdbus is a _bus_, not a p2p channel. Thus, a peer may talk to multiple
> destinations, and it may want to look different to each of them. DBus
> method-calls allow 'syscall'-ish behavior when calling into other
> processes. We *want* to be able to drop privileges after doing process
> setup. We want further bus-calls to no longer be treated privileged.

You could have an IOCTL that re-captures your connection metata.

>
> Furthermore, DBus was designed to allow peers to track other peers
> (which is why it always had the NameOwnerChanged signal). This is an
> essential feature, that simplifies access-management considerably, as
> you can cache it together with the unique name of a peer. We only open
> a single connection to a bus. glib, libdbus, efl, ell, qt, sd-bus, and
> others use cached bus-connections that are shared by all code of a
> single thread. Hence, the bus connection is kinda part of the process
> itself, like stdin/stdout. Without message-metadata, it is impossible
> to ever drop privileges on a bus, without losing all state.

See above about an IOCTL that re-captures your connection metadata.

Again, you seem to be arguing that per-connection metadata is bad, but
you still have an implementation of per-connection metadata, so you
still have all these problems.

I'm actually okay with per-message metadata in principle, but I'd like
to see evidence (with numbers, please) that a send+recv of per-message
metadata is *not* significantly slower than a recv of already-captured
per-connection metadata. If this is in fact the case, then maybe you
should trash per-connection metadata instead and the legacy
compatibility code can figure out a way to deal with it. IMO that
would be a pretty nice outcome, since you would never have to worry
whether your connection to the bus is inadvertantly privileged.

(Also, FWIW, it seems like what you really want is a capability model,
in which you grab a handle to some service and that handle captures
all your privileges wrt that service. Per-message metadata is even
farther from this than per-connection-to-the-bus metadata, but neither
one is particularly close.)

>
> Thanks
> David



--
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/