Re: Potential issues (security and otherwise) with the current cgroup-bpf API

From: Andy Lutomirski
Date: Mon Dec 19 2016 - 20:56:51 EST


On Mon, Dec 19, 2016 at 5:44 PM, David Ahern <dsahern@xxxxxxxxx> wrote:
> On 12/19/16 5:25 PM, Andy Lutomirski wrote:
>> net.socket_create_filter = "none": no filter
>> net.socket_create_filter = "bpf:baadf00d": bpf filter
>> net.socket_create_filter = "disallow": no sockets created period
>> net.socket_create_filter = "iptables:foobar": some iptables thingy
>> net.socket_create_filter = "nft:blahblahblah": some nft thingy
>> net.socket_create_filter = "address_family_list:1,2,3": allow AF 1, 2, and 3
>
> Such a scheme works for the socket create filter b/c it is a very simple use case. It does not work for the ingress and egress which allow generic bpf filters.

Can you elaborate on what goes wrong? (Obviously the
"address_family_list" example makes no sense in that context.)
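One way to read the proposal quoted above is that each hook gets a single setting whose value is a typed "kind:argument" string, so exactly one mechanism is selected for a cgroup at any time. Below is a minimal Python sketch of parsing such values. It assumes the kinds are exactly the ones in the example list. `net.socket_create_filter` is the hypothetical setting name from that list, and `FilterSetting`, `parse_filter_setting` and `address_families` are illustrative names, not an existing interface.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Filter kinds taken from the example values above; illustrative only,
# not an existing kernel interface.
ARG_REQUIRED = {"bpf", "iptables", "nft", "address_family_list"}
NO_ARG = {"none", "disallow"}


@dataclass(frozen=True)
class FilterSetting:
    kind: str
    arg: Optional[str] = None


def parse_filter_setting(value: str) -> FilterSetting:
    """Parse a hypothetical 'kind[:arg]' value such as 'bpf:baadf00d'."""
    kind, sep, arg = value.partition(":")
    if kind in NO_ARG:
        if sep:
            raise ValueError(f"{kind!r} takes no argument")
        return FilterSetting(kind)
    if kind in ARG_REQUIRED:
        if not arg:
            raise ValueError(f"{kind!r} requires an argument")
        return FilterSetting(kind, arg)
    raise ValueError(f"unknown filter kind {kind!r}")


def address_families(setting: FilterSetting) -> Tuple[int, ...]:
    """Decode an 'address_family_list' argument into AF numbers."""
    if setting.kind != "address_family_list":
        raise ValueError("not an address_family_list setting")
    return tuple(int(x) for x in setting.arg.split(","))
```

For example, "address_family_list:1,2,3" parses to kind "address_family_list" and decodes to the families (1, 2, 3). As noted in the reply, that kind only makes sense for socket creation; for ingress and egress only the opaque kinds ("bpf", "iptables", "nft") would apply.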

>
> ...
>
>>> you're ignoring use cases I described earlier.
>>> In vrf case there is only one ifindex it needs to bind to.
>>
>> I'm totally lost. Can you explain what this has to do with the cgroup
>> hierarchy?
>
> I think the point is that a group hierarchy makes no sense for the VRF use case. What I put into iproute2 is
>
> cgrp2/vrf/NAME
>
> where NAME is the vrf name. The filter added to it binds ipv4 and ipv6 sockets to a specific device index. cgrp2/vrf is the "default" vrf and does not have a filter. A user can certainly add another layer cgrp2/vrf/NAME/NAME2 but it provides no value since VRF in a VRF does not make sense.

I tend to agree. I still think that the mechanism as it stands is
broken in other respects and should be fixed before it goes live. I
have no desire to cause problems for the vrf use case.

But keep in mind that the vrf use case is, in Linus' tree, a bit
broken right now in its interactions with other users of the same
mechanism. Suppose I create a container and want to trace all of its
created sockets. I'll set up cgrp2/container and load my tracer as a
socket creation hook. Then a container sets up
cgrp2/container/vrf/NAME (using delegation) and loads your vrf binding
filter. Now the tracing stops working -- oops.
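To make the failure concrete, here is a small Python model of the override behavior described above, in which only the program attached nearest to a socket's cgroup runs. It is a simulation, not kernel code. The helper names (`ancestors`, `run_override`, `tracer`, `vrf_bind`) are hypothetical, and the "programs" are plain callables standing in for BPF programs on the socket-creation hook.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a BPF program on the socket-creation hook:
# it receives the socket's cgroup path and a log, and returns True to allow.
Prog = Callable[[str, List[str]], bool]


def ancestors(path: str) -> List[str]:
    """Return path and its ancestors, nearest first."""
    parts = path.split("/")
    return ["/".join(parts[:i]) for i in range(len(parts), 0, -1)]


def run_override(attached: Dict[str, Prog], cgroup: str, log: List[str]) -> bool:
    """Override model: only the program attached nearest to `cgroup` runs."""
    for cg in ancestors(cgroup):
        prog = attached.get(cg)
        if prog is not None:
            return prog(cgroup, log)
    return True  # nothing attached anywhere: allow


def tracer(cgroup: str, log: List[str]) -> bool:
    log.append(f"trace {cgroup}")  # side effect: record the new socket
    return True


def vrf_bind(cgroup: str, log: List[str]) -> bool:
    log.append(f"bind {cgroup}")  # stands in for binding to the VRF device
    return True


if __name__ == "__main__":
    attached = {"cgrp2/container": tracer,
                "cgrp2/container/vrf/NAME": vrf_bind}
    log: List[str] = []
    run_override(attached, "cgrp2/container/vrf/NAME", log)
    print(log)  # ['bind cgrp2/container/vrf/NAME']; the tracer never ran
```

With both programs attached, a socket created in cgrp2/container/vrf/NAME only runs the vrf filter, so the container's tracer silently misses it.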

>
> ...
>
>>>> I like this last one, but IT'S NOT A POSSIBLE FUTURE EXTENSION. You
>>>> have to do it now (or disable the feature for 4.10). This is why I'm
>>>> bringing this whole thing up now.
>>>
>>> We don't have to touch user visible api here, so extensions are fine.
>>
>> Huh? My example in the original email attaches a program in a
>> sub-hierarchy. Are you saying that 4.11 could make that example stop
>> working?
>
> Are you suggesting sub-cgroups should not be allowed to override the filter of a parent cgroup?

Yes, exactly. I think there are two sensible behaviors:

a) sub-cgroups cannot have a filter at all if the parent has a filter.
(This is the "punt" approach -- it lets different semantics be
assigned later without breaking userspace.)

b) sub-cgroups can have a filter if a parent does, too. The semantics
are that the sub-cgroup filter runs first and all side-effects occur.
If that filter says "reject" then ancestor filters are skipped. If
that filter says "accept", then the ancestor filter is run and its
side-effects happen as well. (And so on, all the way up to the root.)
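The two options can be sketched side by side as a Python simulation. This is a hypothetical model, not kernel code: `attach_no_nesting` models (a) by refusing an attach when any ancestor already has a filter (it only checks ancestors, as stated in (a)), and `run_chained` models (b) by running filters from the socket's own cgroup up to the root and stopping at the first "reject". All names are illustrative, and the "programs" are plain callables standing in for BPF programs.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a BPF program: returns True for "accept".
Prog = Callable[[str, List[str]], bool]


def ancestors(path: str) -> List[str]:
    """Return path and its ancestors, nearest first."""
    parts = path.split("/")
    return ["/".join(parts[:i]) for i in range(len(parts), 0, -1)]


def attach_no_nesting(attached: Dict[str, Prog], cgroup: str, prog: Prog) -> None:
    """Option (a): refuse to attach below a cgroup that already has a filter."""
    for cg in ancestors(cgroup)[1:]:  # skip the cgroup itself
        if cg in attached:
            raise PermissionError(f"ancestor {cg} already has a filter")
    attached[cgroup] = prog


def run_chained(attached: Dict[str, Prog], cgroup: str, log: List[str]) -> bool:
    """Option (b): run filters from the socket's cgroup up to the root."""
    for cg in ancestors(cgroup):
        prog = attached.get(cg)
        if prog is not None and not prog(cgroup, log):
            return False  # "reject": ancestor filters are skipped
    return True  # every filter on the path accepted (or none were attached)


def tracer(cgroup: str, log: List[str]) -> bool:
    log.append(f"trace {cgroup}")
    return True


def vrf_bind(cgroup: str, log: List[str]) -> bool:
    log.append(f"bind {cgroup}")
    return True


def deny(cgroup: str, log: List[str]) -> bool:
    log.append(f"deny {cgroup}")
    return False


if __name__ == "__main__":
    attached = {"cgrp2/container": tracer,
                "cgrp2/container/vrf/NAME": vrf_bind}
    log: List[str] = []
    run_chained(attached, "cgrp2/container/vrf/NAME", log)
    print(log)  # ['bind cgrp2/container/vrf/NAME', 'trace cgrp2/container/vrf/NAME']
```

Under (b), the tracing scenario above keeps working: the nested vrf filter runs first and the container's tracer still sees the socket. A rejecting filter lower in the tree still short-circuits everything above it. Under (a), the nested attach would simply fail, leaving room to define either semantics later.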

--Andy