Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

From: Daniel Borkmann
Date: Tue Oct 06 2015 - 04:40:10 EST


On 10/06/2015 10:20 AM, Ingo Molnar wrote:
>
> * Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
>
>> On 10/06/2015 09:13 AM, Ingo Molnar wrote:
>>>
>>> * Alexei Starovoitov <ast@xxxxxxxxxxxx> wrote:
>>>
>>>> On 10/5/15 3:14 PM, Daniel Borkmann wrote:
>>>>> One scenario that comes to mind ... what happens when there are kernel
>>>>> pointers stored in skb->cb[] (either from the current layer or an old
>>>>> one from a different layer that the skb went through previously, but
>>>>> which did not get overwritten)?
>>>>>
>>>>> Socket filters could read a portion of skb->cb[] even when unprivileged
>>>>> and leak that out through maps. I think the verifier doesn't catch that,
>>>>> right?
>>>>
>>>> grrr. indeed. previous layer before sk_filter() can leave junk in there.
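
For illustration, a minimal sketch of the kind of unprivileged socket filter
being described here, written in restricted C with today's libbpf conventions
(the map and program names are invented; nothing below is from the patch set):
it simply copies whatever a previous layer left behind in skb->cb[] into an
array map that user space can read back.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* user-space readable array map used as the leak channel (name invented) */
struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(max_entries, 1);
        __type(key, __u32);
        __type(value, __u64);
} leak_map SEC(".maps");

SEC("socket")
int leak_cb(struct __sk_buff *skb)
{
        __u32 key = 0;
        /* whatever junk (possibly kernel pointers) is still sitting in cb[] */
        __u64 val = ((__u64)skb->cb[1] << 32) | skb->cb[0];

        bpf_map_update_elem(&leak_map, &key, &val, BPF_ANY);
        return skb->len;        /* pass the packet through unmodified */
}

char _license[] SEC("license") = "GPL";

Attached via SO_ATTACH_BPF by an unprivileged user, a filter like this would
expose stale cb[] contents to user space through the map, which is exactly
the leak being discussed.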

>>> Could this be solved by activating zeroing/sanitizing of this data if
>>> there's an active BPF function around that can access that socket?
>>
>> I think this check could only be done in sk_filter() for testing these
>> conditions (unprivileged user + access to cb area), so it would need to
>> happen from outside a native eBPF program. :/
>
> Yes, the kernel (with code running outside of any eBPF program) would
> guarantee that those data fields are zeroed/sanitized, if there's an eBPF
> program that is attached to that socket.
>
>> [...] Also classic BPF would then need to test for it, since a socket
>> filter doesn't really know whether native eBPF is loaded there or a
>> classic-to-eBPF transformed one, and classic never makes use of this.
>> Anyway, it could be done by adding a bit flag cb_access:1 to the bpf_prog,
>> setting it during the eBPF verification phase, and testing it inside
>> sk_filter(), if I see it correctly.
>
> That could also be done in an unlikely() branch, to keep the cost to the
> non-eBPF case near zero.

Yes, agreed. For the time being, the majority of users are coming from the
classic BPF side anyway, and the unlikely() could still be changed later on
if that should no longer be the case. The flag and bpf_func would share the
same cacheline as well.
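
A rough sketch of how the flag plus the unlikely() test could look on the
sk_filter() side; the cb_access:1 bit is the one proposed above, the helper
name is invented, and none of this is from an actual patch:

#include <linux/filter.h>
#include <linux/string.h>
#include <net/sch_generic.h>

/* Hypothetical helper on the sk_filter() path: zero the cb[] scratch area
 * from outside the program, but only when the verifier has flagged the
 * program as touching __sk_buff->cb[] via the proposed cb_access:1 bit in
 * struct bpf_prog. Classic BPF and classic-to-eBPF converted programs never
 * set the bit and stay on the fast path.
 */
static inline unsigned int sk_filter_run_clear_cb(const struct bpf_prog *prog,
                                                  struct sk_buff *skb)
{
        if (unlikely(prog->cb_access))
                memset(qdisc_skb_cb(skb)->data, 0, QDISC_CB_PRIV_LEN);

        return BPF_PROG_RUN(prog, skb);
}

With the bit sitting next to bpf_func, the test and the indirect call would
touch the same cacheline, and the unlikely() keeps the common classic BPF
case essentially free, as suggested above.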

The reason is that this sanitizing must only be done for the 'top-level'
program that is run from sk_filter() _directly_, because a user could at any
time decide to put an already loaded eBPF fd into a tail call map. cb[] is
then used to pass args/state around between the two programs, so it cannot be
unconditionally cleared from within the program itself. The association to a
socket filter (SO_ATTACH_BPF) happens later, after a native eBPF program has
already been loaded via bpf(2).
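
To make the tail call point concrete, a small sketch of two programs (names
and the 0xcafe marker are invented for this example) where cb[] hands state
from the caller to the tail-called program; clearing cb[] unconditionally at
the entry of every program would wipe exactly this state, which is why the
zeroing has to happen once, from sk_filter(), before the top-level program
runs:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* prog array; user space loads do_parse and puts its fd into slot 0 */
struct {
        __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
        __uint(max_entries, 1);
        __uint(key_size, sizeof(__u32));
        __uint(value_size, sizeof(__u32));
} jmp_table SEC(".maps");

SEC("socket")
int do_parse(struct __sk_buff *skb)
{
        /* reads the state handed over by do_classify via cb[] */
        return skb->cb[0] == 0xcafe ? skb->len : 0;
}

SEC("socket")
int do_classify(struct __sk_buff *skb)
{
        skb->cb[0] = 0xcafe;               /* pass state to the next program */
        bpf_tail_call(skb, &jmp_table, 0); /* jump to jmp_table slot 0 */
        return 0;                          /* reached only if the call fails */
}

char _license[] SEC("license") = "GPL";

Here do_classify would be the program attached via SO_ATTACH_BPF, i.e. the
one run from sk_filter() directly, and only its invocation needs the cb[]
sanitizing.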

> So zeroing tends to be very cheap and it could also be beneficial to
> performance in terms of bringing the cacheline into the CPU cache. But I
> really don't know the filter code so I'm just handwaving.
>
> Thanks,
>
> Ingo