Re: [PATCH net-next 3/4] bpf: add support for persistent maps/progs

From: Alexei Starovoitov
Date: Wed Oct 21 2015 - 18:44:46 EST


On 10/21/15 11:34 AM, Thomas Graf wrote:
>> So far, during this discussion, it was proposed to modify the file system
>> to a single-mount one and to stick this under /sys/kernel/bpf/. This
>> will not have "real" namespace support either, but it was proposed to
>> have the following structure:
>>
>>   /sys/kernel/bpf/username/<optional_dirs_mkdir_by_user>/progX
> This would probably work as you would typically map the ebpf map
> using -v like this to give a stable path:
>
> docker run -v /sys/kernel/bpf/foo/maps/progX:/map proX

yep
tracefs works inside docker the same way.
Maybe we should let users pick names similar to this fs patch to make
the above easier to use.
Also, from the bpf syscall point of view, the user shouldn't see the
/sys/kernel/bpf/user/ prefix. Only 'optional_dirs_mkdir_by_user/name'
when doing pin/new_fd.
Maybe prog type should be a fixed part of the path as well.
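
To make the prefix idea concrete, here is a rough userspace sketch of how a
relative name passed to pin/new_fd could be expanded into the full path. It is
not taken from any patch in this series: the helper name bpf_full_path(), the
BPF_FS_ROOT macro and the rule of rejecting absolute names and ".." components
are all assumptions made for illustration only.

```c
#include <stdio.h>
#include <string.h>

/* Mount point discussed in this thread; the macro name is hypothetical. */
#define BPF_FS_ROOT "/sys/kernel/bpf"

/*
 * Hypothetical helper: expand a user-visible relative name such as
 * "mydir/prog1" into the full path under the per-user directory.
 * Returns 0 on success, -1 if the name is empty, absolute, contains a
 * ".." component, or the result does not fit in the buffer.
 */
static int bpf_full_path(char *buf, size_t len, const char *user,
			 const char *rel)
{
	const char *p = rel;
	int n;

	if (rel[0] == '\0' || rel[0] == '/')
		return -1;

	/* Walk the components and refuse any that is exactly "..". */
	while (*p) {
		size_t clen = strcspn(p, "/");

		if (clen == 2 && p[0] == '.' && p[1] == '.')
			return -1;
		p += clen;
		while (*p == '/')
			p++;
	}

	n = snprintf(buf, len, "%s/%s/%s", BPF_FS_ROOT, user, rel);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With this, a caller that passes "mydir/prog1" for user "alice" would end up at
/sys/kernel/bpf/alice/mydir/prog1 without ever seeing or supplying the prefix
itself.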

>> Together with device cgroups for containers, it would allow scenarios where
>> you can have:
>>
>>  * eBPF (map/prog) device pass-through so a map/prog could even be shared out
>>    from the initial namespace into individual ones/all (one could possibly
>>    extend such maps as read-only for these consumers).
>>  * eBPF device creation for unprivileged users with permissions being set
>>    accordingly (as in fs case).
>>  * Since cgroup controller can also do wildcards on major/minors, we could
>>    make that further fine-grained.
>>  * eBPF device creation can also be enforced by the cgroup controller to be
>>    entirely disallowed for a specific container.

none of the above is practical. It can be demoed in a canned
environment, but it's a complete mismatch of apis. cgroup/dev is a
static config, whereas bpf-cdev is dynamic (with minors out of idr for
all users). When you have to hack drivers/base/core.c to get there it
should have been a warning sign that something is wrong with
this cdev approach.

> I've read the discussion passively and my take away is that, frankly,
> I think the differences are somewhat minor. Both architectures can
> scale to what we need. Both will do the job. I'm slightly worried about
> exposing uAPI as a FS, I think that didn't work too well for sysfs. It's
> pretty much a define the format once and never touch it again kind of
> deal.

It's even worse in cdev style since it piggybacks on sysfs.
