Re: [RFC PATCH bpf-next 00/13] bpf: Introduce BPF namespace

From: Yafang Shao
Date: Tue Mar 28 2023 - 23:03:11 EST


On Wed, Mar 29, 2023 at 1:15 AM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote:
>
> On 03/28, Yafang Shao wrote:
> > On Tue, Mar 28, 2023 at 1:28 AM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote:
> > >
> > > On 03/26, Yafang Shao wrote:
> > > > Currently only CAP_SYS_ADMIN can iterate BPF object IDs and convert IDs
> > > > to FDs, that's intended for BPF's security model[1]. Not only does it
> > > > prevent non-privileged users from getting other users' bpf programs, but
> > > > it also prevents the user from iterating his own bpf objects.
> > >
> > > > In a container environment, some users want to run bpf programs in their
> > > > containers. These users can run their bpf programs under CAP_BPF and
> > > > some other specific CAPs, but they can't inspect their bpf programs in a
> > > > generic way. For example, bpftool can't be used as it requires
> > > > CAP_SYS_ADMIN. That is very inconvenient.
> > >
> > > > Without CAP_SYS_ADMIN, the only way to get the information of a bpf object
> > > > which is not created by the process itself is with SCM_RIGHTS. That
> > > > requires each process which created a bpf object to implement a unix
> > > > domain socket to share the fd of the bpf object between different
> > > > processes, which is really tedious and troublesome.
> > >
> > > > Hence we need a better mechanism to get bpf object info without
> > > > CAP_SYS_ADMIN.
> > >
> > > [..]
> > >
> > > > BPF namespace is introduced in this patchset in an attempt to remove
> > > > the CAP_SYS_ADMIN requirement. The user can create bpf maps, progs and
> > > > links in a specific bpf namespace, and these bpf objects will then not be
> > > > visible to users in a different bpf namespace. But these bpf
> > > > objects are visible to the parent bpf namespace, so the sys admin can
> > > > still iterate and inspect them.
> > >
> > > Does it essentially mean unpriv bpf?
>
> > Right. With CAP_BPF and some other CAPs enabled.
>
> > > Can I, as a non-root, create
> > > a new bpf namespace and start loading/attaching progs?
>
> > No, you can't create a new bpf namespace as a non-root, see also
> > copy_namespaces().
> > In the container environment, new namespaces are always created by
> > containerd, which is started by root.
>
> Are you talking about "if (!ns_capable(user_ns, CAP_SYS_ADMIN))" part
> from copy_namespaces? Isn't it trivially bypassed with a new user
> namespace?
>
> IIUC, I can create a new user namespace which gives me CAP_SYS_ADMIN
> in this particular user-ns. Then I can go on and create a new bpf
> namespace (with CAP_BPF) and go wild? I won't see anything from the
> other namespaces, but I'll be able to load/attach bpf programs?
>

I don't think so. If you create a new user namespace and give the
process CAP_BPF or CAP_SYS_ADMIN in this new user namespace but not in
the initial user namespace, you can't do that, because currently only
CAP_BPF or CAP_SYS_ADMIN in the init user namespace allows
loading/attaching bpf programs.
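
To illustrate the difference, here is a toy user-space model (not
kernel code) of a capable()-style check versus an ns_capable()-style
check. All toy_* names are made up for this sketch, and it ignores
details such as the owner of a user namespace holding all caps over
it. It only shows why caps granted inside a new user namespace do not
satisfy a check made against the init user namespace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model of the two checks; this is NOT kernel code. */
enum toy_cap {
	TOY_CAP_SYS_ADMIN = 1u << 0,
	TOY_CAP_BPF       = 1u << 1,
};

struct toy_user_ns {
	const struct toy_user_ns *parent;	/* NULL for the init user ns */
};

struct toy_task {
	const struct toy_user_ns *user_ns;	/* ns the task's caps live in */
	unsigned int caps;			/* caps held in that ns */
};

static const struct toy_user_ns toy_init_user_ns = { .parent = NULL };

/* Models ns_capable(): does the task hold @cap over @target? */
static bool toy_ns_capable(const struct toy_task *t,
			   const struct toy_user_ns *target, unsigned int cap)
{
	const struct toy_user_ns *ns;

	/* Walk up from the target; caps only count in the task's own ns
	 * and in namespaces below it, never in an ancestor. */
	for (ns = target; ns; ns = ns->parent)
		if (ns == t->user_ns)
			return (t->caps & cap) != 0;
	return false;
}

/* Models capable(): always checked against the init user ns. */
static bool toy_capable(const struct toy_task *t, unsigned int cap)
{
	return toy_ns_capable(t, &toy_init_user_ns, cap);
}

static void toy_demo(void)
{
	struct toy_user_ns child = { .parent = &toy_init_user_ns };
	struct toy_task t = {
		.user_ns = &child,
		.caps = TOY_CAP_SYS_ADMIN | TOY_CAP_BPF,
	};

	/* Passes a namespace-relative check like the one in copy_namespaces(). */
	assert(toy_ns_capable(&t, &child, TOY_CAP_SYS_ADMIN));
	/* Fails the init-ns check that guards load/attach today. */
	assert(!toy_capable(&t, TOY_CAP_BPF));
}
```

Calling toy_demo() from a main() exercises both cases: the task with
"full" caps in the child user namespace passes the namespace-relative
check but fails the init-namespace one.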

> > > Maybe add a paragraph about now vs whatever you're proposing.
>
> > What I'm proposing in this patchset is to put bpf objects (map, prog,
> > link, and btf) into the bpf namespace. Next step I will put bpffs into
> > the bpf namespace as well.
> > That said, I'm trying to put all the objects created in bpf into the
> > bpf namespace. Below is a simple paragraph to illustrate it.
>
> > Regarding the unpriv user with CAP_BPF enabled,
> >
> >                                          | Now  | Future |
> > ----------------------------------------------------------
> > Iterate his BPF IDs                      | N    | Y      |
> > Iterate others' BPF IDs                  | N    | N      |
> > Convert his BPF IDs to FDs               | N    | Y      |
> > Convert others' BPF IDs to FDs           | N    | N      |
> > Get others' object info from pinned file | Y(*) | N      |
> > ----------------------------------------------------------
>
> > (*) It can be improved by,
> > 1). Giving different containers different bpffs
> > 2). Setting file permissions
> > That's not perfect. For example, one single user may have two bpf
> > instances which we don't want to inspect each other.
>
> I think the question here is what happens to the existing
> capable(CAP_BPF) checks? Do they become ns_capable(CAP_BPF) eventually?
>

They won't become ns_capable(CAP_BPF). If they did, it really would
go wild.

> And if not, I don't think it integrates well with the user namespaces?
>

IIUC, it is the CAP_BPF which doesn't integrate with the user
namespaces, right?

--
Regards
Yafang