Re: [RFC PATCH bpf-next 00/13] bpf: Introduce BPF namespace

From: Stanislav Fomichev
Date: Tue Mar 28 2023 - 13:15:24 EST


On 03/28, Yafang Shao wrote:
On Tue, Mar 28, 2023 at 1:28 AM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote:
>
> On 03/26, Yafang Shao wrote:
> > Currently only CAP_SYS_ADMIN can iterate BPF object IDs and convert IDs
> > to FDs, that's intended for BPF's security model[1]. Not only does it
> > prevent non-privileged users from getting other users' bpf programs, but
> > it also prevents a user from iterating his own bpf objects.
>
> > In container environment, some users want to run bpf programs in their
> > containers. These users can run their bpf programs under CAP_BPF and
> > some other specific CAPs, but they can't inspect their bpf programs in a
> > generic way. For example, the bpftool can't be used as it requires
> > CAP_SYS_ADMIN. That is very inconvenient.
>
> > Without CAP_SYS_ADMIN, the only way to get the information of a bpf object
> > which is not created by the process itself is with SCM_RIGHTS. That
> > requires each process which created a bpf object to implement a unix
> > domain socket to share the fd of the bpf object between different
> > processes, which is really tedious and troublesome.
>
> > Hence we need a better mechanism to get bpf object info without
> > CAP_SYS_ADMIN.
>
> [..]
>
> > BPF namespace is introduced in this patchset with an attempt to remove
> > the CAP_SYS_ADMIN requirement. The user can create bpf map, prog and
> > link in a specific bpf namespace, then these bpf objects will not be
> > visible to the users in a different bpf namespace. But these bpf
> > objects are visible to the parent bpf namespace, so the sys admin can
> > still iterate and inspect them.
>
> Does it essentially mean unpriv bpf?

Right. With CAP_BPF and some other CAPs enabled.

> Can I, as a non-root, create
> a new bpf namespace and start loading/attaching progs?

No, you can't create a new bpf namespace as a non-root, see also
copy_namespaces().
In the container environment, new namespaces are always created by
containerd, which is started by root.

Are you talking about "if (!ns_capable(user_ns, CAP_SYS_ADMIN))" part
from copy_namespaces? Isn't it trivially bypassed with a new user
namespace?

IIUC, I can create a new user namespace which gives me CAP_SYS_ADMIN
in this particular user-ns. Then I can go on and create a new bpf
namespace (with CAP_BPF) and go wild? I won't see anything from the
other namespaces, but I'll be able to load/attach bpf programs?
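
To make the question concrete, here is a minimal userspace sketch of the sequence described above. It is an illustration under assumptions, not part of the patchset: the clone flag the RFC would add for a bpf namespace is not used, and CLONE_NEWUTS stands in as "some other namespace whose creation is gated by the CAP_SYS_ADMIN check in copy_namespaces()". The function name try_userns_then_other_ns() is made up for this example. It calls the raw unshare system call through syscall(2) so it builds without feature-test macros; the unshare() wrapper from <sched.h> would do the same job.

```c
#include <linux/sched.h>   /* CLONE_NEWUSER, CLONE_NEWUTS */
#include <sys/syscall.h>   /* SYS_unshare */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: in a forked child, first create a new user
 * namespace, then try to create a second namespace from inside it.
 *
 * Returns 0 if both steps succeeded,
 *         1 if the user namespace was refused,
 *         2 if the second namespace was refused,
 *        -1 if fork/wait failed or the child did not exit normally.
 */
int try_userns_then_other_ns(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Step 1: a new user namespace; the creator gets a full
         * capability set inside it, including CAP_SYS_ADMIN. */
        if (syscall(SYS_unshare, CLONE_NEWUSER) != 0)
            _exit(1);

        /* Step 2: CLONE_NEWUTS is a stand-in for the proposed bpf
         * namespace; its creation is checked against the new user
         * namespace, not the initial one. */
        if (syscall(SYS_unshare, CLONE_NEWUTS) != 0)
            _exit(2);

        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Whether this returns 0 for a non-root caller depends on system policy (unprivileged user namespaces can be disabled), so the result should be read as "what this machine allows", not as a statement about the patchset.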

> Maybe add a paragraph about now vs whatever you're proposing.

What I'm proposing in this patchset is to put bpf objects (map, prog,
link, and btf) into the bpf namespace. As a next step, I will put bpffs
into the bpf namespace as well.
In other words, I'm trying to put all the objects created in bpf into
the bpf namespace. Below is a simple table to illustrate it.

Regarding the unpriv user with CAP_BPF enabled,

                                         | Now  | Future |
----------------------------------------------------------
Iterate his BPF IDs                      | N    | Y      |
Iterate others' BPF IDs                  | N    | N      |
Convert his BPF IDs to FDs               | N    | Y      |
Convert others' BPF IDs to FDs           | N    | N      |
Get others' object info from pinned file | Y(*) | N      |
----------------------------------------------------------

(*) It can be improved by,
1). Giving different containers different bpffs
2). Setting file permissions
That's not perfect, though. For example, a single user may have two bpf
instances that we don't want to inspect each other.

I think the question here is what happens to the existing
capable(CAP_BPF) checks? Do they become ns_capable(CAP_BPF) eventually?

And if not, I don't think it integrates well with the user namespaces?
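
The difference between the two checks can be shown with a small userspace toy model. This is a sketch under assumptions: the toy_ structs and functions are hypothetical and only mimic, in simplified form, how a capability check against the initial user namespace differs from one against a target user namespace. It is not kernel code and says nothing about what the patchset actually does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: these are NOT the kernel's definitions. */
struct toy_user_ns {
    const struct toy_user_ns *parent; /* NULL for the initial namespace */
    int level;                        /* depth; 0 for the initial namespace */
    unsigned int owner_uid;           /* uid that created this namespace */
};

struct toy_cred {
    const struct toy_user_ns *user_ns; /* namespace the task lives in */
    unsigned int euid;
    uint64_t cap_effective;            /* bitmask of effective capabilities */
};

#define TOY_CAP_SYS_ADMIN (UINT64_C(1) << 21)
#define TOY_CAP_BPF       (UINT64_C(1) << 39)

static const struct toy_user_ns toy_init_user_ns = { NULL, 0, 0 };

/* Namespace-aware check: does cred hold cap with respect to target? */
static bool toy_ns_capable(const struct toy_cred *cred,
                           const struct toy_user_ns *target, uint64_t cap)
{
    for (const struct toy_user_ns *ns = target; ns != NULL; ns = ns->parent) {
        /* Same namespace: the effective capability set decides. */
        if (ns == cred->user_ns)
            return (cred->cap_effective & cap) != 0;
        /* Target is not a descendant of the task's namespace. */
        if (ns->level <= cred->user_ns->level)
            return false;
        /* The creator of a child namespace holds all caps over it. */
        if (ns->parent == cred->user_ns && ns->owner_uid == cred->euid)
            return true;
    }
    return false;
}

/* Global check: always evaluated against the initial namespace. */
static bool toy_capable(const struct toy_cred *cred, uint64_t cap)
{
    return toy_ns_capable(cred, &toy_init_user_ns, cap);
}
```

In this model, a task that entered a user namespace created by uid 1000 and holds TOY_CAP_BPF there passes toy_ns_capable() for that namespace but fails toy_capable(). That gap is what the question about capable(CAP_BPF) versus ns_capable(CAP_BPF) is about.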

> Otherwise it's not very clear what's the security story.
> (haven't looked at the whole series, so maybe it's answered somewhere else?)
>


--
Regards
Yafang