Re: [RFC PATCH bpf-next 00/13] bpf: Introduce BPF namespace

From: Yafang Shao
Date: Sat Apr 01 2023 - 12:33:05 EST


On Fri, Mar 31, 2023 at 1:52 PM Hao Luo <haoluo@xxxxxxxxxx> wrote:
>
> On Sun, Mar 26, 2023 at 2:22 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> >
> <...>
> >
> > BPF namespace is introduced in this patchset with an attempt to remove
> > the CAP_SYS_ADMIN requirement. The user can create bpf map, prog and
> > link in a specific bpf namespace, then these bpf objects will not be
> > visible to the users in a different bpf namespace. But these bpf
> > objects are visible to its parent bpf namespace, so the sys admin can
> > still iterate and inspect them.
> >
> > BPF namespace is similar to PID namespace, and the bpf objects are
> > similar to tasks, so BPF namespace is very easy to understand. This
> > patchset only implements BPF namespace for bpf map, prog and link. In the
> > future we may extend it to other bpf objects like btf, bpffs, etc.
> > For example, we can allow some of the BTF objects to be used in
> > non-init bpf namespace, then the container user can only trace the
> > processes running in his container, but can't get the information of
> > tasks running in other containers.
> >
>
> Hi Yafang,
>
> Thanks for putting effort toward enabling BPF for container users!
>
> However, I think the cover letter can be improved. It's unclear to me
> what exactly a BPF namespace is, what exactly it tries to achieve and
> what its behavior is. If you look at the manpages of pid namespace [1],
> cgroup namespace [2], and namespace [3], they all give a very precise
> definition, state their goals and explain the intended behaviors well.
>

Thanks for your suggestion. The cover letter should be improved. I will
read the man pages of these namespaces and revise it as you
suggested.

> I felt you intended the BPF namespace to provide isolation of object
> ids. That is, different views of the bpf object ids for different
> processes. This is like the PID namespace. But somehow, you also
> attach CAPs on top of that. That, I think, is not a namespace's job.
>

I agree that it should be independent of CAPs.
Once the bpf namespace is introduced, we don't actually need the CAPs
when the user iterates IDs or converts IDs to FDs in his own bpf
namespace (except in the init bpf namespace), because these are all
read-only operations and the user can only read the bpf objects he
created himself. The CAPs should still be required when the user wants
to write something, e.g. creating a map or loading a prog. These are
really two different things. I will change it in the next version.
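
To make the intended semantics a bit more concrete, here is a toy
Python model. It is only a sketch, not kernel code: the names
BpfNamespace, may_read and create_object are hypothetical, the
capability is reduced to a single has_cap flag (which capability it
would be is left open), and per-namespace ID translation is not
modelled at all. It assumes that a namespace sees its own objects plus
those of its descendants, and that only writes need the capability
outside the init bpf namespace.

```python
class BpfNamespace:
    """Toy stand-in for a bpf namespace (hypothetical, not a kernel struct)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # None means this is the init bpf namespace
        self.objects = []         # objects created directly in this namespace
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def is_init(self):
        return self.parent is None

    def visible_objects(self):
        """Own objects plus everything created in descendant namespaces."""
        seen = list(self.objects)
        for child in self.children:
            seen.extend(child.visible_objects())
        return seen


def may_read(ns, has_cap):
    """Read-only ops (iterate IDs, ID -> FD): no capability needed
    unless the caller is in the init bpf namespace."""
    return has_cap or not ns.is_init()


def create_object(ns, name, has_cap):
    """Write ops (create a map, load a prog) always need the capability."""
    if not has_cap:
        raise PermissionError("creating a bpf object requires the capability")
    ns.objects.append(name)
    return name


# Two sibling containers under the init bpf namespace.
init_ns = BpfNamespace("init")
container_a = BpfNamespace("container-a", parent=init_ns)
container_b = BpfNamespace("container-b", parent=init_ns)

create_object(container_a, "map:a", has_cap=True)
create_object(container_b, "prog:b", has_cap=True)

print(container_a.visible_objects())  # only what container-a created
print(init_ns.visible_objects())      # the parent sees both containers
```

In this model a user in container-a cannot see prog:b, while the sys
admin in the init bpf namespace can still iterate and inspect both
objects.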

> Well, I could be wrong, but would appreciate you adding more details
> as follow-up.
>
> Hao
>
> [1] https://man7.org/linux/man-pages/man7/pid_namespaces.7.html
> [2] https://man7.org/linux/man-pages/man7/cgroup_namespaces.7.html
> [3] https://man7.org/linux/man-pages/man7/namespaces.7.html



--
Regards
Yafang