Re: [PATCH bpf-next v2 3/6] bpf: Introduce process open coded iterator kfuncs

From: Andrii Nakryiko
Date: Fri Sep 15 2023 - 16:38:37 EST


On Fri, Sep 15, 2023 at 8:03 AM Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx> wrote:
>
>
>
> 在 2023/9/15 07:26, Andrii Nakryiko 写道:
> > On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx> wrote:
> >>
> >> This patch adds kfuncs bpf_iter_process_{new,next,destroy} which allow
> >> creation and manipulation of struct bpf_iter_process in open-coded iterator
> >> style. BPF programs can use these kfuncs or through bpf_for_each macro to
> >> iterate all processes in the system.
> >>
> >> Signed-off-by: Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx>
> >> ---
> >> include/uapi/linux/bpf.h | 4 ++++
> >> kernel/bpf/helpers.c | 3 +++
> >> kernel/bpf/task_iter.c | 29 +++++++++++++++++++++++++++++
> >> tools/include/uapi/linux/bpf.h | 4 ++++
> >> tools/lib/bpf/bpf_helpers.h | 5 +++++
> >> 5 files changed, 45 insertions(+)
> >>
> >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >> index de02c0971428..befa55b52e29 100644
> >> --- a/include/uapi/linux/bpf.h
> >> +++ b/include/uapi/linux/bpf.h
> >> @@ -7322,4 +7322,8 @@ struct bpf_iter_css_task {
> >> __u64 __opaque[1];
> >> } __attribute__((aligned(8)));
> >>
> >> +struct bpf_iter_process {
> >> + __u64 __opaque[1];
> >> +} __attribute__((aligned(8)));
> >> +
> >> #endif /* _UAPI__LINUX_BPF_H__ */
> >> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> >> index d6a16becfbb9..9b7d2c6f99d1 100644
> >> --- a/kernel/bpf/helpers.c
> >> +++ b/kernel/bpf/helpers.c
> >> @@ -2507,6 +2507,9 @@ BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
> >> BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW)
> >> BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
> >> BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
> >> +BTF_ID_FLAGS(func, bpf_iter_process_new, KF_ITER_NEW)
> >> +BTF_ID_FLAGS(func, bpf_iter_process_next, KF_ITER_NEXT | KF_RET_NULL)
> >> +BTF_ID_FLAGS(func, bpf_iter_process_destroy, KF_ITER_DESTROY)
> >> BTF_ID_FLAGS(func, bpf_dynptr_adjust)
> >> BTF_ID_FLAGS(func, bpf_dynptr_is_null)
> >> BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> >> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> >> index d8539cc05ffd..9d1927dc3a06 100644
> >> --- a/kernel/bpf/task_iter.c
> >> +++ b/kernel/bpf/task_iter.c
> >> @@ -851,6 +851,35 @@ __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
> >> kfree(kit->css_it);
> >> }
> >>
> >> +struct bpf_iter_process_kern {
> >> + struct task_struct *tsk;
> >> +} __attribute__((aligned(8)));
> >> +
> >
> > Few high level thoughts. I think it would be good to follow
> > SEC("iter/task") naming and approach. Open-coded iterators in many
> > ways are in-kernel counterpart to iterator programs, so keeping them
> > close enough within reason is useful for knowledge transfer.
> >
> > SEC("iter/task") allows to:
> > a) iterate all threads in the system
> > b) iterate all threads for a given TGID
> > c) it also allows to "iterate" a single thread or process, but that's
> > a bit less relevant for in-kernel iterator, but we can still support
> > them, why not?
> >
> > I'm not sure if it supports iterating all processes (as in group
> > leaders of each task group) in the system, but if it's possible I
> > think we should support it at least for open-coded iterator, seems
> > like a very useful functionality.
> >
> > So to that end, let's design a small set of input arguments for
> > bpf_iter_process_new() that would allow to specify this as flags +
> > either (optional) struct task_struct * pointer to represent
> > task/process or PID/TGID.
> >
>
> Another concern from Alexei was the readability of the API of open-coded
> in BPF Program[1].
>
> bpf_for_each(task, curr) is straightforward. Users can easily understand
> that this API does the same thing as 'for_each_process' in kernel.

In general, users might have no idea about for_each_process macro in
the kernel, so I don't find this particular argument very convincing.

We can add a separate set of iterator kfuncs for every useful
combination of conditions, of course, but it's a double-edged sword.
Needing to use a different iterator just to specify a different
direction of cgroup iteration (from the example you referred in [1])
also means that it's now harder to write some generic function that
needs to do something for all cgroups matching some criteria where the
order might be coming as an argument.

Similarly for task iterators. It's not hard to imagine some processing
that can be equivalently done per thread or per process in the system,
or on each thread of the process, depending on some conditions or
external configuration. Having to do three different
bpf_for_each(task_xxx, task, ...) for this seems suboptimal. If the
nature of the thing that is iterated over is the same, and it's just a
different set of filters to specify which subset of those items should
be iterated, I think it's better to try to stick to the same iterator
with few simple arguments. IMO, of course, there is no objectively
best approach.

>
> However, if we keep the approach of SEC("iter/task")
>
> enum ITER_ITEM {
> ITER_TASK,
> ITER_THREAD,
> }
>
> __bpf_kfunc int bpf_iter_task_new(struct bpf_iter_process *it, struct
> task_struct *group_task, enum ITER_ITEM type)
>
> the API have to chang:
>
>
> bpf_for_each(task, curr, NULL, ITERATE_TASK) // iterate all process in
> the system
> bpf_for_each(task, curr, group_leader, ITERATE_THREAD) // iterate all
> thread of group_leader
> bpf_for_each(task, curr, NULL, ITERATE_THREAD) //iterate all threads of
> all the process in the system
>
> Useres may guess what are this API actually doing....

I'd expect users to consult documentation before trying to use an
unfamiliar cutting-edge functionality. So let's try to keep
documentation clear and up to the point. Extra flag argument doesn't
seem to be a big deal.


>
> So, I'm thinking if we can add a layer of abstraction to hide the
> details from the users:
>
> #define bpf_for_each_process(task) \
> bpf_for_each(task, curr, NULL, ITERATE_TASK)
>
>
> It would be nice if you could give me some better suggestions.

No, please no. This macro wrapper is useless and just obfuscates what
is going on. If users think it's helpful for them, they can trivially
define it for their own code.

>
> Thanks!
>
> [1]
> https://lore.kernel.org/lkml/CAADnVQLbDWUxFen-RS67C86sOE5DykEPD8xyihJ2RnG1WEnTQg@xxxxxxxxxxxxxx/