Re: [PATCH bpf-next v2 4/6] bpf: Introduce css_descendant open-coded iterator kfuncs

From: Chuyi Zhou
Date: Fri Sep 15 2023 - 07:58:20 EST


Hello.

在 2023/9/15 07:26, Andrii Nakryiko 写道:
On Tue, Sep 12, 2023 at 12:02 AM Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx> wrote:

This Patch adds kfuncs bpf_iter_css_{pre,post}_{new,next,destroy} which
allow creation and manipulation of struct bpf_iter_css in open-coded
iterator style. These kfuncs actually wrapps css_next_descendant_{pre,
post}. BPF programs can use these kfuncs through bpf_for_each macro for
iteration of all descendant css under a root css.

Signed-off-by: Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx>
---
include/uapi/linux/bpf.h | 8 +++++
kernel/bpf/helpers.c | 6 ++++
kernel/bpf/task_iter.c | 53 ++++++++++++++++++++++++++++++++++
tools/include/uapi/linux/bpf.h | 8 +++++
tools/lib/bpf/bpf_helpers.h | 12 ++++++++
5 files changed, 87 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index befa55b52e29..57760afc13d0 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -7326,4 +7326,12 @@ struct bpf_iter_process {
__u64 __opaque[1];
} __attribute__((aligned(8)));

+struct bpf_iter_css_pre {
+ __u64 __opaque[2];
+} __attribute__((aligned(8)));
+
+struct bpf_iter_css_post {
+ __u64 __opaque[2];
+} __attribute__((aligned(8)));
+
#endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 9b7d2c6f99d1..ca1f6404af9e 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -2510,6 +2510,12 @@ BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
BTF_ID_FLAGS(func, bpf_iter_process_new, KF_ITER_NEW)
BTF_ID_FLAGS(func, bpf_iter_process_next, KF_ITER_NEXT | KF_RET_NULL)
BTF_ID_FLAGS(func, bpf_iter_process_destroy, KF_ITER_DESTROY)
+BTF_ID_FLAGS(func, bpf_iter_css_pre_new, KF_ITER_NEW)
+BTF_ID_FLAGS(func, bpf_iter_css_pre_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_css_pre_destroy, KF_ITER_DESTROY)
+BTF_ID_FLAGS(func, bpf_iter_css_post_new, KF_ITER_NEW)
+BTF_ID_FLAGS(func, bpf_iter_css_post_next, KF_ITER_NEXT | KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_iter_css_post_destroy, KF_ITER_DESTROY)
BTF_ID_FLAGS(func, bpf_dynptr_adjust)
BTF_ID_FLAGS(func, bpf_dynptr_is_null)
BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
index 9d1927dc3a06..8963fc779b87 100644
--- a/kernel/bpf/task_iter.c
+++ b/kernel/bpf/task_iter.c
@@ -880,6 +880,59 @@ __bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it)
{
}

+struct bpf_iter_css_kern {
+ struct cgroup_subsys_state *root;
+ struct cgroup_subsys_state *pos;
+} __attribute__((aligned(8)));
+
+__bpf_kfunc int bpf_iter_css_pre_new(struct bpf_iter_css_pre *it,
+ struct cgroup_subsys_state *root)

similar to my comment on previous patches, please see
kernel/bpf/cgroup_iter.c for iter/cgroup iterator program. Let's stay
consistent. We have one iterator that accepts parameters defining
iteration order and starting cgroup. Unless there are some technical
reasons we can't follow similar approach with this open-coded iter,
let's use the same approach. We can even reuse
BPF_CGROUP_ITER_DESCENDANTS_PRE, BPF_CGROUP_ITER_DESCENDANTS_POST,
BPF_CGROUP_ITER_ANCESTORS_UP enums.


I know your concern. It would be nice if we keep consistent with kernel/bpf/cgroup_iter.c

But this patch actually want to support iterating css (cgroup_subsys_state) not cgroup (css is more low lever).
With css_iter we can do something like "for_each_mem_cgroup_tree/cpuset_for_each_descendant_pre"
in BPF Progs which is hard for cgroup_iter. In the future we can use this iterator to plug some customizable policy in other resource control system.

BTW, what I did in RFC actually very similar with the approach of cgroup_iter. (https://lore.kernel.org/all/20230827072057.1591929-4-zhouchuyi@xxxxxxxxxxxxx/).

Thanks.