Re: [RFC PATCH bpf-next 2/4] bpf: Introduce process open coded iterator kfuncs

From: Alexei Starovoitov
Date: Wed Sep 06 2023 - 13:18:20 EST


On Wed, Sep 6, 2023 at 5:38 AM Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx> wrote:
>
> Hello, Alexei.
>
> On 2023/9/6 04:09, Alexei Starovoitov wrote:
> > On Sun, Aug 27, 2023 at 12:21 AM Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx> wrote:
> >>
> >> This patch adds kfuncs bpf_iter_process_{new,next,destroy} which allow
> >> creation and manipulation of struct bpf_iter_process in open-coded iterator
> >> style. BPF programs can use these kfuncs directly or through the bpf_for_each
> >> macro to iterate over all processes in the system.
> >>
> >> Signed-off-by: Chuyi Zhou <zhouchuyi@xxxxxxxxxxxxx>
> >> ---
> >>  include/uapi/linux/bpf.h       |  4 ++++
> >>  kernel/bpf/helpers.c           |  3 +++
> >>  kernel/bpf/task_iter.c         | 31 +++++++++++++++++++++++++++++++
> >>  tools/include/uapi/linux/bpf.h |  4 ++++
> >>  tools/lib/bpf/bpf_helpers.h    |  5 +++++
> >>  5 files changed, 47 insertions(+)
> >>
> >> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >> index 2a6e9b99564b..cfbd527e3733 100644
> >> --- a/include/uapi/linux/bpf.h
> >> +++ b/include/uapi/linux/bpf.h
> >> @@ -7199,4 +7199,8 @@ struct bpf_iter_css_task {
> >> 	__u64 __opaque[1];
> >> } __attribute__((aligned(8)));
> >>
> >> +struct bpf_iter_process {
> >> +	__u64 __opaque[1];
> >> +} __attribute__((aligned(8)));
> >> +
> >> #endif /* _UAPI__LINUX_BPF_H__ */
> >> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> >> index cf113ad24837..81a2005edc26 100644
> >> --- a/kernel/bpf/helpers.c
> >> +++ b/kernel/bpf/helpers.c
> >> @@ -2458,6 +2458,9 @@ BTF_ID_FLAGS(func, bpf_iter_num_destroy, KF_ITER_DESTROY)
> >> BTF_ID_FLAGS(func, bpf_iter_css_task_new, KF_ITER_NEW)
> >> BTF_ID_FLAGS(func, bpf_iter_css_task_next, KF_ITER_NEXT | KF_RET_NULL)
> >> BTF_ID_FLAGS(func, bpf_iter_css_task_destroy, KF_ITER_DESTROY)
> >> +BTF_ID_FLAGS(func, bpf_iter_process_new, KF_ITER_NEW)
> >> +BTF_ID_FLAGS(func, bpf_iter_process_next, KF_ITER_NEXT | KF_RET_NULL)
> >> +BTF_ID_FLAGS(func, bpf_iter_process_destroy, KF_ITER_DESTROY)
> >> BTF_ID_FLAGS(func, bpf_dynptr_adjust)
> >> BTF_ID_FLAGS(func, bpf_dynptr_is_null)
> >> BTF_ID_FLAGS(func, bpf_dynptr_is_rdonly)
> >> diff --git a/kernel/bpf/task_iter.c b/kernel/bpf/task_iter.c
> >> index b1bdba40b684..a6717a76c1e0 100644
> >> --- a/kernel/bpf/task_iter.c
> >> +++ b/kernel/bpf/task_iter.c
> >> @@ -862,6 +862,37 @@ __bpf_kfunc void bpf_iter_css_task_destroy(struct bpf_iter_css_task *it)
> >> 	kfree(kit->css_it);
> >> }
> >>
> >> +struct bpf_iter_process_kern {
> >> +	struct task_struct *tsk;
> >> +} __attribute__((aligned(8)));
> >> +
> >> +__bpf_kfunc int bpf_iter_process_new(struct bpf_iter_process *it)
> >> +{
> >> +	struct bpf_iter_process_kern *kit = (void *)it;
> >> +
> >> +	BUILD_BUG_ON(sizeof(struct bpf_iter_process_kern) != sizeof(struct bpf_iter_process));
> >> +	BUILD_BUG_ON(__alignof__(struct bpf_iter_process_kern) !=
> >> +		     __alignof__(struct bpf_iter_process));
> >> +
> >> +	rcu_read_lock();
> >> +	kit->tsk = &init_task;
> >> +	return 0;
> >> +}
> >> +
> >> +__bpf_kfunc struct task_struct *bpf_iter_process_next(struct bpf_iter_process *it)
> >> +{
> >> +	struct bpf_iter_process_kern *kit = (void *)it;
> >> +
> >> +	kit->tsk = next_task(kit->tsk);
> >> +
> >> +	return kit->tsk == &init_task ? NULL : kit->tsk;
> >> +}
> >> +
> >> +__bpf_kfunc void bpf_iter_process_destroy(struct bpf_iter_process *it)
> >> +{
> >> +	rcu_read_unlock();
> >> +}
> >
> > This iter can be used in all ctx-s, which is nice, but let's
> > make the verifier enforce rcu_read_lock/unlock done by the bpf prog
> > instead of doing it in the ctor/dtor of the iter, since
> > in sleepable progs the verifier won't recognize that the body is an RCU CS.
> > We'd need to teach the verifier to allow bpf_iter_process_new()
> > inside in_rcu_cs() and make sure there is no rcu_read_unlock
> > while BPF_ITER_STATE_ACTIVE.
> > bpf_iter_process_destroy() would become a nop.
>
> Thanks for your review!
>
> I think bpf_iter_process_{new, next, destroy} should be protected by
> bpf_rcu_read_lock/unlock explicitly whether the prog is sleepable or
> not, right?

Correct. Either by an explicit bpf_rcu_read_lock() in the case of sleepable progs,
or just by using them in normal bpf progs, which have an implicit rcu_read_lock()
done before calling into them.
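
For example, a sleepable prog would look roughly like this. Just a sketch to
illustrate the rule: the bpf_for_each() wiring comes from this patch, while
the SEC() hook, the __ksym declarations and the printed field are only for
illustration:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

void bpf_rcu_read_lock(void) __ksym;
void bpf_rcu_read_unlock(void) __ksym;

char _license[] SEC("license") = "GPL";

SEC("lsm.s/file_open") /* sleepable, so an explicit RCU CS is required */
int BPF_PROG(dump_tasks, struct file *file)
{
	struct task_struct *task;

	bpf_rcu_read_lock();
	bpf_for_each(process, task) /* bpf_iter_process_{new,next,destroy} under the hood */
		bpf_printk("pid=%d", task->pid);
	bpf_rcu_read_unlock();

	return 0;
}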

> I'm not very familiar with the BPF verifier, but I believe
> there is still a risk in directly calling these kfuncs even if
> in_rcu_cs() is true.
>
> Maybe what we actually need here is to make the BPF verifier check that
> env->cur_state->active_rcu_lock is true when we want to call these kfuncs.

active_rcu_lock means an explicit bpf_rcu_read_lock.
Currently we do allow bpf_rcu_read_lock in non-sleepable progs, but it's pointless there.

Technically we can extend the check:

	if (in_rbtree_lock_required_cb(env) && (rcu_lock || rcu_unlock)) {
		verbose(env, "Calling bpf_rcu_read_{lock,unlock} in unnecessary rbtree callback\n");
		return -EACCES;
	}

to discourage their use in all non-sleepable, but it will break some progs.
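
For the record, the extended check would be something like this (sketch only;
the sleepable flag spelling is approximate):

	if ((in_rbtree_lock_required_cb(env) || !env->prog->aux->sleepable) &&
	    (rcu_lock || rcu_unlock)) {
		verbose(env, "Calling bpf_rcu_read_{lock,unlock} is unnecessary here\n");
		return -EACCES;
	}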

I think it's ok to check in_rcu_cs() to allow bpf_iter_process_*().
If bpf prog adds explicit and unnecessary bpf_rcu_read_lock() around
the iter ops it won't do any harm.
Just need to make sure that rcu unlock logic:

	} else if (rcu_unlock) {
		bpf_for_each_reg_in_vstate(env->cur_state, state, reg, ({
			if (reg->type & MEM_RCU) {
				reg->type &= ~(MEM_RCU | PTR_MAYBE_NULL);
				reg->type |= PTR_UNTRUSTED;
			}
		}));

clears iter state that depends on rcu.
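
As for the in_rcu_cs() part above, the gate in check_kfunc_call() could look
roughly like this (just a sketch; is_iter_process_kfunc() is a made-up helper
matching the three new kfunc BTF IDs):

	/* require an RCU CS around the process iter: implicit in normal progs,
	 * explicit bpf_rcu_read_lock() in sleepable ones
	 */
	if (is_iter_process_kfunc(&meta) && !in_rcu_cs(env)) {
		verbose(env, "bpf_iter_process_* requires RCU read lock\n");
		return -EACCES;
	}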

I thought about changing mark_stack_slots_iter() to do
st->type = PTR_TO_STACK | MEM_RCU;
so that the above clearing logic kicks in,
but it might be better to have something iter specific.
is_iter_reg_valid_init() should probably be changed to
make sure reg->type is not UNTRUSTED.
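
Roughly, the two spots would be (verifier internals approximated, not a
finished diff):

	/* mark_stack_slots_iter(): when the iter is created inside an RCU CS,
	 * record that dependency on the stack slot
	 */
	st->type = PTR_TO_STACK | MEM_RCU;

	/* is_iter_reg_valid_init(): once rcu_read_unlock has downgraded the
	 * slot to untrusted, stop accepting it for _next()/_destroy()
	 */
	if (st->type & PTR_UNTRUSTED)
		return false;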

Andrii,
do you have better suggestions?