Re: [PATCH v4 bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc
From: Song Liu
Date: Fri Oct 04 2024 - 19:57:18 EST
On Fri, Oct 4, 2024 at 4:44 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
[...]
> > diff --git i/kernel/bpf/helpers.c w/kernel/bpf/helpers.c
> > index 3709fb142881..7311a26ecb01 100644
> > --- i/kernel/bpf/helpers.c
> > +++ w/kernel/bpf/helpers.c
> > @@ -3090,7 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
> > BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
> > BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
> > BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
> > -BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
> > +BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL | KF_TRUSTED_ARGS | KF_RCU_PROTECTED)
>
> I don't think KF_TRUSTED_ARGS approach would fit here.
> Namhyung's use case is tracing. The 'addr' will be some potentially
> arbitrary address from somewhere. The chance to see a trusted pointer
> is probably very low in such a tracing use case.
I thought the primary use case was to trace lock contention, for
example, queued_spin_lock_slowpath(). Of course, a more
general solution is better.
>
> The verifier change can mainly be the following:
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 7d9b38ffd220..e09eb108e956 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -12834,6 +12834,9 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> regs[BPF_REG_0].type = PTR_TO_BTF_ID;
> regs[BPF_REG_0].btf_id = ptr_type_id;
>
> + if (meta.func_id == special_kfunc_list[KF_get_kmem_cache])
> + regs[BPF_REG_0].type |= PTR_UNTRUSTED;
> +
> if (is_iter_next_kfunc(&meta)) {
> struct bpf_reg_state *cur_iter;
This is easier than I thought.
Thanks,
Song
> The returned 'struct kmem_cache *' won't be refcnt-ed (acquired).
> It will be readonly via ptr_to_btf_id logic.
> s->flags;
> s->size;
> s->offset;
> access will be allowed but the verifier will sanitize them
> with an inlined version of probe_read_kernel.
> Even KF_RET_NULL can be dropped.