Re: per st_ops kfunc allow/deny mask. Was: [PATCH bpf-next v6 4/5] bpf: Make fs kfuncs available for SYSCALL program type
From: Song Liu
Date: Thu Jan 09 2025 - 15:50:23 EST
On Thu, Jan 9, 2025 at 11:24 AM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Mon, Dec 23, 2024 at 4:51 PM Juntong Deng <juntong.deng@xxxxxxxxxxx> wrote:
> >
> > >
> > > The main goal is to get rid of run-time mask check in SCX_CALL_OP() and
> > > make it static by the verifier. To make that happen scx_kf_mask flags
> > > would need to become KF_* flags while each struct-ops callback will
> > > specify the expected mask.
> > > Then at struct-ops prog attach time the verifier will see the expected mask
> > > and can check that all kfuncs calls of this particular program
> > > satisfy the mask. Then all of the runtime overhead of
> > > current->scx.kf_mask and scx_kf_allowed() will go away.
> >
> > Thanks for pointing this out.
> >
> > Yes, I am interested in working on it.
> >
> > I will try to solve this problem in a separate patch series.
> >
> >
> > The following are my thoughts:
> >
> > Should we really use KF_* to do this? I think KF_* is currently more
> > like declaring that a kfunc has some kind of attribute, e.g.
> > KF_TRUSTED_ARGS means that the kfunc only accepts trusted arguments,
> > rather than being used to categorise kfuncs.
> >
> > It is not sustainable to restrict the kfuncs that can be used based on
> > program types, which are coarse-grained. This problem will get worse
> > as kfuncs increase.
> >
> > In my opinion, managing the kfuncs available to bpf programs should be
> > implemented as capabilities. Capabilities are a mature permission model.
> > We can treat a set of kfuncs as a capability (like the various current
> > kfunc_sets, but the current kfunc_sets did not carefully divide
> > permissions).
> >
> > We should use separate BPF_CAP_XXX flags to manage these capabilities.
> > For example, SCX may define BPF_CAP_SCX_DISPATCH.
> >
> > For program types, we should divide them into two levels, types and
> > subtypes. Types are used to register common capabilities and subtypes
> > are used to register specific capabilities. The verifier can check if
> > the used kfuncs are allowed based on the type and subtype of the bpf
> > program.
> >
> > I understand that we need to maintain backward compatibility to
> > userspace, but capabilities are internal changes in the kernel.
> > Perhaps we can make the current program types as subtypes and
> > add 'types' that are only used internally, and more subtypes
> > (program types) can be added in the future.
>
> Sorry for the delay.
> imo CAP* approach doesn't fit.
> caps are security bits exposed to user space.
> Here there is no need to expose anything to user space.
>
> But you're also correct that we cannot extend kfunc KF_* flags
> that easily. KF_* flags are limited to 32-bit and we're already
> using 12 bits.
> enum scx_kf_mask needs 5 bits, so we can squeeze them into
> the current 32-bit field _for now_,
> but eventually we'd need to refactor kfunc definition into a wider set:
> BTF_ID_FLAGS(func, .. KF_*)
> so that different struct_ops consumers can define their own bits.
>
> Right now SCX is the only st_ops consumer who needs this feature,
> so let's squeeze into the existing KF facility.
>
> First step is to remap scx_kf_mask bits into unused bits in KF_
> and annotate corresponding sched-ext kfuncs with it.
> For example:
> SCX_KF_DISPATCH will become
> KF_DISPATCH (1 << 13)
>
> and all kfuncs that are allowed to be called from ->dispatch() callback
> will be annotated like:
> - BTF_KFUNCS_START(scx_kfunc_ids_dispatch)
> - BTF_ID_FLAGS(func, scx_bpf_dispatch_nr_slots)
> - BTF_ID_FLAGS(func, scx_bpf_dispatch_cancel)
> + BTF_KFUNCS_START(scx_kfunc_ids_dispatch)
> + BTF_ID_FLAGS(func, scx_bpf_dispatch_nr_slots, KF_DISPATCH)
> + BTF_ID_FLAGS(func, scx_bpf_dispatch_cancel, KF_DISPATCH)
>
>
> For sched_ext_ops callback annotations, I think,
> the simplest approach is to add special
> BTF_SET8_START(st_ops_flags)
> BTF_ID_FLAGS(func, sched_ext_ops__dispatch, KF_DISPATCH)
> and so on for other ops stubs.
>
> sched_ext_ops__dispatch() is an empty function that
> exists in the vmlinux, and though it's not a kfunc
> we can use it to annotate
> (struct sched_ext_ops *)->dispatch() callback
> with a particular KF_ flag
> (or a set of flags for SCX_KF_RQ_LOCKED case).
>
> Then the verifier (while analyzing the program that is targeted
> to be attach to this ->dispatch() hook)
> will check this extra KF flag in st_ops
> and will only allow to call kfuncs with matching flags:
>
> if (st_ops->kf_mask & kfunc->kf_mask) // ok to call kfunc from this callback
>
> The end result current->scx.kf_mask will be removed
> and instead of run-time check it will become static verifier check.
Shall we move some of these logics from verifier core to
btf_kfunc_id_set.filter()? IIUC, this would avoid using extra
KF_* bits. To make the filter functions more capable, we
probably need to pass bpf_verifier_env into the filter() function.
Does this make sense?
Thanks,
Song