Re: [PATCH 07/10] sched_ext: Add verifier-time kfunc context filter

From: Cheng-Yang Chou

Date: Tue Apr 14 2026 - 08:42:27 EST


Hi Tejun, Andrea,

On Thu, Apr 09, 2026 at 08:30:43PM -1000, Tejun Heo wrote:
> +/*
> + * Verifier-time filter for context-sensitive SCX kfuncs. Registered via the
> + * .filter field on each per-group btf_kfunc_id_set. The BPF core invokes this
> + * for every kfunc call in the registered hook (BPF_PROG_TYPE_STRUCT_OPS or
> + * BPF_PROG_TYPE_SYSCALL), regardless of which set originally introduced the
> + * kfunc - so the filter must short-circuit on kfuncs it doesn't govern (e.g.
> + * scx_kfunc_ids_any) by falling through to "allow" when none of the
> + * context-sensitive sets contain the kfunc.
> + */
> +int scx_kfunc_context_filter(const struct bpf_prog *prog, u32 kfunc_id)
> +{
> +	bool in_unlocked = btf_id_set8_contains(&scx_kfunc_ids_unlocked, kfunc_id);
> +	bool in_select_cpu = btf_id_set8_contains(&scx_kfunc_ids_select_cpu, kfunc_id);
> +	bool in_enqueue = btf_id_set8_contains(&scx_kfunc_ids_enqueue_dispatch, kfunc_id);
> +	bool in_dispatch = btf_id_set8_contains(&scx_kfunc_ids_dispatch, kfunc_id);
> +	bool in_cpu_release = btf_id_set8_contains(&scx_kfunc_ids_cpu_release, kfunc_id);
> +	u32 moff, flags;
> +
> +	/* Not a context-sensitive kfunc (e.g. from scx_kfunc_ids_any) - allow. */
> +	if (!(in_unlocked || in_select_cpu || in_enqueue || in_dispatch || in_cpu_release))
> +		return 0;
> +
> +	/* SYSCALL progs (e.g. BPF test_run()) may call unlocked and select_cpu kfuncs. */
> +	if (prog->type == BPF_PROG_TYPE_SYSCALL)
> +		return (in_unlocked || in_select_cpu) ? 0 : -EACCES;
> +
> +	if (prog->type != BPF_PROG_TYPE_STRUCT_OPS)
> +		return -EACCES;
> +
> +	/*
> +	 * add_subprog_and_kfunc() collects all kfunc calls, including dead code
> +	 * guarded by bpf_ksym_exists(), before check_attach_btf_id() sets
> +	 * prog->aux->st_ops. Allow all kfuncs when st_ops is not yet set;
> +	 * do_check_main() re-runs the filter with st_ops set and enforces the
> +	 * actual restrictions.
> +	 */
> +	if (!prog->aux->st_ops)
> +		return 0;
> +
> +	/*
> +	 * Non-SCX struct_ops: only unlocked kfuncs are safe. The other
> +	 * context-sensitive kfuncs assume the rq lock is held by the SCX
> +	 * dispatch path, which doesn't apply to other struct_ops users.
> +	 */
> +	if (prog->aux->st_ops != &bpf_sched_ext_ops)
> +		return in_unlocked ? 0 : -EACCES;

After reconsidering the concern raised by Sashiko [1], I think we should
not allow non-SCX struct_ops programs to call SCX unlocked kfuncs.

I found two concrete problems with the current approach:

1. KASAN slab-out-of-bounds in scx_prog_sched()

With CONFIG_EXT_SUB_SCHED=y, scx_prog_sched() blindly casts the
pointer returned by bpf_prog_get_assoc_struct_ops(aux) to
struct sched_ext_ops * and reads ops->priv.

For a non-SCX struct_ops program, the associated ops struct is not a
struct sched_ext_ops, so the ops->priv read lands past the end of the
allocation. Verified with KASAN (full log at [2]):

[ 46.496052] BUG: KASAN: slab-out-of-bounds in scx_bpf_kick_cpu+0x29c/0x2b0
[ 46.496175] Read of size 8 at addr ffff88811167bd10 by task scx_oob_kasan/633
...
[ 46.496478] ? scx_bpf_kick_cpu+0x29c/0x2b0
[ 46.496488] scx_bpf_kick_cpu+0x29c/0x2b0
[ 46.496494] bpf_prog_746ba9ec0529bae2_test_ca_init+0x27/0x29
[ 46.496499] bpf__tcp_congestion_ops_init+0x47/0xa3
[ 46.496506] tcp_init_congestion_control+0xad/0x430
[ 46.496512] tcp_init_transfer+0x537/0x8f0
[ 46.496519] tcp_finish_connect+0x1ef/0x700

2. Non-SCX programs should not call SCX kfuncs

Even with the OOB fixed, allowing non-SCX struct_ops programs to call
unlocked kfuncs such as scx_bpf_kick_cpu() is semantically wrong: these
kfuncs assume an SCX scheduler context that other struct_ops types don't
provide.

So I think we should do the following:

1. Tighten the filter to deny all context-sensitive kfuncs for non-SCX
struct_ops programs, not just the rq-locked ones.
This also makes the runtime OOB in scx_prog_sched() unreachable.

2. Add a selftest: a TCP BPF program that calls scx_bpf_kick_cpu()
should be rejected at load time.

Wdyt, thanks.

[1] https://sashiko.dev/#/patchset/20260410063046.3556100-1-tj%40kernel.org?part=7
[2] https://gist.github.com/EricccTaiwan/d1787f4f1fa5b5d42d436fbe7c0e2b3b

--
Thanks,
Cheng-Yang