[PATCHSET sched_ext/for-7.1] sched_ext: Add verifier-time kfunc context filter

From: Tejun Heo

Date: Fri Apr 10 2026 - 02:33:05 EST


Hello,

This moves enforcement of SCX context-sensitive kfunc restrictions from
runtime kf_mask checks to BPF verifier-time filtering, using the BPF core's
struct_ops context information.
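
For readers unfamiliar with the distinction, here is a minimal user-space
sketch of the two enforcement styles. It is not code from this series or from
the kernel: every name in it (ctx_bit, kfunc_desc, prog_desc, runtime_check,
load_time_filter) is hypothetical, and it assumes a simplified model where each
kfunc carries a bitmask of the contexts it may be called from. The runtime
style tests the mask on every call; the load-time style walks a program's call
list once and rejects the whole program if any call is not permitted from the
op it attaches to.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical context bits, loosely modeled on the kf_mask idea. */
enum ctx_bit {
	CTX_SELECT_CPU = 1u << 0,
	CTX_ENQUEUE    = 1u << 1,
	CTX_DISPATCH   = 1u << 2,
};

/* Hypothetical kfunc descriptor: the contexts it may be called from. */
struct kfunc_desc {
	const char *name;
	uint32_t allowed_ctx;
};

/* Hypothetical program: the op context it attaches to and its kfunc calls. */
struct prog_desc {
	uint32_t op_ctx;
	const struct kfunc_desc *const *calls;
	int nr_calls;
};

/*
 * Runtime style: evaluated on every kfunc call against the mask that
 * was recorded when the op was entered.
 */
static bool runtime_check(uint32_t cur_mask, const struct kfunc_desc *kf)
{
	return (cur_mask & kf->allowed_ctx) != 0;
}

/*
 * Load-time style: evaluated once per program. Returns 0 to accept and
 * -1 to reject (a stand-in for an error code such as -EACCES).
 */
static int load_time_filter(const struct prog_desc *prog)
{
	for (int i = 0; i < prog->nr_calls; i++) {
		if (!(prog->op_ctx & prog->calls[i]->allowed_ctx))
			return -1;
	}
	return 0;
}
```

In this model the two checks give the same verdict for a given (kfunc, caller)
pair; the difference is that the load-time version costs nothing on the call
path and reports the violation when the program is loaded rather than when it
runs.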

This is based on work by Juntong Deng and Cheng-Yang Chou:

https://lore.kernel.org/r/20260406154834.1920962-1-yphbchou0911@xxxxxxxxx

I ended up redoing the series. The number of changes needed and the
difficulty of validating each one made iterating through review emails
impractical:

- Pre-existing call-site bugs needed fixing first. ops.cgroup_move() was
  mislabeled as SCX_KF_UNLOCKED when sched_move_task() actually holds the
  rq lock, and set_cpus_allowed_scx() passed rq=NULL to SCX_CALL_OP_TASK
  despite holding the rq lock. These had to be sorted out before the
  runtime-to-verifier conversion could be validated.

- The macro-based kfunc ID deduplication (SCX_KFUNCS_*) made it hard to
  verify that the new code produced the same accept/reject verdicts as
  the old.

- No systematic validation of the full (kfunc, caller) verdict matrix
  existed, so it wasn't clear whether the conversion was correct.

This series takes a different approach: first fix the call-site bugs that
made the conversion harder than it needed to be, then do the conversion in
small isolated steps, and verify the full verdict matrix at each stage.

The series:

01/10 Drop TRACING access to select_cpu kfuncs
02/10 Add select_cpu kfuncs to scx_kfunc_ids_unlocked
03/10 Track @p's rq lock across set_cpus_allowed_scx -> ops.set_cpumask
04/10 Fix ops.cgroup_move() invocation kf_mask and rq tracking
05/10 Decouple kfunc unlocked-context check from kf_mask
06/10 Drop redundant rq-locked check from scx_bpf_task_cgroup()
07/10 Add verifier-time kfunc context filter
08/10 Remove runtime kfunc mask enforcement
09/10 Rename scx_kf_allowed_on_arg_tasks() to scx_kf_arg_task_ok()
10/10 Warn on task-based SCX op recursion

Patches 1-2 are extracted from the original patchset. Patches 3-4 fix
pre-existing call-site bugs where SCX_CALL_OP_TASK passed rq=NULL despite
the kernel holding the rq lock. Patch 5 converts select_cpu_from_kfunc and
scx_dsq_move to explicit locked-state tests. Patch 6 drops the
now-redundant kf_mask check from scx_kf_allowed_on_arg_tasks. Patch 7 adds the
verifier-time filter. Patch 8 removes the runtime kf_mask machinery. Patches
9-10 are post-removal cleanup.

The full verdict matrix was verified by writing BPF test programs covering
every kfunc group from every relevant caller context, testing both baseline
and patched kernels. All in-tree example schedulers and most scx-repo
schedulers pass smoke testing on the patched kernel.
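
The comparison step of such a verdict-matrix check can be sketched as below.
This is only an illustration of the idea, not the test programs used for the
series: the actual verification loaded BPF programs on baseline and patched
kernels, whereas this user-space sketch assumes the two sets of verdicts are
already available as functions. All names (NR_GROUPS, NR_CALLERS, verdict_fn,
count_verdict_mismatches, demo_*) and the toy rules are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical matrix dimensions: kfunc groups x caller contexts. */
enum { NR_GROUPS = 3, NR_CALLERS = 4 };

/* Returns true if calling a kfunc in @group from @caller is accepted. */
typedef bool (*verdict_fn)(int group, int caller);

/* Walk the full matrix and count the cells where the verdicts differ. */
static int count_verdict_mismatches(verdict_fn old_fn, verdict_fn new_fn)
{
	int mismatches = 0;

	for (int g = 0; g < NR_GROUPS; g++) {
		for (int c = 0; c < NR_CALLERS; c++) {
			bool o = old_fn(g, c);
			bool n = new_fn(g, c);

			if (o != n) {
				printf("mismatch: group=%d caller=%d old=%d new=%d\n",
				       g, c, o, n);
				mismatches++;
			}
		}
	}
	return mismatches;
}

/* Toy baseline rule: group 0 is allowed anywhere, group g only from caller g. */
static bool demo_old(int g, int c)
{
	return g == 0 || g == c;
}

/* Toy replacement: the same rule written out as an explicit table. */
static const bool demo_table[NR_GROUPS][NR_CALLERS] = {
	{ true,  true,  true,  true  },
	{ false, true,  false, false },
	{ false, false, true,  false },
};

static bool demo_new(int g, int c)
{
	return demo_table[g][c];
}

/* Deliberately wrong replacement, to show that mismatches are detected. */
static bool demo_always(int g, int c)
{
	(void)g;
	(void)c;
	return true;
}
```

Comparing demo_old against demo_new reports no mismatches, while comparing
demo_old against demo_always flags every cell that the baseline rejects.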

Based on sched_ext/for-7.1 (ff1befcb1683).

 include/linux/sched/ext.h   |  28 ---
 kernel/sched/ext.c          | 415 ++++++++++++++++++++---------------------
 kernel/sched/ext_idle.c     |  69 ++++---
 kernel/sched/ext_idle.h     |   2 +
 kernel/sched/ext_internal.h |   8 +-
 kernel/sched/sched.h        |   1 +
 6 files changed, 253 insertions(+), 270 deletions(-)

Git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git scx-kf-allowed-filter

--
tejun