[PATCHSET sched_ext/for-7.1-fixes] sched_ext: Fix cgroup iter coverage of in-do_exit tasks

From: Tejun Heo

Date: Mon Apr 27 2026 - 20:16:44 EST


Hello,

a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup") made
css_task_iter_advance() skip exiting tasks. That broke scx_task_iter's
cgroup-scoped mode: it now silently skips tasks that are still on
scx_tasks but past exit_signals(), so the abort path in
scx_sub_enable_workfn() can miss SCX_TASK_SUB_INIT-marked exiting tasks
and leak __scx_init_task() state.
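
To illustrate the leak, here is a minimal user-space sketch. It is not
kernel code: struct toy_task, its fields and toy_abort() are hypothetical
stand-ins, and the only thing taken from the description above is the
idea that an abort path can undo init state only on tasks the iterator
returns.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a task; field names are illustrative only. */
struct toy_task {
	bool exiting;	/* task is past exit_signals() */
	bool sub_init;	/* models the SCX_TASK_SUB_INIT mark */
};

/*
 * Models an abort path that walks the tasks and undoes init state.
 * When @skip_exiting is true the walk behaves like an iterator that
 * never returns exiting tasks. Returns how many tasks are still
 * marked afterwards, i.e. how many were leaked.
 */
static size_t toy_abort(struct toy_task *tasks, size_t n, bool skip_exiting)
{
	size_t leaked = 0;

	for (size_t i = 0; i < n; i++) {
		if (skip_exiting && tasks[i].exiting)
			continue;		/* never seen by the abort path */
		tasks[i].sub_init = false;	/* init state released */
	}

	for (size_t i = 0; i < n; i++)
		if (tasks[i].sub_init)
			leaked++;

	return leaked;
}
```

With one live and one exiting task, both marked, toy_abort() leaves one
task marked when exiting tasks are skipped and none when they are
visited.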

Restoring iter coverage exposes a separate latent issue: cgroup
iteration can return tasks whose sched_ext_dead() has already torn down
their per-task SCX state (cgroup_task_dead() runs after sched_ext_dead()
in finish_task_switch() and is irq-work deferred on PREEMPT_RT). Callers
trip WARN_ON_ONCE() / fail assertions when they see such a task.

This pair fixes both:

  0001 sched_ext: Include exiting tasks in cgroup iter
       Adds CSS_TASK_ITER_WITH_DEAD; scx_task_iter opts in.

  0002 sched_ext: Skip past-sched_ext_dead() tasks in
       scx_task_iter_next_locked()
       Adds SCX_TASK_OFF_TASKS, set in sched_ext_dead() under the rq
       lock; scx_task_iter_next_locked() skips flagged tasks under the
       same lock.
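
The combined effect of the two patches can be sketched in user space as
follows. This is only a model under stated assumptions, not the code in
the patches: the MODEL_* flags, struct model_task and the model_*()
functions are hypothetical, and a pthread mutex stands in for the rq
lock. Only the flag names in the comments come from the descriptions
above.

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

enum {
	MODEL_ITER_WITH_DEAD = 1 << 0,	/* models CSS_TASK_ITER_WITH_DEAD */
	MODEL_TASK_EXITING   = 1 << 1,	/* task is past exit_signals() */
	MODEL_TASK_OFF_TASKS = 1 << 2,	/* models SCX_TASK_OFF_TASKS */
};

struct model_task {
	unsigned int flags;
	pthread_mutex_t lock;		/* stands in for the rq lock */
};

/* Models sched_ext_dead(): flag the task while holding its lock. */
static void model_task_dead(struct model_task *t)
{
	pthread_mutex_lock(&t->lock);
	t->flags |= MODEL_TASK_OFF_TASKS;
	pthread_mutex_unlock(&t->lock);
}

/*
 * Models scx_task_iter_next_locked(): return the next usable task with
 * its lock held, or NULL when the walk is done. Exiting tasks are
 * returned only if the caller opted in; flagged tasks are never
 * returned. The flag is tested under the same lock that set it.
 */
static struct model_task *model_iter_next_locked(struct model_task *tasks,
						 size_t n, size_t *pos,
						 unsigned int iter_flags)
{
	while (*pos < n) {
		struct model_task *t = &tasks[(*pos)++];

		pthread_mutex_lock(&t->lock);
		if (((t->flags & MODEL_TASK_EXITING) &&
		     !(iter_flags & MODEL_ITER_WITH_DEAD)) ||
		    (t->flags & MODEL_TASK_OFF_TASKS)) {
			pthread_mutex_unlock(&t->lock);
			continue;
		}
		return t;	/* caller unlocks */
	}
	return NULL;
}

/* Example caller: count the tasks a walk returns. */
static size_t model_count(struct model_task *tasks, size_t n,
			  unsigned int iter_flags)
{
	struct model_task *t;
	size_t pos = 0, count = 0;

	while ((t = model_iter_next_locked(tasks, n, &pos, iter_flags))) {
		/* a caller-side assertion like this is what used to trip */
		assert(!(t->flags & MODEL_TASK_OFF_TASKS));
		count++;
		pthread_mutex_unlock(&t->lock);
	}
	return count;
}
```

For three tasks (one live, one exiting, one exiting and already past
model_task_dead()), model_count() returns 1 without
MODEL_ITER_WITH_DEAD and 2 with it; the torn-down task is skipped in
both cases.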

Verified with a stress harness that runs a 4-deep nested sub-sched
hierarchy with continuous fork/switch workers and random sub-sched
restarts at 5s intervals. Baseline (without the patches) wedged a
192-CPU bare-metal box in 66s and oopsed a 24-thread bare-metal box at
227s. The patched kernel ran clean for 30min on both, plus an 8-vCPU vng
instance, with 0 WARN/BUG/lockdep reports across ~1000 sub-restarts.

Based on sched_ext/for-7.1-fixes (deb7b2f93d01).

 include/linux/cgroup.h    |  1 +
 include/linux/sched/ext.h |  1 +
 kernel/cgroup/cgroup.c    |  8 +++++---
 kernel/sched/ext.c        | 39 +++++++++++++++++++++++++++++----------
 4 files changed, 36 insertions(+), 13 deletions(-)

Git tree: git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git iter-include-dead-v1

Thanks.

--
tejun