Re: [PATCH 10/13] sched_ext: Hook up hardlockup detector

From: Andrea Righi
Date: Mon Nov 10 2025 - 03:31:56 EST


On Sun, Nov 09, 2025 at 08:31:09AM -1000, Tejun Heo wrote:
> A poorly behaving BPF scheduler can trigger hard lockup. For example, on a
> large system with many tasks pinned to different subsets of CPUs, if the BPF
> scheduler puts all tasks in a single DSQ and lets all CPUs at it, the DSQ lock
> can be contended to the point where hardlockup triggers. Unfortunately,
> hardlockup can be the first signal out of such situations, thus requiring
> hardlockup handling.
>
> Hook scx_hardlockup() into the hardlockup detector to try kicking out the
> current scheduler in an attempt to recover the system to a good state. The
> handling strategy can delay the watchdog taking its own action by one polling
> period; however, given that the only remediation for a hardlockup is a crash,
> this is likely an acceptable trade-off.
>
> Reported-by: Dan Schatzberg <schatzberg.dan@xxxxxxxxx>
> Cc: Emil Tsalapatis <etsal@xxxxxxxx>
> Cc: Douglas Anderson <dianders@xxxxxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>

Makes sense to me, from a sched_ext perspective:

Reviewed-by: Andrea Righi <arighi@xxxxxxxxxx>

Thanks,
-Andrea

> ---
>  include/linux/sched/ext.h |  1 +
>  kernel/sched/ext.c        | 18 ++++++++++++++++++
>  kernel/watchdog.c         |  9 +++++++++
>  3 files changed, 28 insertions(+)
>
> diff --git a/include/linux/sched/ext.h b/include/linux/sched/ext.h
> index e1502faf6241..12561a3fcee4 100644
> --- a/include/linux/sched/ext.h
> +++ b/include/linux/sched/ext.h
> @@ -223,6 +223,7 @@ struct sched_ext_entity {
>  void sched_ext_dead(struct task_struct *p);
>  void print_scx_info(const char *log_lvl, struct task_struct *p);
>  void scx_softlockup(u32 dur_s);
> +bool scx_hardlockup(void);
>  bool scx_rcu_cpu_stall(void);
>
> #else /* !CONFIG_SCHED_CLASS_EXT */
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 4507bc4f0b5c..bd66178e5927 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -3710,6 +3710,24 @@ void scx_softlockup(u32 dur_s)
>  			smp_processor_id(), dur_s);
>  }
>
> +/**
> + * scx_hardlockup - sched_ext hardlockup handler
> + *
> + * A poorly behaving BPF scheduler can trigger hard lockup by e.g. putting
> + * numerous affinitized tasks in a single queue and directing all CPUs at it.
> + * Try kicking out the current scheduler in an attempt to recover the system to
> + * a good state before taking more drastic actions.
> + */
> +bool scx_hardlockup(void)
> +{
> +	if (!handle_lockup("hard lockup - CPU %d", smp_processor_id()))
> +		return false;
> +
> +	printk_deferred(KERN_ERR "sched_ext: Hard lockup - CPU %d, disabling BPF scheduler\n",
> +			smp_processor_id());
> +	return true;
> +}
> +
>  /**
>   * scx_bypass - [Un]bypass scx_ops and guarantee forward progress
>   * @bypass: true for bypass, false for unbypass
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index 5b62d1002783..8dfac4a8f587 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -196,6 +196,15 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
>  #ifdef CONFIG_SYSFS
>  		++hardlockup_count;
>  #endif
> +		/*
> +		 * A poorly behaving BPF scheduler can trigger hard lockup by
> +		 * e.g. putting numerous affinitized tasks in a single queue and
> +		 * directing all CPUs at it. The following call can return true
> +		 * only once when sched_ext is enabled and will immediately
> +		 * abort the BPF scheduler and print out a warning message.
> +		 */
> +		if (scx_hardlockup())
> +			return;
> 
>  		/* Only print hardlockups once. */
>  		if (per_cpu(watchdog_hardlockup_warned, cpu))
> --
> 2.51.1
>