Re: [PATCHSET sched_ext/for-6.12-fixes] sched_ext: Split %SCX_DSQ_GLOBAL per-node

From: Tejun Heo
Date: Thu Sep 26 2024 - 19:00:48 EST


On Tue, Sep 24, 2024 at 02:06:02PM -1000, Tejun Heo wrote:
> In the bypass mode, the global DSQ is used to schedule all tasks in simple
> FIFO order. All tasks are queued into the global DSQ and all CPUs try to
> execute tasks from it. This creates a lot of cross-node cacheline accesses
> and scheduling across the node boundaries, and can lead to live-lock
> conditions where the system takes tens of minutes to disable the BPF
> scheduler while executing in the bypass mode.
>
> This patchset splits the global DSQ per NUMA node. Each node has its own
> global DSQ. When a task is dispatched to SCX_DSQ_GLOBAL, it's put into the
> global DSQ local to the task's CPU and all CPUs in a node only consume its
> node-local global DSQ.
>
> This resolves a livelock condition which could be reliably triggered on a
> 2x EPYC 7642 system by running `stress-ng --race-sched 1024` together with
> `stress-ng --workload 80 --workload-threads 10` while repeatedly enabling
> and disabling an SCX scheduler.
>
> This patchset contains the following patches:
>
> 0001-scx_flatcg-Use-a-user-DSQ-for-fallback-instead-of-SC.patch
> 0002-sched_ext-Allow-only-user-DSQs-for-scx_bpf_consume-s.patch
> 0003-sched_ext-Relocate-find_user_dsq.patch
> 0004-sched_ext-Split-the-global-DSQ-per-NUMA-node.patch
> 0005-sched_ext-Use-shorter-slice-while-bypassing.patch
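
For readers skimming the thread, the per-node split described above can be
modeled roughly as follows. This is a user-space sketch only, not the kernel
code from the patches: the topology constants, the fixed-size ring buffer,
and the names global_dsqs, cpu_to_node(), dispatch_global() and
consume_global() as written here are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical topology for illustration: 2 nodes with 4 CPUs each. */
#define NR_NODES      2
#define CPUS_PER_NODE 4
#define DSQ_CAP       64

struct task {
	int pid;
	int cpu;	/* CPU the task is currently associated with */
};

/* A simple FIFO standing in for one node's global DSQ. */
struct dsq {
	struct task *slots[DSQ_CAP];
	size_t head;	/* index of the next task to consume */
	size_t tail;	/* index of the next free slot */
};

/* One "global" DSQ per NUMA node instead of a single shared queue. */
static struct dsq global_dsqs[NR_NODES];

/* Assumed mapping: CPUs are numbered contiguously within a node. */
static int cpu_to_node(int cpu)
{
	return cpu / CPUS_PER_NODE;
}

/* Dispatching to the global DSQ queues on the node of the task's CPU. */
static int dispatch_global(struct task *p)
{
	struct dsq *q;

	assert(p->cpu >= 0 && p->cpu < NR_NODES * CPUS_PER_NODE);
	q = &global_dsqs[cpu_to_node(p->cpu)];
	if (q->tail - q->head == DSQ_CAP)
		return -1;	/* queue full in this toy model */
	q->slots[q->tail++ % DSQ_CAP] = p;
	return 0;
}

/* A CPU only consumes from its own node's global DSQ. */
static struct task *consume_global(int cpu)
{
	struct dsq *q = &global_dsqs[cpu_to_node(cpu)];

	if (q->head == q->tail)
		return NULL;	/* nothing queued on this node */
	return q->slots[q->head++ % DSQ_CAP];
}
```

In this model a task queued from CPU 5 can only be picked up by CPUs 4-7, so
the queue's cachelines stay within one node, which is the property the
patchset relies on to avoid the cross-node contention.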

Applied to sched_ext/for-6.12-fixes.

Thanks.

--
tejun