Re: [PATCH] sched_ext: Add scx_ai_numa scheduler example for AI workloads


From: Christian Loehle

Date: Fri May 08 2026 - 05:29:41 EST

On 5/8/26 08:56, Andrea Righi wrote:
> Hi Qiliang,
>
> On Fri, May 08, 2026 at 03:51:35PM +0800, Qiliang Yuan wrote:
>> Implement an AI-focused NUMA-aware scheduler that optimizes task dispatch for
>> GPU-accelerated AI training. The scheduler maintains per-NUMA-node dispatch
>> queues to preserve L3 cache warmth and minimize remote DRAM accesses that
>> would stall GPU kernel launches waiting on CPU preprocessing.
>>
>> Key features:
>> - Per-NUMA-node DSQs (dispatch queues) to maintain cache locality
>> - Idle fast path that bypasses DSQ for minimum latency
>> - Per-task NUMA affinity tracking to remember task placement
>> - Work stealing across nodes to prevent starvation during load imbalance
>>
>> The BPF component (scx_ai_numa.bpf.c) implements the core scheduler
>> callbacks, while the userspace loader (scx_ai_numa.c) detects NUMA
>> topology, installs the BPF program, and reports per-node dispatch
>> statistics every second.
>>
>> This scheduler is suitable for AI training workloads where GPU command
>> launches depend on rapid CPU preprocessing with minimal scheduling latency.
>>
>> Signed-off-by: Qiliang Yuan <realwujing@xxxxxxxxx>
>
> I think this would be more appropriate for inclusion in
> https://github.com/sched-ext/scx.

That repo no longer hosts C schedulers though, no?
I guess it's trivial to convert this particular one to Rust.