Re: [REGRESSION] sched/fair: Reimplement NEXT_BUDDY to align with EEVDF goals

From: K Prateek Nayak

Date: Tue Jan 13 2026 - 01:31:27 EST


Hello Ryan,

On 1/12/2026 2:22 PM, Ryan Roberts wrote:
> On 12/01/2026 07:47, Peter Zijlstra wrote:
>> On Fri, Jan 09, 2026 at 10:15:46AM +0000, Ryan Roberts wrote:
>>
>>> Here are the updated results, now including a column for "revert #1 & #2".
>>>
>>> 6-18-0 (base) (baseline)
>>> 6-19-0-rc1 (New NEXT_BUDDY implementation enabled)
>>> revert #1 & #2 (NEXT_BUDDY disabled)
>>> revert #2 (Old NEXT_BUDDY implementation enabled)
>>>
>>>
>>> The regressions that are fixed by "revert #2" (as originally reported) are still
>>> fixed in "revert #1 & #2". Interestingly, performance actually improves further
>>> for the latter in the multi-node mysql benchmark (which is our VIP workload).
>>> There are a couple of hackbench cases (sockets with high thread counts) that
>>> showed an improvement with "revert #2" but which is gone with "revert #1 & #2".
>>>
>>> Let me know if I can usefully do anything else.
>>
>> If it's not too much bother, could you run 6.19-rc with SCHED_BATCH? The
>> defining characteristic of BATCH is that it fully ignores wakeup
>> preemption.
>
> Is there a way I can force all future tasks to use SCHED_BATCH at the system
> level?

One shortcut is to echo "NO_WAKEUP_PREEMPTION" into
/sys/kernel/debug/sched/features. Note, however, that this disables wakeup
preemption for all tasks, including kthreads, which might adversely affect
performance. It is therefore not exactly equivalent to running only the
workload under SCHED_BATCH.
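If toggling the feature from a program is more convenient than echo, here
is a minimal C sketch. It assumes debugfs is mounted at /sys/kernel/debug
and that the caller is root. The helpers write_sched_feature() and
file_has_token() are hypothetical names used only for illustration. Writing
"FEAT" sets a feature and "NO_FEAT" clears it, one name per write. Reading
the file back lists cleared features with the NO_ prefix.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write one scheduler feature name (e.g.
 * "NO_WAKEUP_PREEMPTION") to the sched features file. Assumes debugfs is
 * mounted at /sys/kernel/debug and the caller is root. The path is a
 * parameter so the function can also be exercised on a scratch file.
 * Returns 0 on success, -1 on failure.
 */
int write_sched_feature(const char *path, const char *feat)
{
	FILE *f = fopen(path, "w");
	if (!f)
		return -1;
	/* The kernel parses a single feature name per write. */
	int ok = fputs(feat, f) >= 0;
	/* fclose() flushes the buffer, so the actual write(2) happens here. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Hypothetical helper: return 1 if the whitespace-separated token `feat`
 * appears in the file at `path`, 0 if it does not, -1 on error. On the real
 * features file, disabled features show up as "NO_<NAME>".
 */
int file_has_token(const char *path, const char *feat)
{
	FILE *f = fopen(path, "r");
	if (!f)
		return -1;
	char tok[128];
	int found = 0;
	/* %127s reads one whitespace-delimited token at a time. */
	while (fscanf(f, "%127s", tok) == 1) {
		if (strcmp(tok, feat) == 0) {
			found = 1;
			break;
		}
	}
	fclose(f);
	return found;
}
```

For example, after the run,
write_sched_feature("/sys/kernel/debug/sched/features", "WAKEUP_PREEMPTION")
should re-enable wakeup preemption.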

For repro-collection/mysql-workload (which I presume is [1]), there is a
"WORKLOAD_SCHED_POLICY" environment variable [2] that controls the
"CPUSchedulingPolicy" of the mysqld service and can be overridden.
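Outside the repro scripts, the workload can also be started under
SCHED_BATCH directly, roughly what util-linux's "chrt --batch 0 <cmd>"
does. Below is a minimal C sketch, assuming Linux with glibc. The helpers
set_self_batch() and exec_as_batch() are hypothetical and not part of the
repro scripts. A child created with fork(2) inherits its parent's
scheduling policy, and the policy is preserved across execve(2). Everything
the launched command spawns therefore also runs as SCHED_BATCH, unless it
changes its own policy or SCHED_RESET_ON_FORK is set.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

/* Fallback in case the libc headers do not expose it (Linux UAPI value). */
#ifndef SCHED_BATCH
#define SCHED_BATCH 3
#endif

/*
 * Hypothetical helper: switch the calling process to SCHED_BATCH.
 * SCHED_BATCH requires a static priority of 0. Returns 0 on success,
 * -1 with errno set on failure.
 */
int set_self_batch(void)
{
	struct sched_param sp = { .sched_priority = 0 };
	return sched_setscheduler(0, SCHED_BATCH, &sp);
}

/*
 * Hypothetical launcher: put ourselves under SCHED_BATCH, then replace this
 * process image with argv[0] and its arguments. Only returns on failure.
 */
int exec_as_batch(char *const argv[])
{
	if (set_self_batch() != 0) {
		perror("sched_setscheduler");
		return -1;
	}
	execvp(argv[0], argv);
	perror("execvp");
	return -1;
}
```

A main() would typically call exec_as_batch(argv + 1), so the remaining
command-line arguments name the program to run (e.g. the mysqld start
command).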

[1] https://github.com/aws/repro-collection/tree/main/workloads/mysql
[2] https://github.com/aws/repro-collection/blob/a2cdf0455bd3422c9c1fc689ceac32971223b984/repros/repro-mysql-EEVDF-regression/main.sh#L102

--
Thanks and Regards,
Prateek