Re: sched/deadline: Use revised wakeup rule for dl_server

From: Christian Loehle

Date: Mon May 25 2026 - 03:27:02 EST

On 5/11/26 10:47, Christian Loehle wrote:
> On 5/9/26 12:42, Andreas Ziegler wrote:
>> Hi Christian, Everyone,
>>
>> On 2026-05-08 14:13, Christian Loehle wrote:
>>> On 5/8/26 13:06, Andreas Ziegler wrote:
>>>> Hi Christian,
>>>>
>>>> On 2026-05-08 09:20, Christian Loehle wrote:
>>>>> On 5/8/26 09:09, Andreas Ziegler wrote:
>>>>>> Linux kernel version: 6.12
>>>>>> CONFIG_PREEMPT_RT (w/ PREEMPT_RT patch applied)
>>>>>> Architecture: aarch64
>>>>>> Platform: Raspberry Pi 4
>>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> Commit d66792919d4f (sched/deadline: Use revised wakeup rule for dl_server) [1] introduced a marked degradation in scheduling latency for real-time tasks in the presence of heavy I/O load.
>>>>>>
>>>>>> --- a/kernel/sched/deadline.c
>>>>>> +++ b/kernel/sched/deadline.c
>>>>>> @@ -1079,7 +1079,7 @@ static void update_dl_entity(struct sched_dl_entity *dl_se)
>>>>>>      if (dl_time_before(dl_se->deadline, rq_clock(rq)) ||
>>>>>>          dl_entity_overflow(dl_se, rq_clock(rq))) {
>>>>>>
>>>>>> -        if (unlikely(!dl_is_implicit(dl_se) &&
>>>>>> +        if (unlikely((!dl_is_implicit(dl_se) || dl_se->dl_defer) &&
>>>>>>                   !dl_time_before(dl_se->deadline, rq_clock(rq)) &&
>>>>>>                   !is_dl_boosted(dl_se))) {
>>>>>>              update_dl_revised_wakeup(dl_se, rq);
>>>>>>
>>>>>> This was observed using a modified version of Con Kolivas' interactivity benchmark [2]; kernel bisection eventually pointed to the above mentioned commit.
>>>>>>
>>>>>> Benchmark results before d66792919d4f:
>>>>>>
>>>>>> --- Benchmarking simulated cpu of Audio real time in the presence of simulated ---
>>>>>> Load    Latency +/- SD   median max [100n]    Desired CPU Deadlines met [%]
>>>>>> None      76.6 +/- 8.3654    76 166
>>>>>> Video      78.5 +/- 3.9433    78 107
>>>>>> X      76.4 +/- 8.123     75 157
>>>>>> Burn      72.0 +/- 6.4733    71 127
>>>>>> Write     255.3 +/- 26.627   252 331
>>>>>> Read     226.6 +/- 12.38    227 262
>>>>>> Ring      84.2 +/- 6.6207    83 125
>>>>>> Compile     225.3 +/- 23.949   222 328
>>>>>>
>>>>>>      136.8 +/- 78.462        331
>>>>>>
>>>>>> Benchmark results after d66792919d4f:
>>>>>>
>>>>>> --- Benchmarking simulated cpu of Audio real time in the presence of simulated ---
>>>>>> Load    Latency +/- SD   median max [100n]    Desired CPU Deadlines met [%]
>>>>>> None      68.4 +/- 9.7864    67 169
>>>>>> Video      74.4 +/- 3.724     74   97
>>>>>> X      72.0 +/- 6.5681    71 129
>>>>>> Burn      66.9 +/- 5.9059    66 117
>>>>>> Write    9576.9 +/- 67639    250 500418        98.1         98.1
>>>>>> Read     209.3 +/- 11.018   209 267
>>>>>> Ring      80.5 +/- 8.0993    78 125
>>>>>> Compile     239.0 +/- 29.447   234 372
>>>>>>
>>>>>>     1298.4 +/- 24118       500418
>>>>>>
>>>>>> Reverting this commit obviously solves the issue for me. I have no idea why this issue appears exclusively with heavy write loads in the background.
>>>>>>
>>>>>> Is this a scheduler issue, or rather something in the background?
>>>>>>
>>>>>
>>>>> Hi Andreas,
>>>>> You're using cpufreq schedutil for your tests I'm assuming?
>>>>> Is there a difference in cpufreq behavior (avg cpufreq or OPP residencies?)
>>>>> Does the regression also happen on powersave/performance governor?
>>>>
>>>> Actually this is a very stripped-down system. The 'performance' cpufreq governor is the only one compiled in, and the processor cores run at a fixed frequency. CONFIG_PM_OPP is not set.
>>>
>>> That certainly makes the analysis easier.
>>> I couldn't reproduce the issue so far on my system, but it does seem like the dl server
>>> could get potentially unbounded running time: very frequent starting and
>>> stopping of the dl server (which presumably happens because of the writeback)
>>> resets the runtime, which then leads to your 25s observed latency.
>>> Peter, how is the revised wakeup rule supposed to behave here?
>>>
>>>> [snip]
>>
>> This seems to be a case of runtime starvation. If I change sched_rt_runtime_us to a smaller value, the benchmark returns reasonable latency values.
>>
>> # echo "980000" > /proc/sys/kernel/sched_rt_runtime_us
>>
>> I could live with this workaround, since it seems not to impact overall latency values in a noticeable way.
>>
>
> Not a very stable workaround unfortunately :/
> While I try to reproduce this, what you're observing should imply that the
> background SCHED_NORMAL work is enough to fully utilize the system, right?
> interbench Write does 4k (buffered) writes of a 1GB file and then close+open
> and repeat, nothing fancy really. Does this actually produce significant CPU
> utilization for you? Can you just run the background work and see what that
> looks like?
> (What you're seeing looks like a bug in any case, just so I'm not going down
> a wrong path when trying to reproduce here).

I'd be interested to know whether you can still reproduce this with the following fix applied:
https://lore.kernel.org/lkml/20260522125833.264145-1-gmonaco@xxxxxxxxxx/
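
For anyone following along, here is a rough userspace model of the wakeup check that d66792919d4f changed. It is only a sketch: the class, the function names and the `revised_for_defer` switch are made up for illustration, the 50ms/1s parameters are example values, and the kernel's DL_SCALE shifts, clock wraparound handling and defer-timer logic are left out. It shows how the same wakeup either gets a full replenishment (old condition) or a budget scaled to the remaining time before the deadline (new condition when dl_defer is set).

```python
from dataclasses import dataclass

BW_SHIFT = 20          # fixed-point shift used for bandwidth ratios
MS = 1_000_000         # nanoseconds per millisecond


@dataclass
class DlEntity:
    """Illustrative subset of struct sched_dl_entity (times in ns)."""
    dl_runtime: int      # configured budget per period
    dl_deadline: int     # configured relative deadline
    dl_period: int       # configured period
    runtime: int         # remaining budget
    deadline: int        # current absolute deadline
    dl_defer: bool = False
    boosted: bool = False

    @property
    def dl_density(self) -> int:
        # dl_runtime / dl_deadline as a BW_SHIFT fixed-point ratio
        return (self.dl_runtime << BW_SHIFT) // self.dl_deadline

    def is_implicit(self) -> bool:
        return self.dl_deadline == self.dl_period


def overflows(se: DlEntity, now: int) -> bool:
    # True when runtime / (deadline - now) exceeds dl_runtime / dl_deadline
    return se.dl_deadline * se.runtime > (se.deadline - now) * se.dl_runtime


def on_wakeup(se: DlEntity, now: int, revised_for_defer: bool) -> str:
    """Apply the wakeup rule and return 'keep', 'revised' or 'replenish'.

    revised_for_defer=False models the condition before the commit,
    revised_for_defer=True the condition after it.
    """
    deadline_passed = se.deadline < now
    if not (deadline_passed or overflows(se, now)):
        return "keep"

    constrained = (not se.is_implicit()) or (revised_for_defer and se.dl_defer)
    if constrained and not deadline_passed and not se.boosted:
        # Revised rule: keep the deadline, shrink the budget to fit the laxity
        laxity = se.deadline - now
        se.runtime = (se.dl_density * laxity) >> BW_SHIFT
        return "revised"

    # Original CBS rule: start a new period with a full budget
    se.deadline = now + se.dl_deadline
    se.runtime = se.dl_runtime
    return "replenish"


def example_server() -> DlEntity:
    # Example values only: 50ms budget per 1s, 40ms left, deadline in 500ms
    return DlEntity(dl_runtime=50 * MS, dl_deadline=1000 * MS,
                    dl_period=1000 * MS, runtime=40 * MS,
                    deadline=500 * MS, dl_defer=True)


if __name__ == "__main__":
    for flag in (False, True):
        se = example_server()
        rule = on_wakeup(se, now=0, revised_for_defer=flag)
        print(flag, rule, se.runtime / MS, se.deadline / MS)
```

With these example numbers the old condition hands out a fresh 50ms budget and a new deadline, while the new condition keeps the 500ms deadline and cuts the budget to roughly 25ms. The model says nothing about why the Write load in particular triggers the regression.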