Re: sched/deadline: Use revised wakeup rule for dl_server
From: Christian Loehle
Date: Mon May 25 2026 - 03:27:02 EST

On 5/11/26 10:47, Christian Loehle wrote:
> On 5/9/26 12:42, Andreas Ziegler wrote:
>> Hi Christian, Everyone,
>>
>> On 2026-05-08 14:13, Christian Loehle wrote:
>>> On 5/8/26 13:06, Andreas Ziegler wrote:
>>>> Hi Christian,
>>>>
>>>> On 2026-05-08 09:20, Christian Loehle wrote:
>>>>> On 5/8/26 09:09, Andreas Ziegler wrote:
>>>>>> Linux kernel version: 6.12
>>>>>> CONFIG_PREEMPT_RT (w/ PREEMPT_RT patch applied)
>>>>>> Architecture: aarch64
>>>>>> Platform: Raspberry Pi 4
>>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> Commit d66792919d4f (sched/deadline: Use revised wakeup rule for dl_server) [1] introduced a marked degradation in scheduling latency for real-time tasks in the presence of heavy I/O load.
>>>>>>
>>>>>> --- a/kernel/sched/deadline.c
>>>>>> +++ b/kernel/sched/deadline.c
>>>>>> @@ -1079,7 +1079,7 @@ static void update_dl_entity(struct sched_dl_entity *dl_se)
>>>>>>  	if (dl_time_before(dl_se->deadline, rq_clock(rq)) ||
>>>>>>  	    dl_entity_overflow(dl_se, rq_clock(rq))) {
>>>>>>
>>>>>> -		if (unlikely(!dl_is_implicit(dl_se) &&
>>>>>> +		if (unlikely((!dl_is_implicit(dl_se) || dl_se->dl_defer) &&
>>>>>>  			     !dl_time_before(dl_se->deadline, rq_clock(rq)) &&
>>>>>>  			     !is_dl_boosted(dl_se))) {
>>>>>>  			update_dl_revised_wakeup(dl_se, rq);
>>>>>>
>>>>>> This was observed using a modified version of Con Kolivas' interactivity benchmark [2]; kernel bisection eventually pointed to the above-mentioned commit.
>>>>>>
>>>>>> Benchmark results before d66792919d4f:
>>>>>>
>>>>>> --- Benchmarking simulated cpu of Audio real time in the presence of simulated ---
>>>>>> Load    Latency +/- SD       median  max [100n]  Desired CPU  Deadlines met [%]
>>>>>> None       76.6 +/- 8.3654       76         166
>>>>>> Video      78.5 +/- 3.9433       78         107
>>>>>> X          76.4 +/- 8.123        75         157
>>>>>> Burn       72.0 +/- 6.4733       71         127
>>>>>> Write     255.3 +/- 26.627      252         331
>>>>>> Read      226.6 +/- 12.38       227         262
>>>>>> Ring       84.2 +/- 6.6207       83         125
>>>>>> Compile   225.3 +/- 23.949      222         328
>>>>>>
>>>>>>           136.8 +/- 78.462                  331
>>>>>>
>>>>>> Benchmark results after d66792919d4f:
>>>>>>
>>>>>> --- Benchmarking simulated cpu of Audio real time in the presence of simulated ---
>>>>>> Load    Latency +/- SD       median  max [100n]  Desired CPU  Deadlines met [%]
>>>>>> None       68.4 +/- 9.7864       67         169
>>>>>> Video      74.4 +/- 3.724        74          97
>>>>>> X          72.0 +/- 6.5681       71         129
>>>>>> Burn       66.9 +/- 5.9059       66         117
>>>>>> Write    9576.9 +/- 67639       250      500418         98.1               98.1
>>>>>> Read      209.3 +/- 11.018      209         267
>>>>>> Ring       80.5 +/- 8.0993       78         125
>>>>>> Compile   239.0 +/- 29.447      234         372
>>>>>>
>>>>>>          1298.4 +/- 24118                500418
>>>>>>
>>>>>> Reverting this commit obviously solves the issue for me. I have no idea why this issue appears exclusively with heavy write loads in the background.
>>>>>>
>>>>>> Is this a scheduler issue, or rather something in the background?
>>>>>>
>>>>>
>>>>> Hi Andreas,
>>>>> You're using cpufreq schedutil for your tests I'm assuming?
>>>>> Is there a difference in cpufreq behavior (avg cpufreq or OPP residencies?)
>>>>> Does the regression also happen on powersave/performance governor?
>>>>
>>>> Actually this is a very stripped-down system. The 'performance' cpufreq governor is the only one compiled in, the processor cores run on a fixed frequency. CONFIG_PM_OPP is not set.
>>>
>>> That certainly makes the analysis easier.
>>> I couldn't reproduce the issue so far on my system, but it does seem like the
>>> dl_server could get potentially unbounded running time: very frequent
>>> starting and stopping of the dl_server (which presumably happens because of
>>> the writeback) resets the runtime, which then leads to your 25s observed latency.
>>> Peter, how is the revised wakeup rule supposed to behave here?
>>>
>>>> [snip]
>>
>> This seems to be a case of runtime starvation. If I change sched_rt_runtime_us to a smaller value, the benchmark returns reasonable latency values.
>>
>> # echo "980000" > /proc/sys/kernel/sched_rt_runtime_us
>>
>> I could live with this workaround, since it seems not to impact overall latency values in a noticeable way.
>>
>
> Not a very stable workaround unfortunately :/
> While I try to reproduce this, what you're observing should imply that the
> background SCHED_NORMAL work is enough to fully utilize the system, right?
> interbench Write does 4k (buffered) writes of a 1GB file and then close+open
> and repeat, nothing fancy really. Does this actually produce significant CPU
> utilization for you? Can you just run the background work and see what that
> looks like?
> (What you're seeing looks like a bug in any case, just so I'm not going down
> a wrong path when trying to reproduce here).

I'd be interested to hear whether you can still reproduce this with the following fix:
https://lore.kernel.org/lkml/20260522125833.264145-1-gmonaco@xxxxxxxxxx/
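
For the "just run the background work" check quoted above, something along these lines could be used to watch overall CPU utilization while only the Write load is running. This is a minimal sketch, not part of interbench or the kernel tree: the helper names (parse_cpu_line, utilization, watch) are made up for illustration, and it assumes the field order of the aggregate "cpu" line in /proc/stat as documented in proc(5). Note that iowait is counted as idle here, so time spent blocked on write-out does not show up as busy.

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into (busy, total) jiffies."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:]]
    # Assumed field order (proc(5)):
    # user nice system idle iowait irq softirq steal guest guest_nice
    idle = values[3] + (values[4] if len(values) > 4 else 0)
    # guest and guest_nice are already accounted in user and nice, so only
    # the first eight fields are summed.
    total = sum(values[:8])
    return total - idle, total


def utilization(before, after):
    """Return the busy fraction (0.0 to 1.0) between two (busy, total) samples."""
    busy = after[0] - before[0]
    total = after[1] - before[1]
    return busy / total if total > 0 else 0.0


def sample(path="/proc/stat"):
    """Read one (busy, total) sample; the first line is the aggregate 'cpu' line."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


def watch(samples=10, interval=1.0):
    """Print the system-wide busy percentage once per interval."""
    prev = sample()
    for _ in range(samples):
        time.sleep(interval)
        cur = sample()
        print(f"cpu busy: {100.0 * utilization(prev, cur):.1f}%")
        prev = cur
```

Calling watch() from a second shell while the load runs prints one busy percentage per second, averaged over all CPUs. If that stays well below 100%, the background work alone is not saturating the system.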