Re: sched/deadline: Use revised wakeup rule for dl_server

From: Andreas Ziegler

Date: Mon May 11 2026 - 08:41:08 EST


On 2026-05-11 09:47, Christian Loehle wrote:
On 5/9/26 12:42, Andreas Ziegler wrote:
Hi Christian, Everyone,

On 2026-05-08 14:13, Christian Loehle wrote:
On 5/8/26 13:06, Andreas Ziegler wrote:
Hi Christian,

On 2026-05-08 09:20, Christian Loehle wrote:
On 5/8/26 09:09, Andreas Ziegler wrote:
Linux kernel version: 6.12
  CONFIG_PREEMPT_RT (w/ PREEMPT_RT patch applied)
Architecture: aarch64
Platform: Raspberry Pi 4

Hi everyone,

Commit d66792919d4f (sched/deadline: Use revised wakeup rule for dl_server) [1] introduced a marked degradation in scheduling latency for real-time tasks in the presence of heavy I/O load.

--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1079,7 +1079,7 @@ static void update_dl_entity(struct sched_dl_entity *dl_se)
     if (dl_time_before(dl_se->deadline, rq_clock(rq)) ||
         dl_entity_overflow(dl_se, rq_clock(rq))) {

-        if (unlikely(!dl_is_implicit(dl_se) &&
+        if (unlikely((!dl_is_implicit(dl_se) || dl_se->dl_defer) &&
                  !dl_time_before(dl_se->deadline, rq_clock(rq)) &&
                  !is_dl_boosted(dl_se))) {
             update_dl_revised_wakeup(dl_se, rq);

This was observed using a modified version of Con Kolivas' interactivity benchmark [2]; kernel bisection eventually pointed to the above-mentioned commit.
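To make the effect of the one-line change easier to follow, here is a simplified userspace model of the wakeup update, written in Python rather than kernel C. It is a sketch, not the kernel implementation: the `patched` flag, the `boosted` field and the string return values are illustrative, the kernel's DL_SCALE shifts are omitted, and the deferred-server throttling and timer handling are not modeled at all.

```python
from dataclasses import dataclass

BW_SHIFT = 20  # fixed-point shift used for bandwidth ratios in the kernel


@dataclass
class DlEntity:
    dl_runtime: int          # reserved runtime per period, ns
    dl_deadline: int         # relative deadline, ns
    dl_period: int           # period, ns
    deadline: int            # current absolute deadline, ns
    runtime: int             # remaining runtime, ns
    dl_defer: bool = False   # True for the deferred dl_server
    boosted: bool = False    # stand-in for is_dl_boosted() (assumption)


def dl_is_implicit(se):
    return se.dl_deadline == se.dl_period


def dl_entity_overflow(se, now):
    # CBS test: remaining runtime over time-to-deadline exceeds the reserved
    # bandwidth. Python integers do not overflow, so no scaling is needed.
    return se.dl_deadline * se.runtime > (se.deadline - now) * se.dl_runtime


def update_dl_entity(se, now, patched):
    """Model of the wakeup update; `patched` selects the post-commit condition."""
    if not (se.deadline < now or dl_entity_overflow(se, now)):
        return "unchanged"
    takes_revised_rule = (not dl_is_implicit(se)) or (patched and se.dl_defer)
    if takes_revised_rule and not se.deadline < now and not se.boosted:
        # Revised wakeup rule: keep the deadline, shrink the runtime to
        # density * laxity.
        density = (se.dl_runtime << BW_SHIFT) // se.dl_deadline
        se.runtime = (density * (se.deadline - now)) >> BW_SHIFT
        return "revised"
    # Otherwise start a fresh period with a full budget.
    se.deadline = now + se.dl_deadline
    se.runtime = se.dl_runtime
    return "replenished"
```

As an example under assumed server parameters of 50 ms runtime per 1 s period (not taken from this report): an entity with 40 ms of budget left that wakes up 500 ms before its deadline is replenished with a new deadline and the full 50 ms under the old condition, while the patched condition keeps the deadline and scales the budget down to roughly 25 ms.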

Benchmark results before d66792919d4f:

--- Benchmarking simulated cpu of Audio real time in the presence of simulated ---
Load     Latency +/- SD        median  max [100n]  Desired CPU  Deadlines met [%]
None        76.6 +/- 8.3654        76         166
Video       78.5 +/- 3.9433        78         107
X           76.4 +/- 8.123         75         157
Burn        72.0 +/- 6.4733        71         127
Write      255.3 +/- 26.627       252         331
Read       226.6 +/- 12.38        227         262
Ring        84.2 +/- 6.6207        83         125
Compile    225.3 +/- 23.949       222         328

           136.8 +/- 78.462                     331

Benchmark results after d66792919d4f:

--- Benchmarking simulated cpu of Audio real time in the presence of simulated ---
Load     Latency +/- SD        median  max [100n]  Desired CPU  Deadlines met [%]
None        68.4 +/- 9.7864        67         169
Video       74.4 +/- 3.724         74          97
X           72.0 +/- 6.5681        71         129
Burn        66.9 +/- 5.9059        66         117
Write     9576.9 +/- 67639        250      500418         98.1               98.1
Read       209.3 +/- 11.018       209         267
Ring        80.5 +/- 8.0993        78         125
Compile    239.0 +/- 29.447       234         372

          1298.4 +/- 24118                 500418

Reverting this commit obviously solves the issue for me. I have no idea why this issue appears exclusively with heavy write loads in the background.

Is this a scheduler issue, or rather something in the background?


Hi Andreas,
You're using cpufreq schedutil for your tests I'm assuming?
Is there a difference in cpufreq behavior (avg cpufreq or OPP residencies?)
Does the regression also happen on powersave/performance governor?

Actually this is a very stripped-down system. The 'performance' cpufreq governor is the only one compiled in, the processor cores run on a fixed frequency. CONFIG_PM_OPP is not set.

That certainly makes the analysis easier.
I couldn't reproduce the issue so far on my system, but it does seem like the dl server
could get potentially unbounded running time if very frequent starting and stopping of
the dl server (which presumably happens because of the writeback) resets the runtime,
which then leads to your 25s observed latency.
Peter, how is the revised wakeup rule supposed to behave here?

[snip]

This seems to be a case of runtime starvation. If I change sched_rt_runtime_us to a smaller value, the benchmark returns reasonable latency values.

# echo "980000" > /proc/sys/kernel/sched_rt_runtime_us

I could live with this workaround, since it seems not to impact overall latency values in a noticeable way.
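For reference, the arithmetic behind that setting can be sketched as below. The helper name is made up for illustration, and the period of 1000000 us is assumed to be the kernel default for sched_rt_period_us; it was not read from the system under test.

```python
def rt_bandwidth(runtime_us, period_us=1_000_000):
    """Return (RT share of the period, microseconds left for other tasks).

    A runtime of -1 means "no limit" for sched_rt_runtime_us.
    """
    if runtime_us < 0:
        return 1.0, 0
    return runtime_us / period_us, period_us - runtime_us


share, left_us = rt_bandwidth(980_000)
print(f"RT share: {share:.0%}, left per period: {left_us} us")
```

With 980000 this gives a 98% share for real-time scheduling and 20000 us per period for everything else.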


Not a very stable workaround unfortunately :/
While I try to reproduce this, what you're observing should imply that the
background SCHED_NORMAL work is enough to fully utilize the system, right?
interbench Write does 4k (buffered) writes of a 1GB file and then close+open
and repeat, nothing fancy really. Does this actually produce significant CPU
utilization for you? Can you just run the background work and see what that
looks like?
(What you're seeing looks like a bug in any case, just so I'm not going down
a wrong path when trying to reproduce here).

You are right, and this was a false positive; the problem seems to be intermittent (maybe 1/20) and I just got lucky for one session.

Some background information about the current state of the system:
/* CONFIG_CPU_FREQ is not set */
Root filesystem in RAM (initrd)
CPU 3 is isolated; boot parameters: console=tty1 console=ttyAMA0,115200 isolcpus=nohz,domain,managed_irq,3 nohz_full=3 rcu_nocbs=3

Background load is normally near 100% idle; this is from top after reboot:

Mem: 95724K used, 853524K free, 42408K shrd, 72K buff, 43352K cached
CPU: 0.0% usr 0.0% sys 0.0% nic 100% idle 0.0% io 0.0% irq 0.0% sirq
Load average: 0.21 0.17 0.07 3/126 702

The file size used by interbench is even less than 1GB, due to the limits of the rootfs. Typical values are around 100-200 MiB. It is written in an infinite loop until receiving the stop message (via pipe) from the controlling process. The check for the abort signal occurs after a completed write, not on block level.
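The write load described above can be sketched roughly as follows. This is not interbench's code: the function names, the use of select() to poll the pipe, and the parameters are assumptions made for illustration. Only the 4k buffered writes, the rewrite loop and the stop check after a completed file come from this thread.

```python
import os
import select

BLOCK_SIZE = 4096  # 4 KiB buffered writes, as mentioned in the thread


def stop_requested(stop_fd):
    """Return True if the controlling process has written to the pipe."""
    readable, _, _ = select.select([stop_fd], [], [], 0)
    return bool(readable)


def write_load(path, file_size, stop_fd):
    """Rewrite `path` until a stop message arrives; return completed passes."""
    block = b"\0" * BLOCK_SIZE
    passes = 0
    while True:
        with open(path, "wb") as f:
            written = 0
            while written < file_size:
                chunk = block[: min(BLOCK_SIZE, file_size - written)]
                f.write(chunk)
                written += len(chunk)
        passes += 1
        # The stop check happens only here, after the whole file has been
        # written, so a stop request can be delayed by up to one full pass.
        if stop_requested(stop_fd):
            return passes
```

A controlling process would create the pipe with os.pipe(), hand the read end to write_load(), and write a byte to the write end when the benchmark interval is over.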

I just noticed that interbench seems to have a bug itself: it uses only one processor - looks like a mangled cpu mask. Top output during the write benchmark:

Mem: 358024K used, 591224K free, 298516K shrd, 2504K buff, 299464K cached
CPU: 1.8% usr 23.1% sys 0.0% nic 74.9% idle 0.0% io 0.0% irq 0.0% sirq
Load average: 1.21 0.46 0.29 5/129 2116
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
 2106  2105 root     S     1228  0.1   0 23.6 interbench -r -t 60 -u -w Write -W
 2109  2105 root     S     1228  0.1   0  1.2 interbench -r -t 60 -u -w Write -W
 1829  1274 root     R     1600  0.1   2  0.0 top -d 5
   22     2 root     SW       0  0.0   0  0.0 [rcuc/0]
 1270     2 root     IW       0  0.0   0  0.0 [kworker/0:0-eve]
  652     1 mpd      S    27632  2.9   0  0.0 /usr/bin/mpd
 2023  2021 root     S     4476  0.4   0  0.0 sshd-session: root@notty
  675   673 root     S     4448  0.4   1  0.0 sshd-session: root@pts/0
  673   601 root     S     4140  0.4   0  0.0 sshd-session: root [priv]
 2021   601 root     S     4140  0.4   0  0.0 sshd-session: root [priv]
  601     1 root     S     3736  0.3   1  0.0 sshd: /usr/sbin/sshd [listener] 0
 2024  2023 root     S     3224  0.3   1  0.0 /usr/libexec/sftp-server
 2025  2023 root     S     3188  0.3   2  0.0 /usr/libexec/sftp-server
  501     1 root     S     1884  0.2   1  0.0 /usr/sbin/wpa_supplicant -B -P /va
  131     1 root     S     1672  0.1   0  0.0 /sbin/mdev -df
  676   675 root     S     1636  0.1   1  0.0 -sh
 1274   605 root     S     1636  0.1   1  0.0 -sh
  605     1 root     S     1592  0.1   1  0.0 /usr/sbin/telnetd -F
  527     1 root     S     1576  0.1   2  0.0 udhcpc -t1 -A2 -b -R -O search -O
    1     0 root     S     1576  0.1   0  0.0 init

I tried limiting interbench's rather excessive SCHED_FIFO priorities to values normal for the system, but without success.