Re: sched/fair: DELAY_DEQUEUE causes ~25% pipe IPC regression on Raspberry Pi 5
From: Vincent Guittot
Date: Fri Apr 17 2026 - 11:49:23 EST
On Thu, 16 Apr 2026 at 14:23, Tom Gebhardt <tomge68@xxxxxxxxx> wrote:
>
> Hi Peter,
>
> I would like to report a measurable pipe IPC throughput regression introduced by
> commit 152e11f ("sched/fair: Implement delayed dequeue"), first
> present in v6.12-rc1.
>
> This has been independently confirmed on the official Raspberry Pi
> Linux issue tracker
> (raspberrypi/linux #7308), where the RPi kernel team directed the
> issue upstream.
>
>
> Hardware / Software
> -------------------
> - Raspberry Pi 5 Model B, BCM2712 (C1 stepping), 8 GB RAM
> - Raspberry Pi OS Bookworm (arm64)
> - Kernels tested: 6.6.78-v8-16k+ (rpi-6.6.y), 6.12.75+rpt-rpi-2712,
> 6.12.81-v8-16k+ (custom)
> - Benchmark: stress-ng 0.15.06, --pipe 4 --timeout 20s --metrics-brief
>
>
> Observed regression
> -------------------
> Comparing pipe IPC throughput across kernels (overclocked, arm_freq=2800):
>
> Kernel pipe bogo ops/s vs. 6.6
> 6.6.78 2 487 746 100%
> 6.12.75 1 651 427 -34%
> 6.18.21 2 049 701 -18%
>
> This regression pattern is consistent across two separate Raspberry Pi
> 5 units and has
> been independently reproduced by the RPi kernel team with 20-run averages:
> 6.6=2065 Kops/s, 6.12=1662, 6.18=1805, 7.0=1570 (lowest).
>
>
> Runtime isolation via CONFIG_SCHED_DEBUG
> -----------------------------------------
> To isolate the root cause, I compiled a custom kernel (rpi-6.12.y,
> 6.12.81-v8-16k+)
> with CONFIG_SCHED_DEBUG=y and toggled scheduler features at runtime:
>
> DELAY_DEQUEUE PREEMPT_SHORT pipe bogo ops/s vs. baseline
> on (default) on (default) 1 506 572 --
> OFF on 2 125 473 +41% <==
> on OFF 1 419 026 -6%
> OFF OFF 2 078 182 +38%
>
> Disabling DELAY_DEQUEUE alone recovers +41% throughput, almost closing
> the gap to 6.6.
> Disabling PREEMPT_SHORT alone has no positive effect on this workload.
>
> The remaining gap to 6.6 (~15%) is likely CONFIG_SCHED_DEBUG=y overhead.
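For anyone wanting to reproduce this, the toggling described above can be
done at runtime through debugfs (assuming debugfs is mounted at
/sys/kernel/debug and the kernel was built with CONFIG_SCHED_DEBUG=y):

```shell
# Show the current scheduler feature flags (disabled features are
# listed with a NO_ prefix):
cat /sys/kernel/debug/sched/features

# Disable delayed dequeue at runtime, then re-run the benchmark:
echo NO_DELAY_DEQUEUE > /sys/kernel/debug/sched/features
stress-ng --pipe 4 --timeout 20s --metrics-brief

# Restore the default afterwards:
echo DELAY_DEQUEUE > /sys/kernel/debug/sched/features
```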
>
>
> Root cause analysis
> -------------------
> The pipe producer-consumer loop is affected by DELAY_DEQUEUE as follows:
>
> Before DELAY_DEQUEUE:
> consumer reads empty pipe -> blocks -> dequeue_task() removes it from runqueue
> producer writes -> wake_up_interruptible() -> consumer re-enqueued
> cleanly -> runs
>
> With DELAY_DEQUEUE (v6.12+):
> consumer reads empty pipe -> blocks -> stays on runqueue (sched_delayed = 1)
> producer writes -> wakeup path handles already-queued task ->
> additional bookkeeping
Because the task is still enqueued, the enqueue of a delayed entity
should actually be faster most of the time.
> per iteration
>
> For a tight 4-worker pipe benchmark at millions of iterations, this
> per-iteration
> overhead compounds directly into measured throughput.
>
> PREEMPT_SHORT (commit 85e511d) does not contribute to this regression.
> Its stated
Unless you set a custom slice, you should not see any difference with
this patch.
> trade-off ("massive_intr workload gets more context switches") does
> not appear to be
> the bottleneck here.
>
>
> Mitigations tested and ruled out
> ---------------------------------
> - Spectre mitigations: mitigations=off yields only +0.5-2.5%
> improvement (confirmed by
> RPi kernel team). Not the cause.
> - CPU governor: tested with both ondemand and performance. No
> significant difference.
>
>
> References
> ----------
> - Commit 152e11f (DELAY_DEQUEUE):
> https://github.com/torvalds/linux/commit/152e11f6df293e816a6a37c69757033cdc72667d
> - Commit 85e511d (PREEMPT_SHORT):
> https://github.com/torvalds/linux/commit/85e511df3cec46021024176672a748008ed135bf
> - RPi issue tracker: https://github.com/raspberrypi/linux/issues/7308
>
> Please let me know if additional data (perf traces, full benchmark
> logs, kernel config)
> would be helpful. I am happy to run further tests on the hardware.
Could you trace and check:
- whether the consumer and producer of one pipe run on the same CPU
- whether there is a difference in the number of migrations
I will try to reproduce this locally once I get access to my hardware.
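Something along these lines should be enough to check both points (a
sketch, assuming a perf build with the scheduler tracepoints is available
on the target):

```shell
# Count task migrations system-wide while the benchmark runs:
sudo perf stat -e sched:sched_migrate_task -a -- \
    stress-ng --pipe 4 --timeout 20s --metrics-brief

# Record scheduling events and show which CPU each task ran on, to
# see whether producer/consumer pairs of one pipe share a CPU:
sudo perf sched record -- stress-ng --pipe 4 --timeout 10s
sudo perf sched map
```

Comparing the migration counts between a DELAY_DEQUEUE and a
NO_DELAY_DEQUEUE run of the same length would be the interesting part.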
>
> Thank you for your work on the scheduler.
>
> Best regards,
> Thomas Gebhardt (@Kletternaut on GitHub)
>