Re: [PATCH v2 00/13] sched/fair/schedutil: Better manage system response time
From: Tom Gebhardt
Date: Fri May 15 2026 - 04:25:10 EST
Hi Qais,
Thanks for the follow-up. Here are the patch isolation results and answers to your questions.
Regarding the governor:
Yes, I'm running `ondemand`, not `schedutil`. My mistake for not mentioning that up front -- I
assumed the improvement came from the util_est path, which is triggered regardless of the governor.
The improvement is clearly measurable even with `ondemand`, which is surprising given that your
patches specifically target `schedutil`.
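For anyone who wants to confirm which governor is active on each CPU: a minimal Python sketch,
assuming the standard cpufreq sysfs layout under /sys/devices/system/cpu. The helper name
`read_governors` is made up for illustration.

```python
from pathlib import Path


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU that exposes cpufreq in sysfs."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for path in sorted(Path(cpu_root).glob(pattern)):
        # path is .../cpuN/cpufreq/scaling_governor, so the CPU name is two levels up.
        governors[path.parent.parent.name] = path.read_text().strip()
    return governors


if __name__ == "__main__":
    for cpu, governor in read_governors().items():
        print(f"{cpu}: {governor}")
```

On the setup described here, every CPU would be expected to report `ondemand`.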
Patch isolation -- 12/13 only vs. both:
I re-ran the benchmarks with patch 13/13 (`sched/pelt: Always allow load updates`) reverted,
keeping only patch 12/13 (`sched/fair: Call update_util_est() after dequeue_entities()`).
Results using stress-ng 0.15.06 pipe stressor (4 workers, 20s):
Kernel                               Clock     pipe bogo ops/s  delta vs. 6.6
-----------------------------------  --------  ---------------  -------------
6.6.78-v8-16k+                       2400 MHz        2 129 330  +/-0% (ref)
6.6.78-v8-16k+                       2800 MHz        2 487 746  +/-0% (ref)
7.0.0-v8-16k+ stock                  2400 MHz        1 694 011  -20.5%
7.0.0-v8-16k+ stock                  2800 MHz        1 851 567  -25.6%
7.0.0 + ttwu only (10 patches)       2400 MHz        1 836 006  -13.8%
7.0.0 + ttwu only (10 patches)       2800 MHz        1 934 076  -22.3%
7.0.0 + ttwu + patch 12/13 only      2400 MHz        2 054 879  -3.5%
7.0.0 + ttwu + patch 12/13 only      2800 MHz        2 415 617  -2.9%
7.0.0 + ttwu + patches 12+13         2400 MHz        1 996 002  -6.3%
7.0.0 + ttwu + patches 12+13         2800 MHz        2 342 144  -5.9%
The key finding: patch 12/13 alone outperforms the combined set on this ARM system. Adding patch
13/13 actually costs throughput -- about 3 percentage points (2.8 at 2400 MHz, 3.0 at 2800 MHz).
This suggests that `sched/pelt: Always allow load updates` has a negative side effect on
ARM/Cortex-A76, possibly related to how PELT decay is handled without `schedutil` active, or to an
ARM-specific DELAY_DEQUEUE interaction.
Patch 12/13 alone narrows the gap to 6.6 to just -2.9% at 2800 MHz (OC) and -3.5% at the nominal
2400 MHz. That is a remarkable recovery from the stock 7.0 regression of -25.6% and -20.5% at the
same clocks.
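For anyone who wants to re-derive the delta column: a minimal Python sketch that compares each
kernel against the 6.6 reference at the same clock. The numbers are copied from the table above;
the names `BASELINE`, `RESULTS` and `delta_pct` are made up for illustration.

```python
# Throughput in bogo ops/s, keyed by clock in MHz (values copied from the table).
BASELINE = {2400: 2_129_330, 2800: 2_487_746}  # 6.6.78-v8-16k+ reference

RESULTS = {
    "7.0.0 stock": {2400: 1_694_011, 2800: 1_851_567},
    "7.0.0 + ttwu only": {2400: 1_836_006, 2800: 1_934_076},
    "7.0.0 + ttwu + 12/13 only": {2400: 2_054_879, 2800: 2_415_617},
    "7.0.0 + ttwu + 12+13": {2400: 1_996_002, 2800: 2_342_144},
}


def delta_pct(ops, ref):
    """Relative change of ops against ref, in percent (negative = slower)."""
    return (ops / ref - 1.0) * 100.0


if __name__ == "__main__":
    for name, by_clock in RESULTS.items():
        for mhz, ops in by_clock.items():
            # Always compare like for like: same clock on both kernels.
            print(f"{name:26s} {mhz} MHz {delta_pct(ops, BASELINE[mhz]):+6.1f}%")

    # Cost of adding patch 13/13 on top of 12/13, in percentage points.
    for mhz in BASELINE:
        only_12 = delta_pct(RESULTS["7.0.0 + ttwu + 12/13 only"][mhz], BASELINE[mhz])
        both = delta_pct(RESULTS["7.0.0 + ttwu + 12+13"][mhz], BASELINE[mhz])
        print(f"{mhz} MHz: 13/13 costs {only_12 - both:.1f} pp")
```

Keeping the baseline keyed by clock avoids accidentally comparing a 2400 MHz run against the
2800 MHz reference, which would overstate the regression.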
Regarding Perfetto traces:
Unfortunately I cannot provide sched-analyzer traces at this time -- the kernel is not compiled
with CONFIG_DEBUG_INFO_BTF=y (pahole/dwarves not available in this build environment), which
is required for BPF CO-RE. I can try to arrange that for a future run if it would still be useful.
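For checking whether a given kernel build can be used for BPF CO-RE tracing: a minimal Python
sketch, assuming the usual behaviour that a kernel built with CONFIG_DEBUG_INFO_BTF=y exposes its
type information at /sys/kernel/btf/vmlinux. The helper name `has_vmlinux_btf` is made up for
illustration.

```python
import os


def has_vmlinux_btf(btf_path="/sys/kernel/btf/vmlinux"):
    """Return True if the running kernel exposes vmlinux BTF at btf_path."""
    return os.path.exists(btf_path)


if __name__ == "__main__":
    if has_vmlinux_btf():
        print("vmlinux BTF present: CO-RE based tools should be able to load")
    else:
        print("vmlinux BTF missing: kernel likely built without CONFIG_DEBUG_INFO_BTF")
```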
Device: Raspberry Pi 5 (8 GB, C1-stepping), Bookworm arm64, kernel rpi-7.0.y.
Background: https://github.com/raspberrypi/linux/issues/7308
Tom