Re: [PATCH v2 00/13] sched/fair/schedutil: Better manage system response time
From: Tom Gebhardt
Date: Wed May 13 2026 - 11:16:58 EST
Hi Qais,
I tested your v2 12/13 (sched/fair: Call update_util_est() after
dequeue_entities()) and RFC 13/13 (sched/pelt: Always allow load updates)
on ARM (Raspberry Pi 5, Cortex-A76, 4-core), combined with Peter
Zijlstra's ttwu series (rebased to 7.0.y by marioroy).
Both patches applied cleanly on top of rpi-7.0.y + 10 ttwu patches
without conflicts.
Results using stress-ng 0.15.06 pipe stressor (4 workers, 20s):
Kernel                              Clock      pipe bogo ops/s   Δ vs. 6.6
----------------------------------  ---------  ----------------  ----------
6.6.78-v8-16k+                      2800 MHz          2 487 746  +/-0% (ref)
7.0.0-v8-16k+ stock                 2400 MHz          1 694 011  -31.9%
7.0.0-v8-16k+ stock                 2800 MHz          1 851 567  -25.6%
7.0.0 + ttwu only (10 patches)      2400 MHz          1 836 006  -26.2%
7.0.0 + ttwu only (10 patches)      2800 MHz          1 934 076  -22.3%
7.0.0 + ttwu + your 2 Qais patches  2400 MHz          1 996 002  -19.8%
7.0.0 + ttwu + your 2 Qais patches  2800 MHz          2 342 144  -5.9%
The ttwu-only set recovers roughly 3-6 percentage points of the
regression on ARM. Adding your two patches brings a much larger
improvement -- especially under overclocking (2800 MHz), where the
combined set reaches roughly 94% of the 6.6 baseline throughput. The
remaining ~6% gap may be related to ARM-specific DELAY_DEQUEUE
interactions.
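For anyone who wants to re-derive the percentages, here is a minimal
sketch of the arithmetic behind the last column. The throughput numbers
are copied from the table; the helper names (delta_vs_baseline,
recovered_points) are only illustrative and not part of any tool used
in the test.

```python
# Throughput figures (bogo ops/s) copied from the results table.
BASELINE = 2_487_746  # 6.6.78-v8-16k+ at 2800 MHz (reference)

RESULTS = {
    "7.0.0 stock @ 2400 MHz": 1_694_011,
    "7.0.0 stock @ 2800 MHz": 1_851_567,
    "7.0.0 + ttwu @ 2400 MHz": 1_836_006,
    "7.0.0 + ttwu @ 2800 MHz": 1_934_076,
    "7.0.0 + ttwu + 2 patches @ 2400 MHz": 1_996_002,
    "7.0.0 + ttwu + 2 patches @ 2800 MHz": 2_342_144,
}


def delta_vs_baseline(ops, baseline=BASELINE):
    """Percentage change of `ops` relative to the 6.6 reference."""
    return (ops / baseline - 1.0) * 100.0


def recovered_points(patched, stock, baseline=BASELINE):
    """Percentage points of the regression won back by a patch set."""
    return delta_vs_baseline(patched, baseline) - delta_vs_baseline(stock, baseline)


for label, ops in RESULTS.items():
    print(f"{label:<38} {delta_vs_baseline(ops):+6.1f}%")

# Gain of each patch set over stock 7.0.0 at the same clock.
for clock in ("2400", "2800"):
    stock = RESULTS[f"7.0.0 stock @ {clock} MHz"]
    ttwu = RESULTS[f"7.0.0 + ttwu @ {clock} MHz"]
    both = RESULTS[f"7.0.0 + ttwu + 2 patches @ {clock} MHz"]
    print(f"{clock} MHz: ttwu only {recovered_points(ttwu, stock):+.1f} pts, "
          f"ttwu + 2 patches {recovered_points(both, stock):+.1f} pts")
```

The deltas are all relative to the 6.6 run at 2800 MHz, so the 2400 MHz
rows include the clock difference as well as the scheduler changes.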
Device: Raspberry Pi 5 (8 GB, C1-stepping), Bookworm arm64, rpi-7.0.y.
Background: https://github.com/raspberrypi/linux/issues/7308
Thanks for the series -- the ARM results look very promising.
Tom