Re: [PATCH v2 00/13] sched/fair/schedutil: Better manage system response time

From: Qais Yousef

Date: Thu May 14 2026 - 21:43:04 EST

Hi Tom

On 05/13/26 17:09, Tom Gebhardt wrote:
> Hi Qais,
>
> I tested your v2 12/13 (sched/fair: Call update_util_est() after
> dequeue_entities()) and RFC 13/13 (sched/pelt: Always allow load updates)
> on ARM (Raspberry Pi 5, Cortex-A76, 4-core), combined with Peter
> Zijlstra's ttwu series (rebased to 7.0.y by marioroy).
>
> Both patches applied cleanly on top of rpi-7.0.y + 10 ttwu patches
> without conflicts.
>
> Results using stress-ng 0.15.06 pipe stressor (4 workers, 20s):
>
> Kernel                              Clock      pipe bogo ops/s   Delta vs. 6.6
> ----------------------------------  ---------  ----------------  -------------
> 6.6.78-v8-16k+                      2800 MHz   2 487 746         +/-0% (ref)
> 7.0.0-v8-16k+ stock                 2400 MHz   1 694 011         -31.9%
> 7.0.0-v8-16k+ stock                 2800 MHz   1 851 567         -25.6%
> 7.0.0 + ttwu only (10 patches)      2400 MHz   1 836 006         -26.2%
> 7.0.0 + ttwu only (10 patches)      2800 MHz   1 934 076         -22.3%
> 7.0.0 + ttwu + your 2 Qais patches  2400 MHz   1 996 002         -19.8%
> 7.0.0 + ttwu + your 2 Qais patches  2800 MHz   2 342 144         -5.9%
>
> The ttwu-only set recovers ~3-4% of the regression on ARM. Adding your
> two patches brings a much larger improvement -- especially under
> overclocking, where the combined set recovers roughly 94% of the 6.6
> baseline. The remaining ~6% gap may be related to ARM-specific
> DELAY_DEQUEUE interactions.
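
For reference, the last column of the table works out as
(ops / 6.6 ops - 1) * 100. A minimal sketch to reproduce it for new runs,
where pct_delta is just an illustrative helper name and not part of any tool:

```shell
#!/bin/sh
# pct_delta OPS REF: percentage change of OPS relative to REF.
# Hypothetical helper for illustration; uses awk for the floating point math.
pct_delta() {
	awk -v ops="$1" -v ref="$2" 'BEGIN { printf "%.1f\n", (ops / ref - 1) * 100 }'
}

# 7.0.0 + ttwu + both patches at 2800 MHz vs. the 6.6.78 reference.
pct_delta 2342144 2487746   # prints -5.9
```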

Hmm, this is an interesting result. Did you get a chance to check whether you
need both patches, or whether one of them is enough? Only 12/13 actually fixes
a change in behavior since 6.6. The last patch is a new addition that changes
behavior that has always been there.

You have an SMP system, so utilization can't be affecting task placement and
leaving your task stuck on a little core. And looking at the Raspberry Pi code,
it seems they ship with ondemand as the default cpufreq governor. Are you using
the default one? Assuming yes, and you're not using schedutil, then I wouldn't
expect these patches to make things better.
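
Something like the below should show which governor is active. It is a rough
sketch that assumes the standard cpufreq sysfs layout; show_governors is just
an illustrative helper name, and the optional argument only exists so it can be
pointed at a different root:

```shell
#!/bin/sh
# Print the active governor for every cpufreq policy.
# Assumes the standard sysfs layout under /sys/devices/system/cpu/cpufreq;
# show_governors is a hypothetical helper, not part of any tool.
show_governors() {
	root="${1:-/sys/devices/system/cpu/cpufreq}"
	for policy in "$root"/policy*; do
		# Skip if the glob did not match or cpufreq is not available.
		[ -r "$policy/scaling_governor" ] || continue
		printf '%s: %s\n' "${policy##*/}" "$(cat "$policy/scaling_governor")"
	done
}

show_governors
```

Each output line is the policy directory name followed by the contents of its
scaling_governor file.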

Are you familiar with perfetto? Can you use sched-analyzer [1] to capture
a trace and inspect how the pattern changes when things are good and bad?

Output of

	sched-analyzer-pp --sched-states $TASK_NAME --freq-residency-task $TASK_NAME \
		sched-analyzer.perfetto-trace

would be useful to share. I suspect you have a subtle change in the scheduling
pattern that I hope you can visualize directly in ui.perfetto.dev, but the
above stats might be a good way to see potential differences between good and
bad runs.

Thanks!

[1] https://github.com/qais-yousef/sched-analyzer

>
> Device: Raspberry Pi 5 (8 GB, C1-stepping), Bookworm arm64, rpi-7.0.y.
> Background: https://github.com/raspberrypi/linux/issues/7308
>
> Thanks for the series -- the ARM results look very promising.
>
> Tom