Re: [PATCH v2 00/13] sched/fair/schedutil: Better manage system response time
From: Tom Gebhardt
Date: Thu May 28 2026 - 08:56:56 EST
Hi Qais,
Thanks for the clarification on sched-analyzer -- I'll look at the perfetto
approach for task placement traces.
In the meantime, I ran `perf stat` and `perf record -g` on three kernels,
overclocked to 2800 MHz with the `ondemand` governor, using the same
stress-ng pipe workload (4 workers, 20s).
Device: Raspberry Pi 5 (8 GB, C1-stepping, Cortex-A76), Bookworm arm64.
perf stat results:
Metric              6.6.78       7.0 stock    7.0+ttwu+vincent
------------------  -----------  -----------  ----------------
bogo ops/s            2 222 639    1 855 066         2 298 965
IPC                        1.72         1.47              1.76
branch-misses              625M       1 270M            1 018M
context-switches     15 145 738   22 750 121        18 905 924
cache-miss rate           1.58%        1.74%             1.38%
Key observations:
1. IPC drops 14% on 7.0 stock (1.72 -> 1.47). ttwu+vincent recovers it
almost completely (1.76, slightly above 6.6). This is a genuine
efficiency loss in the scheduler path, not a throughput/clock artifact.
2. Branch mispredictions double on 7.0 stock (+103% vs 6.6). ttwu+vincent
reduces them by ~20% vs stock, but they remain 63% above 6.6 -- this
likely explains the residual ~1% gap after patching.
3. Context switches increase 50% on 7.0 stock. ttwu+vincent brings this
down to +25% vs 6.6.
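For anyone who wants to re-check the percentages in the observations above,
here is a minimal Python sketch. The values are copied from the perf stat
table; the names `RESULTS`, `pct_change` and `deltas` are only illustrative
and not part of any tool used in this thread.

```python
# Values copied from the perf stat table: (6.6.78, 7.0 stock, 7.0+ttwu+vincent).
RESULTS = {
    "IPC": (1.72, 1.47, 1.76),
    "branch-misses (M)": (625, 1270, 1018),
    "context-switches": (15_145_738, 22_750_121, 18_905_924),
}


def pct_change(new, base):
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


def deltas(results):
    """Compute the three comparisons used in the observations, per metric."""
    out = {}
    for metric, (v66, stock, patched) in results.items():
        out[metric] = {
            "stock_vs_6.6": pct_change(stock, v66),
            "patched_vs_6.6": pct_change(patched, v66),
            "patched_vs_stock": pct_change(patched, stock),
        }
    return out


if __name__ == "__main__":
    for metric, d in deltas(RESULTS).items():
        print(metric)
        for name, value in d.items():
            print(f"  {name:18s} {value:+7.1f}%")
```

The same three comparisons can be applied to any other row of the table by
adding it to `RESULTS`.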
perf report (-g) highlights:
On 6.6, `finish_task_switch` is barely visible in call graphs. On 7.0
(both stock and patched), it appears prominently at 5-8% of samples,
alongside elevated `_raw_spin_unlock_irqrestore` time. This points to
genuine overhead in the context switch completion path, not lock contention
between worker tasks.
Regarding the "weird contention accidentally hidden" concern: I don't see
evidence for that. The branch miss explosion and IPC drop on 7.0 stock are
consistent with more complex/harder-to-predict scheduler control flow
(EEVDF decision tree vs. CFS), not with a workload contention pattern that
happens to be masked by task placement changes. ttwu+vincent genuinely
reduces branch misses and restores IPC -- it doesn't just move the problem.
I'll try to get perfetto traces for the task placement / running vs.
runnable time breakdown. Happy to provide the raw perf.data files if
useful.
Tom