Re: [PATCH 0/6] cpuidle: teo: fixes and improvements

From: Rafael J. Wysocki
Date: Thu Jun 06 2024 - 08:29:42 EST


On Thu, Jun 6, 2024 at 1:59 PM Christian Loehle
<christian.loehle@xxxxxxx> wrote:
>
> On 6/6/24 10:00, Christian Loehle wrote:
> > Hi all,
> > so my investigation into teo led to the following fixes and
> > improvements. They are mostly independent of one another, which is
> > why this cover letter is quite short; the details are in the patches.
> >
> > 1/6:
> > As discussed, the utilization threshold is too high; while there
> > are benefits in certain workloads, there are quite a few
> > regressions, too.
> > 2/6:
> > Especially with the new util threshold, stopping the tick makes
> > little sense when a utilized CPU is detected, so don't.
> > 3/6:
> > Particularly with WFI, stopping the tick has benefits even if it's
> > the only state, so enable that in the early bail-out.
> > 4/6:
> > Stopping the tick at zero cost (if the idle state dictates it) is
> > too aggressive IMO, so add a constant 1ms cost (a rough sketch
> > follows after this list).
> > XXX: This has the issue of now being counted as idle_miss, so we
> > could consider adding the cost to the states, too, but a simple
> > implementation of that would add the cost to deeper states even if
> > the tick is already off.
> > 5/6:
> > Remove the 'recent' intercept logic; see my findings in:
> > https://lore.kernel.org/lkml/0ce2d536-1125-4df8-9a5b-0d5e389cd8af@xxxxxxx/
> > I haven't found a way to salvage it properly, so I removed it.
> > The regular intercept metric seems to decay fast enough to make it
> > unnecessary, but we can revisit this if that turns out not to be
> > the case.
> > 6/6:
> > The rest of the intercept logic had issues, too.
> > See the commit.
> >
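> > For 4/6, roughly this kind of check is what I mean, as a sketch
> > only (the helper name and TICK_STOP_COST_NS are made up here; see
> > the actual patch for details):
> >
> > /* Assumed constant cost of stopping the tick: 1 ms. */
> > #define TICK_STOP_COST_NS	NSEC_PER_MSEC
> >
> > static bool tick_stop_worthwhile(s64 duration_ns)
> > {
> > 	/*
> > 	 * Stopping the tick is no longer treated as free: the
> > 	 * expected idle duration must exceed the tick period by
> > 	 * more than the assumed constant cost.
> > 	 */
> > 	return duration_ns > TICK_NSEC + TICK_STOP_COST_NS;
> > }
> >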
> > TODO: add some measurements of common workloads and some simple
> > sanity tests (like Vincent described: checking whether the state
> > selection looks reasonable in low-utilization workloads).
> > I have some, but more (and more standardized) ones would be
> > beneficial.
> >
> > Happy for anyone to take a look and test as well.
> >
> > Some numbers for context (I'll probably add them to the cover
> > letter):
> >
> > Comparing:
> > - IO workload (intercept-heavy)
> > - Timer workload, very low utilization (checks that the deepest
> >   state gets selected)
> > - hackbench (high utilization)
> > all on RK3399 with CONFIG_HZ=100.
> > target_residencies (us): 1, 900, 2000
> >
> > 1. IO workload, 5 runs, results sorted, in read IOPS.
> > fio --minimal --time_based --name=fiotest --filename=/dev/nvme0n1 --runtime=30 --rw=randread --bs=4k --ioengine=psync --iodepth=1 --direct=1 | cut -d \; -f 8;
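> > (field 8 of fio's terse output is the read IOPS; presumably the
> > command was run once per device with the respective --filename)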
> >
> > teo fixed:
> > /dev/nvme0n1
> > [4597, 4673, 4727, 4741, 4756]
> > /dev/mmcblk2
> > [5753, 5832, 5837, 5911, 5949]
> > /dev/mmcblk1
> > [2059, 2062, 2070, 2071, 2080]
> >
> > teo mainline:
> > /dev/nvme0n1
> > [3793, 3825, 3846, 3865, 3964]
> > /dev/mmcblk2
> > [3831, 4110, 4154, 4203, 4228]
> > /dev/mmcblk1
> > [1559, 1564, 1596, 1611, 1618]
> >
> > menu:
> > /dev/nvme0n1
> > [2571, 2630, 2804, 2813, 2917]
> > /dev/mmcblk2
> > [4181, 4260, 5062, 5260, 5329]
> > /dev/mmcblk1
> > [1567, 1581, 1585, 1603, 1769]
> >
> > 2. Timer workload (through IO for my convenience ;) )
> > fio same as above; the results below are idle statistics rather
> > than read IOPS.
> > echo "0 2097152 zero" | dmsetup create dm-zeros
> > echo "0 2097152 delay /dev/mapper/dm-zeros 0 50" | dmsetup create dm-slow
> > (Each IO is delayed by a 50ms timer, so this should mostly sit in
> > state2.)
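> > fio presumably targets the delayed device, i.e.:
> > fio --minimal --time_based --name=fiotest --filename=/dev/mapper/dm-slow --runtime=30 --rw=randread --bs=4k --ioengine=psync --iodepth=1 --direct=1
> > Idle statistics like the ones below can be collected from the
> > power:cpu_idle and power:cpu_idle_miss tracepoints, e.g. with:
> > perf stat -a -e power:cpu_idle -e power:cpu_idle_miss -- sleep 5
> > (The above/below split is the miss event's 'below' field: "above"
> > means the selected state was too deep for the observed idle
> > duration, "below" that it was too shallow.)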
> >
> > teo fixed:
> > 3269 cpu_idle total
> > 48 cpu_idle_miss
> > 30 cpu_idle_miss above
> > 18 cpu_idle_miss below
> >
> > teo mainline:
> > 3221 cpu_idle total
> > 1269 cpu_idle_miss
> > 22 cpu_idle_miss above
> > 1247 cpu_idle_miss below
> >
> > menu:
> > 3433 cpu_idle total
> > 114 cpu_idle_miss
> > 61 cpu_idle_miss above
> > 53 cpu_idle_miss below
> >
> > Residencies:
>
> Hmm, maybe actually including them would've been helpful too:
> (Over a 5s workload, only showing the LITTLE cluster; values are
> seconds per idle state)
> teo fixed:
> idle_state
> 2.0 4.813378
> -1.0 0.210820

Just to clarify, what does -1.0 mean here?

> 1.0 0.202778
> 0.0 0.062426
>
> teo mainline:
> idle_state
> 1.0 4.895766
> -1.0 0.098063
> 0.0 0.253069
>
> menu:
> idle_state
> 2.0 4.528356
> -1.0 0.241486
> 1.0 0.345829
> 0.0 0.202505
>
> >
> > tldr: overall teo fixed spends more time in state2 while having
> > fewer idle_miss than menu.
> > teo mainline was just way too aggressive at selecting shallow states.
> >
> > 3. Hackbench, 5 runs
> > for i in $(seq 0 4); do hackbench -l 100 -g 100 ; sleep 1; done
> >
> > teo fixed:
> > Time: 4.807
> > Time: 4.856
> > Time: 5.072
> > Time: 4.934
> > Time: 4.962
> >
> > teo mainline:
> > Time: 4.945
> > Time: 5.021
> > Time: 4.927
> > Time: 4.923
> > Time: 5.137
> >
> > menu:
> > Time: 4.991
> > Time: 4.884
> > Time: 4.880
> > Time: 4.946
> > Time: 4.980
> >
> > tldr: all comparable, teo mainline slightly worse
> >
> > Kind Regards,
> > Christian
> >
> > Christian Loehle (6):
> > cpuidle: teo: Increase util-threshold
> > cpuidle: teo: Don't stop tick on utilized
> > cpuidle: teo: Don't always stop tick on one state
> > cpuidle: teo: Increase minimum time to stop tick
> > cpuidle: teo: Remove recent intercepts metric
> > cpuidle: teo: Don't count non-existent intercepts
> >
> > drivers/cpuidle/governors/teo.c | 121 +++++++++++++-------------------
> > 1 file changed, 48 insertions(+), 73 deletions(-)
> >
> > --
> > 2.34.1
> >