Suspect broken frequency transitions on SDM845

From: Valentin Schneider
Date: Tue Feb 04 2020 - 07:53:30 EST


Hi folks,

We have a simple sanity test that asserts higher frequency leads to more
work done. It's fairly straightforward - we use the userspace governor,
go through increasing frequencies, run sysbench each time and assert the
values we get are increasing monotonically. We do that for one CPU of each
"type" (i.e. once for a LITTLE and once for a big).
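
For reference, the procedure above can be sketched in Python roughly as
follows. This is a minimal sketch, not the actual test code. The sysbench
invocation (`sysbench cpu --time=N run`, pinned with `taskset`), the
"total number of events" summary line it parses, and all function names
are assumptions. The tables in this mail show five frequencies; the sketch
simply walks every entry in scaling_available_frequencies.

```python
import re
import subprocess
from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")


def parse_events(sysbench_output: str) -> int:
    """Extract the event count from sysbench's summary (format assumed)."""
    match = re.search(r"total number of events:\s*(\d+)", sysbench_output)
    if match is None:
        raise ValueError("no event count found in sysbench output")
    return int(match.group(1))


def is_strictly_increasing(values) -> bool:
    """True if every value is greater than the one before it."""
    values = list(values)
    return all(a < b for a, b in zip(values, values[1:]))


def run_at_each_frequency(cpu: int, seconds: int = 1):
    """Return [(freq_khz, events)] for one CPU. Needs root and sysbench."""
    cpufreq = SYSFS_CPU / f"cpu{cpu}" / "cpufreq"
    # The userspace governor lets us pin the frequency via scaling_setspeed.
    (cpufreq / "scaling_governor").write_text("userspace")
    available = (cpufreq / "scaling_available_frequencies").read_text()
    results = []
    for freq in sorted(int(f) for f in available.split()):
        (cpufreq / "scaling_setspeed").write_text(str(freq))
        # Hypothetical invocation: pin sysbench to the CPU under test.
        out = subprocess.run(
            ["taskset", "-c", str(cpu),
             "sysbench", "cpu", f"--time={seconds}", "run"],
            check=True, capture_output=True, text=True,
        ).stdout
        results.append((freq, parse_events(out)))
    return results
```

On a target device, calling run_at_each_frequency(4) and passing the event
counts to is_strictly_increasing() would reproduce the pass/fail criterion
described above.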

We've been getting some sporadic failures on the big CPUs of a Pixel 3
running mainline [1]. Here is an example of a correct run (CPU4):

| frequency (kHz) | sysbench events |
|-----------------+-----------------|
|          825600 |             236 |
|         1286400 |             369 |
|         1689600 |             483 |
|         2092800 |             600 |
|         2476800 |             711 |

and here is a failed one (still CPU4):

| frequency (kHz) | sysbench events |
|-----------------+-----------------|
|          825600 |             234 |
|         1286400 |             369 |
|         1689600 |             449 |
|         2092800 |             600 |
|         2476800 |             355 |
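
One way to read the two tables side by side is to assume sysbench
throughput scales linearly with frequency, which the correct run suggests,
and ask what frequency each event count would imply. The sketch below does
that arithmetic. The linear model, the 5% tolerance and the function names
are assumptions made for illustration, not part of the test.

```python
# (frequency in kHz, sysbench events), copied from the two tables above.
GOOD_RUN = [(825600, 236), (1286400, 369), (1689600, 483),
            (2092800, 600), (2476800, 711)]
FAILED_RUN = [(825600, 234), (1286400, 369), (1689600, 449),
              (2092800, 600), (2476800, 355)]


def events_per_khz(run):
    """Mean events-per-kHz ratio over a run of (freq_khz, events) pairs."""
    return sum(events / freq for freq, events in run) / len(run)


def implied_frequency_khz(events, ratio):
    """Frequency that would yield `events`, assuming linear scaling."""
    return events / ratio


def suspicious_points(run, ratio, tolerance=0.05):
    """Points more than `tolerance` below the linear prediction."""
    return [(freq, events) for freq, events in run
            if events < (1 - tolerance) * ratio * freq]


ratio = events_per_khz(GOOD_RUN)
for freq, events in suspicious_points(FAILED_RUN, ratio):
    implied = implied_frequency_khz(events, ratio)
    print(f"requested {freq} kHz, got {events} events, "
          f"implied ~{implied:.0f} kHz")
```

Under that assumption, the 355 events at 2476800 kHz correspond to roughly
half the requested frequency. The 449 events at 1689600 kHz also fall
below the linear prediction, even though that point alone does not break
monotonicity.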


We've encountered something like this in the past with the exact same
test on h960 [2] but it is much harder to reproduce reliably this time
around.

I haven't found much time to dig into this. I did get a run of ~100
iterations with ~15 failures, but nothing cpufreq-related showed up in
dmesg. I briefly suspected fast-switch, but it's only used by schedutil and
this test uses the userspace governor, so I would expect the frequency
transition to be complete before we even start executing sysbench.
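
A cheap way to check that expectation would be to read the frequency back
from sysfs right after writing scaling_setspeed and before starting
sysbench. The sketch below compares the requested value against
scaling_cur_freq and, where the driver exposes it, cpuinfo_cur_freq. The
helper names are hypothetical, and a match here would not rule out the
hardware or firmware changing frequency later during the run.

```python
from pathlib import Path


def read_khz(path: Path):
    """Return the integer in a cpufreq sysfs file, or None if unreadable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def check_transition(cpu: int, requested_khz: int,
                     sysfs_cpu: Path = Path("/sys/devices/system/cpu")):
    """Compare the requested frequency with what cpufreq reports."""
    cpufreq = sysfs_cpu / f"cpu{cpu}" / "cpufreq"
    # cpuinfo_cur_freq may be absent or root-only; read_khz() then
    # returns None and the entry is skipped in the comparison.
    reported = {
        name: read_khz(cpufreq / name)
        for name in ("scaling_cur_freq", "cpuinfo_cur_freq")
    }
    mismatches = {name: khz for name, khz in reported.items()
                  if khz is not None and khz != requested_khz}
    return reported, mismatches
```

The sysfs_cpu parameter exists only so the helper can be pointed at a
fake directory tree for testing off-target.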

If anyone has the time and will to look into this, that would be much
appreciated.

[1]: https://git.linaro.org/people/amit.pundir/linux.git/log/?h=blueline-mainline-tracking
[2]: https://lore.kernel.org/lkml/d3ede0ab-b635-344c-faba-a9b1531b7f05@xxxxxxx/

Cheers,
Valentin