Re: [linus:master] [timers] 7ee9887703: stress-ng.uprobe.ops_per_sec -17.1% regression

From: Christian Loehle
Date: Thu Apr 25 2024 - 06:15:27 EST


On 25/04/2024 09:23, Anna-Maria Behnsen wrote:
> Hi,
>
> (adding cpuidle/power people to cc-list)
>
> Oliver Sang <oliver.sang@xxxxxxxxx> writes:
>
>> hi, Frederic Weisbecker,
>>
>> On Tue, Apr 02, 2024 at 12:46:15AM +0200, Frederic Weisbecker wrote:
>>> Le Wed, Mar 27, 2024 at 04:39:17PM +0800, kernel test robot a écrit :
>>>>
>>>>
>>>> Hello,
>>>>
>>>>
>>>> we reported
>>>> "[tip:timers/core] [timers] 7ee9887703: netperf.Throughput_Mbps -1.2% regression"
>>>> in
>>>> https://lore.kernel.org/all/202403011511.24defbbd-oliver.sang@xxxxxxxxx/
>>>>
>>>> now we noticed this commit is in mainline and we captured further results.
>>>>
>>>> still include netperf results for complete. below details FYI.
>>>>
>>>>
>>>> kernel test robot noticed a -17.1% regression of stress-ng.uprobe.ops_per_sec
>>>> on:
>>>
>>> The good news is that I can reproduce.
>>> It has made me spot something already:
>>>
>>> https://lore.kernel.org/lkml/ZgsynV536q1L17IS@xxxxxxxxxxxxx/T/#m28c37a943fdbcbadf0332cf9c32c350c74c403b0
>>>
>>> But that's not enough to fix the regression. Investigation continues...
>>
>> Thanks a lot for information! if you want us test any patch, please let us know.
>
> Oliver, I would be happy to see, whether the patch at the end of the
> message restores the original behaviour also in your test setup. I
> applied it on 6.9-rc4. This patch is not a fix - it is just a pointer to
> the kernel path, that might cause the regression. I know, it is
> probable, that a warning in tick_sched is triggered. This happens when
> the first timer is alredy in the past. I didn't add an extra check when
> creating the 'defacto' timer thingy. But existing code handles this
> problem already properly. So the warning could be ignored here.
>
> For the cpuidle people, let me explain what I oberserved, my resulting
> assumption and my request for help:
>
> cpuidle governors use expected sleep length values (beside other data)
> to decide which idle state would be good to enter. The expected sleep
> length takes the first queued timer of the CPU into account and is
> provided by tick_nohz_get_sleep_length(). With the timer pull model in
> place the non pinned timers are not taken into account when there are
> other CPUs up and running which could handle those timers. This could
> lead to increased sleep length values. On my system during the stress-ng
> uprobes test it was in the range of maximum 100us without the patch set
> and with the patch set the maximum was in a range of 200sec. This is
> intended behaviour, because timers which could expire on any CPU should
> expire on the CPU which is busy anyway and the non busy CPU should be
> able to go idle.
>
> Those increased sleep length values were the only anomalies I could find
> in the traces with the regression.
>
> I created the patch below which simply fakes the sleep length values
> that they take all timers of the CPU into account (also the non
> pinned). This patch kind of restores the behavoir of
> tick_nohz_get_sleep_length() before the change but still with the timer
> pull model in place.
>
> With the patch the regression was gone, at least on my system (using
> cpuidle governor menu but also teo).

I assume the regression is reproducible for both?
(The original report is using menu for anyone else looking at this)

>
> So my assumption here is, that cpuidle governors assume that a deeper
> idle state could be choosen and selecting the deeper idle state makes an
> overhead when returning from idle. But I have to notice here, that I'm
> still not familiar with cpuidle internals... So I would be happy about
> some hints how I can debug/trace cpuidle internals to falsify or verify
> this assumption.

I'd say that sounds correct.
Comparing cpu_idle_miss would be interesting for both.

Regards,
Christian

> [snip]