Re: [PATCH v9 09/17] arm: tegra20: cpuidle: Handle case where secondary CPU hangs on entering LP2

From: Dmitry Osipenko
Date: Mon Feb 24 2020 - 10:13:02 EST


22.02.2020 00:11, Daniel Lezcano wrote:
> On 21/02/2020 21:54, Dmitry Osipenko wrote:
>> 21.02.2020 23:48, Daniel Lezcano wrote:
>>> On 21/02/2020 21:21, Dmitry Osipenko wrote:
>>>> 21.02.2020 23:02, Daniel Lezcano wrote:
>>>
>>> [ ... ]
>>>
>>>>>>>>>> +
>>>>>>>>>> + /*
>>>>>>>>>> + * The primary CPU0 core shall wait for the secondaries
>>>>>>>>>> + * shutdown in order to power-off CPU's cluster safely.
>>>>>>>>>> + * The timeout value depends on the current CPU frequency,
>>>>>>>>>> + * it takes about 40-150us in average and over 1000us in
>>>>>>>>>> + * a worst case scenario.
>>>>>>>>>> + */
>>>>>>>>>> + do {
>>>>>>>>>> + if (tegra_cpu_rail_off_ready())
>>>>>>>>>> + return 0;
>>>>>>>>>> +
>>>>>>>>>> + } while (ktime_before(ktime_get(), timeout));
>>>>>>>>>
>>>>>>>>> So this loop will aggressively call tegra_cpu_rail_off_ready() and retry 3
>>>>>>>>> times. The tegra_cpu_rail_off_ready() function can be called thousands of times
>>>>>>>>> here but the function will hang 1.5s :/
>>>>>>>>>
>>>>>>>>> I suggest something like:
>>>>>>>>>
>>>>>>>>> while (retries-- && !tegra_cpu_rail_off_ready())
>>>>>>>>> udelay(100);
>>>>>>>>>
>>>>>>>>> So <retries> calls to tegra_cpu_rail_off_ready() and 100us x <retries> maximum
>>>>>>>>> impact.
>>>>>>>> But udelay() also results in the CPU spinning in a busy-loop, and thus,
>>>>>>>> what's the difference?
>>>>>>>
>>>>>>> busy looping instead of register reads with all the hardware things involved behind.
>>>>>>
>>>>>> Please notice that this code runs only on an older Cortex-A9/A15, which
>>>>>> doesn't support WFE for the delaying, and thus, CPU always busy-loops
>>>>>> inside udelay().
>>>>>>
>>>>>> What if I add cpu_relax() to the loop? Do you think it could
>>>>>> have any positive effect?
>>>>>
>>>>> I think udelay() has a call to cpu_relax().
>>>>
>>>> Yes, my point is that udelay() doesn't bring much benefit for us here
>>>> because:
>>>>
>>>> 1. we want to enter into power-gated state as quick as possible and
>>>> udelay() just adds an unnecessary delay
>>>>
>>>> 2. udelay() spins in a busy-loop until delay is expired, just like we're
>>>> doing it in this function already
>>>
>>> In this case why not remove ktime_get() and increase the number of retries?
>>
>> Because the busy-loop performance depends on CPU's frequency, so we
>> can't rely on a bare number of the retries.
>
> Why not if computed in the worst case scenario?

There are always at least a few dozen microseconds to wait, so
something like udelay(10) should be a somewhat better variant anyway.

> Anyway, I'll let you give a try.
It turned out that udelay(10) is noticeably better than ktime_get() when
the system is running at lower frequencies. I'll switch to udelay() in
v10, thank you very much for the suggestion!
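
For illustration, the udelay()-based polling could look roughly like the
sketch below. This is a userspace mock-up, not the v10 code: the
wait_for_rail_off() name, the timeout argument and the fake_* counters
are made up for the example, and tegra_cpu_rail_off_ready() and udelay()
are replaced with trivial stand-ins so that the loop can be built and
checked outside of the kernel.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the kernel helpers. They only count calls,
 * so the polling logic can be exercised in userspace.
 */
static unsigned int fake_ready_after_polls;	/* polls that report "not ready" */
static unsigned int fake_polls;			/* rail status checks done so far */
static unsigned long fake_delay_us;		/* total time "spent" in udelay() */

static bool tegra_cpu_rail_off_ready(void)
{
	return ++fake_polls > fake_ready_after_polls;
}

static void udelay(unsigned long usecs)
{
	fake_delay_us += usecs;
}

/*
 * Poll the rail status once per 10us instead of re-reading the clock on
 * every iteration. The number of status checks is bounded by
 * timeout_us / 10, independent of the CPU frequency.
 *
 * Returns 0 once the rail can be turned off, -ETIMEDOUT otherwise.
 */
static int wait_for_rail_off(unsigned int timeout_us)
{
	unsigned int retries = timeout_us / 10;

	while (retries--) {
		if (tegra_cpu_rail_off_ready())
			return 0;

		udelay(10);
	}

	return -ETIMEDOUT;
}
```

With a 1500us budget (an assumed value, picked only because the worst
case mentioned above is "over 1000us") this gives at most 150 status
checks, while the common 40-150us case completes after a handful of
them.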