Re: [PATCH v9 09/17] arm: tegra20: cpuidle: Handle case where secondary CPU hangs on entering LP2
From: Daniel Lezcano
Date: Fri Feb 21 2020 - 10:43:26 EST
On Thu, Feb 13, 2020 at 02:51:26AM +0300, Dmitry Osipenko wrote:
> It is possible that something may go wrong with the secondary CPU, in that
> case it is much nicer to get a dump of the flow-controller state before
> hanging machine.
>
> Acked-by: Peter De Schrijver <pdeschrijver@xxxxxxxxxx>
> Tested-by: Peter Geis <pgwipeout@xxxxxxxxx>
> Tested-by: Jasper Korten <jja2000@xxxxxxxxx>
> Tested-by: David Heidelberg <david@xxxxxxx>
> Signed-off-by: Dmitry Osipenko <digetx@xxxxxxxxx>
> ---
> arch/arm/mach-tegra/cpuidle-tegra20.c | 47 +++++++++++++++++++++++++--
> 1 file changed, 45 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/mach-tegra/cpuidle-tegra20.c b/arch/arm/mach-tegra/cpuidle-tegra20.c
> index 9672c619f4bc..bcc158b72e67 100644
> --- a/arch/arm/mach-tegra/cpuidle-tegra20.c
> +++ b/arch/arm/mach-tegra/cpuidle-tegra20.c
> @@ -83,14 +83,57 @@ static inline void tegra20_wake_cpu1_from_reset(void)
> }
> #endif
>
> +static void tegra20_report_cpus_state(void)
> +{
> + unsigned long cpu, lcpu, csr;
> +
> + for_each_cpu(lcpu, cpu_possible_mask) {
> + cpu = cpu_logical_map(lcpu);
> + csr = flowctrl_read_cpu_csr(cpu);
> +
> + pr_err("cpu%lu: online=%d flowctrl_csr=0x%08lx\n",
> + cpu, cpu_online(lcpu), csr);
> + }
> +}
> +
> +static int tegra20_wait_for_secondary_cpu_parking(void)
> +{
> + unsigned int retries = 3;
> +
> + while (retries--) {
> + ktime_t timeout = ktime_add_ms(ktime_get(), 500);
Oops, I missed this one. Do not use ktime_get() in this code path; use jiffies instead.
> +
> + /*
> + * The primary CPU0 core shall wait for the secondaries
> + * shutdown in order to power-off CPU's cluster safely.
> + * The timeout value depends on the current CPU frequency,
> + * it takes about 40-150us in average and over 1000us in
> + * a worst case scenario.
> + */
> + do {
> + if (tegra_cpu_rail_off_ready())
> + return 0;
> +
> + } while (ktime_before(ktime_get(), timeout));
So this loop will aggressively call tegra_cpu_rail_off_ready() and retry 3
times. tegra_cpu_rail_off_ready() can be called thousands of times here, and
in the worst case the function will hang for 3 x 500ms = 1.5s :/
I suggest something like:
while (retries-- && !tegra_cpu_rail_off_ready())
udelay(100);
That gives at most <retries> calls to tegra_cpu_rail_off_ready() and a maximum
impact of 100us x <retries>.
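
To make the bound concrete, here is a rough userspace sketch of that polling pattern. It is not the driver code: fake_rail_off_ready() and fake_udelay() are hypothetical stand-ins for tegra_cpu_rail_off_ready() and udelay(), and wait_for_parking() is a made-up name. It only illustrates that the number of polls and the total delay are both capped by <retries>.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical test knobs and counters, not part of the patch. */
static unsigned int polls_until_ready;	/* failed polls before "ready" */
static unsigned int poll_count;		/* calls made to the ready check */
static unsigned int delayed_us;		/* total simulated delay */

/* Stand-in for tegra_cpu_rail_off_ready(): true after N failed polls. */
static bool fake_rail_off_ready(void)
{
	return ++poll_count > polls_until_ready;
}

/* Stand-in for udelay(): only accounts for the requested time. */
static void fake_udelay(unsigned int usecs)
{
	delayed_us += usecs;
}

/*
 * Poll at most 'retries' times, waiting 'step_us' after each failed poll.
 * Returns 0 once the check succeeds, -ETIMEDOUT when retries run out.
 */
static int wait_for_parking(unsigned int retries, unsigned int step_us)
{
	assert(step_us > 0);

	while (retries--) {
		if (fake_rail_off_ready())
			return 0;

		fake_udelay(step_us);
	}

	return -ETIMEDOUT;
}
```

The explicit return inside the loop is a deliberate choice in this sketch: with the one-line while() form the caller has to inspect <retries> after the loop to tell success from timeout.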
> + pr_err("secondary CPU taking too long to park\n");
> +
> + tegra20_report_cpus_state();
> + }
> +
> + pr_err("timed out waiting secondaries to park\n");
> +
> + return -ETIMEDOUT;
> +}
> +
> static bool tegra20_cpu_cluster_power_down(struct cpuidle_device *dev,
> struct cpuidle_driver *drv,
> int index)
> {
> bool ret;
>
> - while (!tegra_cpu_rail_off_ready())
> - cpu_relax();
> + if (tegra20_wait_for_secondary_cpu_parking())
> + return false;
>
> ret = !tegra_pm_enter_lp2();
>
> --
> 2.24.0
>
--
<http://www.linaro.org/> Linaro.org | Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog