Re: [PATCH v10 24/27] drivers: firmware: psci: Support CPU hotplug for the hierarchical model

From: Lina Iyer
Date: Fri Nov 30 2018 - 15:57:36 EST


On Fri, Nov 30 2018 at 01:25 -0700, Ulf Hansson wrote:
On Thu, 29 Nov 2018 at 23:31, Lina Iyer <ilina@xxxxxxxxxxxxxx> wrote:

Hi Ulf,

On Thu, Nov 29 2018 at 10:50 -0700, Ulf Hansson wrote:
>When the hierarchical CPU topology is used and when a CPU has been put
>offline (hotplug), that same CPU prevents its PM domain and thus also
>potential master PM domains, from being powered off. This is because genpd
>observes the CPU's struct device to remain being active from a runtime PM
>point of view.
>
>To deal with this, let's decrease the runtime PM usage count by calling
>pm_runtime_put_sync_suspend() of the CPU's struct device when putting it
>offline. Consequentially, we must then increase the runtime PM usage for
>the CPU, while putting it online again.
>
>Signed-off-by: Ulf Hansson <ulf.hansson@xxxxxxxxxx>
>---
>
>Changes in v10:
> - Make it work when the hierarchical CPU topology is used, which may be
> used both for OSI and PC mode.
> - Rework the code to prevent "BUG: sleeping function called from
> invalid context".
>---
> drivers/firmware/psci/psci.c | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
>diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
>index b03bccce0a5d..f62c4963eb62 100644
>--- a/drivers/firmware/psci/psci.c
>+++ b/drivers/firmware/psci/psci.c
>@@ -15,6 +15,7 @@
>
> #include <linux/acpi.h>
> #include <linux/arm-smccc.h>
>+#include <linux/cpu.h>
> #include <linux/cpuidle.h>
> #include <linux/errno.h>
> #include <linux/linkage.h>
>@@ -199,9 +200,20 @@ static int psci_cpu_suspend(u32 state, unsigned long entry_point)
>
> static int psci_cpu_off(u32 state)
> {
>+ struct device *dev;
> int err;
> u32 fn;
>
>+ /*
>+ * When the hierarchical CPU topology is used, decrease the runtime PM
>+ * usage count for the current CPU, as to allow other parts in the
>+ * topology to enter low power states.
>+ */
>+ if (psci_dt_topology) {
>+ dev = get_cpu_device(smp_processor_id());
>+ pm_runtime_put_sync_suspend(dev);
>+ }
>+
> fn = psci_function_id[PSCI_FN_CPU_OFF];
> err = invoke_psci_fn(fn, state, 0, 0);
> return psci_to_linux_errno(err);
>@@ -209,6 +221,7 @@ static int psci_cpu_off(u32 state)
>
> static int psci_cpu_on(unsigned long cpuid, unsigned long entry_point)
> {
>+ struct device *dev;
> int err;
> u32 fn;
>
>@@ -216,6 +229,13 @@ static int psci_cpu_on(unsigned long cpuid, unsigned long entry_point)
> err = invoke_psci_fn(fn, cpuid, entry_point, 0);
> /* Clear the domain state to start fresh. */
> psci_set_domain_state(0);
>+
>+ /* Increase the runtime PM usage count if the hierarchical CPU topology is used. */
>+ if (!err && psci_dt_topology) {
>+ dev = get_cpu_device(cpuid);
>+ pm_runtime_get_sync(dev);

I booted with a single CPU on my SDM845 device, and when I tried to
online CPU1, I saw a crash.

Thanks for testing!

If I understand correctly, that means that you haven't registered CPU1
using register_cpu(), hence there is no struct device created for it.
It sounds like a special case, but on the other hand we shouldn't
crash, of course.

This is in fact pretty common. Devices boot with only the low power
cores and bring in the high perf cores only when needed.

I guess a simple check like this would help.

	if (dev)
		pm_runtime_get_sync(dev);

...and then we need a similar check in psci_cpu_off() to deal with
putting the CPU offline.
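
For illustration, the two guarded paths could look roughly like the
userspace sketch below. This is not the patch itself: struct device,
get_cpu_device() and the pm_runtime_*() helpers are stand-in stubs, and
cpu_online_pm_get() / cpu_offline_pm_put() are hypothetical names that
exist only in this sketch. In the real psci_cpu_off() the CPU number
would come from smp_processor_id() rather than from an argument.

```c
/*
 * Userspace sketch only. Everything below is a stand-in stub,
 * not the kernel's implementation.
 */
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 8

struct device {
	int usage_count;	/* stand-in for the runtime PM usage count */
};

/* Only CPU0 is registered, as on a system booted with a single CPU. */
static struct device cpu0_dev = { .usage_count = 1 };
static struct device *cpu_devices[NR_CPUS] = { &cpu0_dev };

/* Stub: returns NULL for a CPU that has no struct device. */
static struct device *get_cpu_device(unsigned int cpu)
{
	return cpu < NR_CPUS ? cpu_devices[cpu] : NULL;
}

/* Stubs: these must not be handed a NULL device, hence the asserts. */
static int pm_runtime_get_sync(struct device *dev)
{
	assert(dev != NULL);
	dev->usage_count++;
	return 0;
}

static int pm_runtime_put_sync_suspend(struct device *dev)
{
	assert(dev != NULL);
	dev->usage_count--;
	return 0;
}

/* Hypothetical helper for the online path; returns 1 if a count was taken. */
static int cpu_online_pm_get(unsigned int cpu)
{
	struct device *dev = get_cpu_device(cpu);

	if (!dev)
		return 0;	/* CPU was never registered: nothing to do */

	pm_runtime_get_sync(dev);
	return 1;
}

/* Hypothetical helper for the offline path; returns 1 if a count was dropped. */
static int cpu_offline_pm_put(unsigned int cpu)
{
	struct device *dev = get_cpu_device(cpu);

	if (!dev)
		return 0;	/* same guard as on the online path */

	pm_runtime_put_sync_suspend(dev);
	return 1;
}
```

With these stubs, cpu_online_pm_get(1) returns 0 without touching any
device, while the CPU0 path still balances its get and put.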

Could you try this and see if it helps?

Yes, it fixes the issue.

Thanks,
Lina


# echo 1 > /sys/devices/system/cpu/cpu1/online

[ 86.339204] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000188
[ 86.340195] Detected VIPT I-cache on CPU1
[ 86.348075] Mem abort info:
[ 86.348092] GICv3: CPU1: found redistributor 100 region 0:0x0000000017a80000
[ 86.352125] ESR = 0x96000006
[ 86.352194] CPU1: Booted secondary processor 0x0000000100 [0x517f803c]
[ 86.354956] Exception class = DABT (current EL), IL = 32 bits
[ 86.377700] SET = 0, FnV = 0
[ 86.380788] EA = 0, S1PTW = 0
[ 86.383967] Data abort info:
[ 86.386882] ISV = 0, ISS = 0x00000006
[ 86.390760] CM = 0, WnR = 0
[ 86.393755] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
[ 86.400430] [0000000000000188] pgd=00000001f5233003, pud=00000001f5234003, pmd=0000000000000000
[ 86.409203] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[ 86.414824] Modules linked in:
[ 86.417915] CPU: 0 PID: 1533 Comm: sh Not tainted 4.20.0-rc3-30359-gff2e21952bd5 #782
[ 86.425807] Hardware name: Qualcomm Technologies, Inc. SDM845 MTP (DT)
[ 86.432387] pstate: 80400005 (Nzcv daif +PAN -UAO)
[ 86.437233] pc : __pm_runtime_resume+0x20/0x74
[ 86.441720] lr : psci_cpu_on+0x84/0x90
[ 86.445498] sp : ffff00000db43a10
[ 86.448842] x29: ffff00000db43a10 x28: ffff80017562b500
[ 86.454200] x27: ffff000009159000 x26: 0000000000000055
[ 86.459556] x25: 0000000000000000 x24: ffff0000092c4bc8
[ 86.464913] x23: ffff000008fb8000 x22: ffff00000916a000
[ 86.470269] x21: 0000000000000100 x20: ffff000009314190
[ 86.475625] x19: 0000000000000000 x18: 0000000000000000
[ 86.480979] x17: 0000000000000000 x16: 0000000000000000
[ 86.486334] x15: 0000000000000000 x14: ffff000009162600
[ 86.491690] x13: 0000000000000300 x12: 0000000000000010
[ 86.497047] x11: ffffffffffffffff x10: ffffffffffffffff
[ 86.502399] x9 : 0000000000000001 x8 : 0000000000000000
[ 86.507753] x7 : 0000000000000000 x6 : 0000000000000000
[ 86.513108] x5 : 0000000000000000 x4 : 0000000000000000
[ 86.518463] x3 : 0000000000000188 x2 : 0000800174385000
[ 86.523820] x1 : 0000000000000004 x0 : 0000000000000000
[ 86.529175] Process sh (pid: 1533, stack limit = 0x(____ptrval____))
[ 86.535585] Call trace:
[ 86.538063] __pm_runtime_resume+0x20/0x74
[ 86.542197] psci_cpu_on+0x84/0x90
[ 86.545639] cpu_psci_cpu_boot+0x3c/0x6c
[ 86.549593] __cpu_up+0x68/0x210
[ 86.552852] bringup_cpu+0x30/0xe0
[ 86.556293] cpuhp_invoke_callback+0x84/0x1e0
[ 86.560689] _cpu_up+0xe0/0x1d0
[ 86.563862] do_cpu_up+0x90/0xb0
[ 86.567118] cpu_up+0x10/0x18
[ 86.570113] cpu_subsys_online+0x44/0x98
[ 86.574079] device_online+0x68/0xac
[ 86.577685] online_store+0xa8/0xb4
[ 86.581202] dev_attr_store+0x18/0x28
[ 86.584908] sysfs_kf_write+0x40/0x48
[ 86.588606] kernfs_fop_write+0xcc/0x1cc
[ 86.592563] __vfs_write+0x40/0x16c
[ 86.596078] vfs_write+0xa8/0x1a0
[ 86.599424] ksys_write+0x58/0xbc
[ 86.602768] __arm64_sys_write+0x18/0x20
[ 86.606733] el0_svc_common+0x94/0xf0
[ 86.610433] el0_svc_handler+0x24/0x80
[ 86.614215] el0_svc+0x8/0x7c0
[ 86.617300] Code: aa0003f3 361000e1 91062263 f9800071 (885f7c60)
[ 86.623447] ---[ end trace 4573c3c0e0761290 ]---

>+ }
>+
> return psci_to_linux_errno(err);
> }
>
>--
>2.17.1
>

Thanks,
Lina