Re: [PATCH v10 24/27] drivers: firmware: psci: Support CPU hotplug for the hierarchical model

From: Ulf Hansson
Date: Fri Nov 30 2018 - 03:25:55 EST


On Thu, 29 Nov 2018 at 23:31, Lina Iyer <ilina@xxxxxxxxxxxxxx> wrote:
>
> Hi Ulf,
>
> On Thu, Nov 29 2018 at 10:50 -0700, Ulf Hansson wrote:
> >When the hierarchical CPU topology is used and when a CPU has been put
> >offline (hotplug), that same CPU prevents its PM domain and thus also
> >potential master PM domains, from being powered off. This is because genpd
> >observes the CPU's struct device to remain being active from a runtime PM
> >point of view.
> >
> >To deal with this, let's decrease the runtime PM usage count by calling
> >pm_runtime_put_sync_suspend() of the CPU's struct device when putting it
> >offline. Consequently, we must increase the runtime PM usage count for
> >the CPU again when bringing it back online.
> >
> >Signed-off-by: Ulf Hansson <ulf.hansson@xxxxxxxxxx>
> >---
> >
> >Changes in v10:
> > - Make it work when the hierarchical CPU topology is used, which may be
> > used both for OSI and PC mode.
> > - Rework the code to prevent "BUG: sleeping function called from
> > invalid context".
> >---
> > drivers/firmware/psci/psci.c | 20 ++++++++++++++++++++
> > 1 file changed, 20 insertions(+)
> >
> >diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> >index b03bccce0a5d..f62c4963eb62 100644
> >--- a/drivers/firmware/psci/psci.c
> >+++ b/drivers/firmware/psci/psci.c
> >@@ -15,6 +15,7 @@
> >
> > #include <linux/acpi.h>
> > #include <linux/arm-smccc.h>
> >+#include <linux/cpu.h>
> > #include <linux/cpuidle.h>
> > #include <linux/errno.h>
> > #include <linux/linkage.h>
> >@@ -199,9 +200,20 @@ static int psci_cpu_suspend(u32 state, unsigned long entry_point)
> >
> > static int psci_cpu_off(u32 state)
> > {
> >+ struct device *dev;
> > int err;
> > u32 fn;
> >
> >+ /*
> >+ * When the hierarchical CPU topology is used, decrease the runtime PM
> >+ * usage count for the current CPU, as to allow other parts in the
> >+ * topology to enter low power states.
> >+ */
> >+ if (psci_dt_topology) {
> >+ dev = get_cpu_device(smp_processor_id());
> >+ pm_runtime_put_sync_suspend(dev);
> >+ }
> >+
> > fn = psci_function_id[PSCI_FN_CPU_OFF];
> > err = invoke_psci_fn(fn, state, 0, 0);
> > return psci_to_linux_errno(err);
> >@@ -209,6 +221,7 @@ static int psci_cpu_off(u32 state)
> >
> > static int psci_cpu_on(unsigned long cpuid, unsigned long entry_point)
> > {
> >+ struct device *dev;
> > int err;
> > u32 fn;
> >
> >@@ -216,6 +229,13 @@ static int psci_cpu_on(unsigned long cpuid, unsigned long entry_point)
> > err = invoke_psci_fn(fn, cpuid, entry_point, 0);
> > /* Clear the domain state to start fresh. */
> > psci_set_domain_state(0);
> >+
> >+ /* Increase runtime PM usage count if the hierarchical CPU topology is used. */
> >+ if (!err && psci_dt_topology) {
> >+ dev = get_cpu_device(cpuid);
> >+ pm_runtime_get_sync(dev);
>
> I booted with a single CPU on my SDM845 device, and when I tried to
> online CPU1 I saw a crash.

Thanks for testing!

If I understand correctly, that means that you haven't registered CPU1
using register_cpu(), hence there is no struct device created for it.
It sounds like a special case, but on the other hand we shouldn't
crash, of course.

I guess a simple check like this would help.

	if (dev)
		pm_runtime_get_sync(dev);

...and then we need a similar check in psci_cpu_off() to deal with
putting the CPU offline.

Could you try this and see if it helps?
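
To make the idea concrete, here is a rough userspace sketch of the
guard, not the actual patch. Everything prefixed with mock_ is a
hypothetical stand-in for the kernel helper of similar name (the real
functions have different signatures and do much more), and the table of
CPU devices is made up for illustration. A NULL slot models a CPU that
was never registered via register_cpu().

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_NR_CPUS 4

/* Mock of struct device: only the runtime PM usage counter is modelled. */
struct mock_device {
	int usage_count;
};

/* Hypothetical setup: only CPU0 was registered, so only it has a device. */
struct mock_device mock_cpu0_dev = { .usage_count = 1 };
struct mock_device *mock_cpu_devices[MOCK_NR_CPUS] = {
	&mock_cpu0_dev, NULL, NULL, NULL
};

/* Stand-in for get_cpu_device(): NULL for CPUs without a struct device. */
struct mock_device *mock_get_cpu_device(unsigned int cpu)
{
	return cpu < MOCK_NR_CPUS ? mock_cpu_devices[cpu] : NULL;
}

/* Simplified stand-in for pm_runtime_get_sync(): bump the usage count. */
void mock_pm_get(struct mock_device *dev)
{
	dev->usage_count++;
}

/* Simplified stand-in for pm_runtime_put_sync_suspend(): drop the count. */
void mock_pm_put(struct mock_device *dev)
{
	assert(dev->usage_count > 0);
	dev->usage_count--;
}

/*
 * Online path: take a usage count only if the CPU has a device.
 * Returns 1 if the count was changed, 0 if the CPU was skipped.
 */
int mock_cpu_on_runtime_pm(unsigned int cpu)
{
	struct mock_device *dev = mock_get_cpu_device(cpu);

	if (!dev)
		return 0;

	mock_pm_get(dev);
	return 1;
}

/* Offline path: the same check, mirroring the online path. */
int mock_cpu_off_runtime_pm(unsigned int cpu)
{
	struct mock_device *dev = mock_get_cpu_device(cpu);

	if (!dev)
		return 0;

	mock_pm_put(dev);
	return 1;
}
```

With this model, onlining or offlining CPU1 is simply skipped instead
of dereferencing a NULL pointer, while CPU0 keeps its get/put calls
balanced. Keeping the check in both paths matters, since a put without
a matching get would leave the usage count unbalanced.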

>
> # echo 1 > /sys/devices/system/cpu/cpu1/online
>
> [ 86.339204] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000188
> [ 86.340195] Detected VIPT I-cache on CPU1
> [ 86.348075] Mem abort info:
> [ 86.348092] GICv3: CPU1: found redistributor 100 region 0:0x0000000017a80000
> [ 86.352125] ESR = 0x96000006
> [ 86.352194] CPU1: Booted secondary processor 0x0000000100 [0x517f803c]
> [ 86.354956] Exception class = DABT (current EL), IL = 32 bits
> [ 86.377700] SET = 0, FnV = 0
> [ 86.380788] EA = 0, S1PTW = 0
> [ 86.383967] Data abort info:
> [ 86.386882] ISV = 0, ISS = 0x00000006
> [ 86.390760] CM = 0, WnR = 0
> [ 86.393755] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
> [ 86.400430] [0000000000000188] pgd=00000001f5233003, pud=00000001f5234003, pmd=0000000000000000
> [ 86.409203] Internal error: Oops: 96000006 [#1] PREEMPT SMP
> [ 86.414824] Modules linked in:
> [ 86.417915] CPU: 0 PID: 1533 Comm: sh Not tainted 4.20.0-rc3-30359-gff2e21952bd5 #782
> [ 86.425807] Hardware name: Qualcomm Technologies, Inc. SDM845 MTP (DT)
> [ 86.432387] pstate: 80400005 (Nzcv daif +PAN -UAO)
> [ 86.437233] pc : __pm_runtime_resume+0x20/0x74
> [ 86.441720] lr : psci_cpu_on+0x84/0x90
> [ 86.445498] sp : ffff00000db43a10
> [ 86.448842] x29: ffff00000db43a10 x28: ffff80017562b500
> [ 86.454200] x27: ffff000009159000 x26: 0000000000000055
> [ 86.459556] x25: 0000000000000000 x24: ffff0000092c4bc8
> [ 86.464913] x23: ffff000008fb8000 x22: ffff00000916a000
> [ 86.470269] x21: 0000000000000100 x20: ffff000009314190
> [ 86.475625] x19: 0000000000000000 x18: 0000000000000000
> [ 86.480979] x17: 0000000000000000 x16: 0000000000000000
> [ 86.486334] x15: 0000000000000000 x14: ffff000009162600
> [ 86.491690] x13: 0000000000000300 x12: 0000000000000010
> [ 86.497047] x11: ffffffffffffffff x10: ffffffffffffffff
> [ 86.502399] x9 : 0000000000000001 x8 : 0000000000000000
> [ 86.507753] x7 : 0000000000000000 x6 : 0000000000000000
> [ 86.513108] x5 : 0000000000000000 x4 : 0000000000000000
> [ 86.518463] x3 : 0000000000000188 x2 : 0000800174385000
> [ 86.523820] x1 : 0000000000000004 x0 : 0000000000000000
> [ 86.529175] Process sh (pid: 1533, stack limit = 0x(____ptrval____))
> [ 86.535585] Call trace:
> [ 86.538063] __pm_runtime_resume+0x20/0x74
> [ 86.542197] psci_cpu_on+0x84/0x90
> [ 86.545639] cpu_psci_cpu_boot+0x3c/0x6c
> [ 86.549593] __cpu_up+0x68/0x210
> [ 86.552852] bringup_cpu+0x30/0xe0
> [ 86.556293] cpuhp_invoke_callback+0x84/0x1e0
> [ 86.560689] _cpu_up+0xe0/0x1d0
> [ 86.563862] do_cpu_up+0x90/0xb0
> [ 86.567118] cpu_up+0x10/0x18
> [ 86.570113] cpu_subsys_online+0x44/0x98
> [ 86.574079] device_online+0x68/0xac
> [ 86.577685] online_store+0xa8/0xb4
> [ 86.581202] dev_attr_store+0x18/0x28
> [ 86.584908] sysfs_kf_write+0x40/0x48
> [ 86.588606] kernfs_fop_write+0xcc/0x1cc
> [ 86.592563] __vfs_write+0x40/0x16c
> [ 86.596078] vfs_write+0xa8/0x1a0
> [ 86.599424] ksys_write+0x58/0xbc
> [ 86.602768] __arm64_sys_write+0x18/0x20
> [ 86.606733] el0_svc_common+0x94/0xf0
> [ 86.610433] el0_svc_handler+0x24/0x80
> [ 86.614215] el0_svc+0x8/0x7c0
> [ 86.617300] Code: aa0003f3 361000e1 91062263 f9800071 (885f7c60)
> [ 86.623447] ---[ end trace 4573c3c0e0761290 ]---
>
> >+ }
> >+
> > return psci_to_linux_errno(err);
> > }
> >
> >--
> >2.17.1
> >
>
> Thanks,
> Lina