On Thu, 29 Nov 2018 at 23:31, Lina Iyer <ilina@xxxxxxxxxxxxxx> wrote:
This is in fact pretty common. Devices boot with only low power
Hi Ulf,
On Thu, Nov 29 2018 at 10:50 -0700, Ulf Hansson wrote:
>When the hierarchical CPU topology is used and when a CPU has been put
>offline (hotplug), that same CPU prevents its PM domain and thus also
>potential master PM domains, from being powered off. This is because genpd
>still sees the CPU's struct device as active from a runtime PM
>point of view.
>
>To deal with this, let's decrease the runtime PM usage count by calling
>pm_runtime_put_sync_suspend() for the CPU's struct device when putting it
>offline. Consequently, we must then increase the runtime PM usage count
>for the CPU when putting it online again.
>
>Signed-off-by: Ulf Hansson <ulf.hansson@xxxxxxxxxx>
>---
>
>Changes in v10:
> - Make it work when the hierarchical CPU topology is used, which may be
> used both for OSI and PC mode.
> - Rework the code to prevent "BUG: sleeping function called from
> invalid context".
>---
> drivers/firmware/psci/psci.c | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
>
>diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
>index b03bccce0a5d..f62c4963eb62 100644
>--- a/drivers/firmware/psci/psci.c
>+++ b/drivers/firmware/psci/psci.c
>@@ -15,6 +15,7 @@
>
> #include <linux/acpi.h>
> #include <linux/arm-smccc.h>
>+#include <linux/cpu.h>
> #include <linux/cpuidle.h>
> #include <linux/errno.h>
> #include <linux/linkage.h>
>@@ -199,9 +200,20 @@ static int psci_cpu_suspend(u32 state, unsigned long entry_point)
>
> static int psci_cpu_off(u32 state)
> {
>+	struct device *dev;
> 	int err;
> 	u32 fn;
>
>+	/*
>+	 * When the hierarchical CPU topology is used, decrease the runtime PM
>+	 * usage count for the current CPU, as to allow other parts in the
>+	 * topology to enter low power states.
>+	 */
>+	if (psci_dt_topology) {
>+		dev = get_cpu_device(smp_processor_id());
>+		pm_runtime_put_sync_suspend(dev);
>+	}
>+
> 	fn = psci_function_id[PSCI_FN_CPU_OFF];
> 	err = invoke_psci_fn(fn, state, 0, 0);
> 	return psci_to_linux_errno(err);
>@@ -209,6 +221,7 @@ static int psci_cpu_off(u32 state)
>
> static int psci_cpu_on(unsigned long cpuid, unsigned long entry_point)
> {
>+	struct device *dev;
> 	int err;
> 	u32 fn;
>
>@@ -216,6 +229,13 @@ static int psci_cpu_on(unsigned long cpuid, unsigned long entry_point)
> 	err = invoke_psci_fn(fn, cpuid, entry_point, 0);
> 	/* Clear the domain state to start fresh. */
> 	psci_set_domain_state(0);
>+
>+	/* Increase runtime PM usage count if the hierarchical CPU topology is used. */
>+	if (!err && psci_dt_topology) {
>+		dev = get_cpu_device(cpuid);
>+		pm_runtime_get_sync(dev);
I booted with a single CPU on my SDM845 device, and when I tried to
online CPU1, I saw a crash.
Thanks for testing!
If I understand correctly, that means that you haven't registered CPU1
using register_cpu(), hence there is no struct device created for it.
It sounds like a special case, but on the other hand we shouldn't
crash, of course.
Yes, it fixes the issue.
I guess a simple check like this would help.
	if (dev)
		pm_runtime_get_sync(dev);
...and then we need a similar check in psci_cpu_off() to deal with
putting the CPU offline.
Could you try this and see if it helps?
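
For reference, the guarded get/put pairing discussed above can be sketched
in plain userspace C. This is only an illustration under stated assumptions,
not the actual fix: struct device, get_cpu_device() and the two runtime PM
helpers below are hypothetical stubs that merely mimic the shape of the
kernel APIs, and cpu_online_get()/cpu_offline_put() are made-up names
standing in for the relevant parts of psci_cpu_on()/psci_cpu_off().

```c
#include <stddef.h>

#define NR_CPUS 4

/* Hypothetical stand-in for the kernel's struct device: just a usage count. */
struct device {
	int usage_count;
};

/* Only CPU0 has a device, mimicking a boot where CPU1 was never registered. */
static struct device cpu0_dev = { .usage_count = 1 };
static struct device *cpu_devices[NR_CPUS] = { &cpu0_dev };

/* Stub shaped like get_cpu_device(): NULL if no device is registered. */
static struct device *get_cpu_device(unsigned int cpu)
{
	return cpu < NR_CPUS ? cpu_devices[cpu] : NULL;
}

/*
 * Stubs that dereference dev unconditionally, as the oops in this thread
 * shows for the real get path.
 */
static int pm_runtime_get_sync(struct device *dev)
{
	dev->usage_count++;
	return 0;
}

static int pm_runtime_put_sync_suspend(struct device *dev)
{
	dev->usage_count--;
	return 0;
}

/* Online path: take the reference only if the CPU has a struct device. */
static int cpu_online_get(unsigned int cpu)
{
	struct device *dev = get_cpu_device(cpu);

	if (!dev)
		return 0;	/* nothing to do, and no NULL dereference */
	pm_runtime_get_sync(dev);
	return 1;
}

/* Offline path: the same check keeps the put balanced with the get. */
static int cpu_offline_put(unsigned int cpu)
{
	struct device *dev = get_cpu_device(cpu);

	if (!dev)
		return 0;
	pm_runtime_put_sync_suspend(dev);
	return 1;
}
```

With these stubs, calling cpu_online_get(1) for the unregistered CPU1 simply
returns 0 instead of faulting, while CPU0's usage count still goes up on
online and back down on offline.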
# echo 1 > /sys/devices/system/cpu/cpu1/online
[ 86.339204] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000188
[ 86.340195] Detected VIPT I-cache on CPU1
[ 86.348075] Mem abort info:
[ 86.348092] GICv3: CPU1: found redistributor 100 region 0:0x0000000017a80000
[ 86.352125] ESR = 0x96000006
[ 86.352194] CPU1: Booted secondary processor 0x0000000100 [0x517f803c]
[ 86.354956] Exception class = DABT (current EL), IL = 32 bits
[ 86.377700] SET = 0, FnV = 0
[ 86.380788] EA = 0, S1PTW = 0
[ 86.383967] Data abort info:
[ 86.386882] ISV = 0, ISS = 0x00000006
[ 86.390760] CM = 0, WnR = 0
[ 86.393755] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
[ 86.400430] [0000000000000188] pgd=00000001f5233003, pud=00000001f5234003, pmd=0000000000000000
[ 86.409203] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[ 86.414824] Modules linked in:
[ 86.417915] CPU: 0 PID: 1533 Comm: sh Not tainted 4.20.0-rc3-30359-gff2e21952bd5 #782
[ 86.425807] Hardware name: Qualcomm Technologies, Inc. SDM845 MTP (DT)
[ 86.432387] pstate: 80400005 (Nzcv daif +PAN -UAO)
[ 86.437233] pc : __pm_runtime_resume+0x20/0x74
[ 86.441720] lr : psci_cpu_on+0x84/0x90
[ 86.445498] sp : ffff00000db43a10
[ 86.448842] x29: ffff00000db43a10 x28: ffff80017562b500
[ 86.454200] x27: ffff000009159000 x26: 0000000000000055
[ 86.459556] x25: 0000000000000000 x24: ffff0000092c4bc8
[ 86.464913] x23: ffff000008fb8000 x22: ffff00000916a000
[ 86.470269] x21: 0000000000000100 x20: ffff000009314190
[ 86.475625] x19: 0000000000000000 x18: 0000000000000000
[ 86.480979] x17: 0000000000000000 x16: 0000000000000000
[ 86.486334] x15: 0000000000000000 x14: ffff000009162600
[ 86.491690] x13: 0000000000000300 x12: 0000000000000010
[ 86.497047] x11: ffffffffffffffff x10: ffffffffffffffff
[ 86.502399] x9 : 0000000000000001 x8 : 0000000000000000
[ 86.507753] x7 : 0000000000000000 x6 : 0000000000000000
[ 86.513108] x5 : 0000000000000000 x4 : 0000000000000000
[ 86.518463] x3 : 0000000000000188 x2 : 0000800174385000
[ 86.523820] x1 : 0000000000000004 x0 : 0000000000000000
[ 86.529175] Process sh (pid: 1533, stack limit = 0x(____ptrval____))
[ 86.535585] Call trace:
[ 86.538063] __pm_runtime_resume+0x20/0x74
[ 86.542197] psci_cpu_on+0x84/0x90
[ 86.545639] cpu_psci_cpu_boot+0x3c/0x6c
[ 86.549593] __cpu_up+0x68/0x210
[ 86.552852] bringup_cpu+0x30/0xe0
[ 86.556293] cpuhp_invoke_callback+0x84/0x1e0
[ 86.560689] _cpu_up+0xe0/0x1d0
[ 86.563862] do_cpu_up+0x90/0xb0
[ 86.567118] cpu_up+0x10/0x18
[ 86.570113] cpu_subsys_online+0x44/0x98
[ 86.574079] device_online+0x68/0xac
[ 86.577685] online_store+0xa8/0xb4
[ 86.581202] dev_attr_store+0x18/0x28
[ 86.584908] sysfs_kf_write+0x40/0x48
[ 86.588606] kernfs_fop_write+0xcc/0x1cc
[ 86.592563] __vfs_write+0x40/0x16c
[ 86.596078] vfs_write+0xa8/0x1a0
[ 86.599424] ksys_write+0x58/0xbc
[ 86.602768] __arm64_sys_write+0x18/0x20
[ 86.606733] el0_svc_common+0x94/0xf0
[ 86.610433] el0_svc_handler+0x24/0x80
[ 86.614215] el0_svc+0x8/0x7c0
[ 86.617300] Code: aa0003f3 361000e1 91062263 f9800071 (885f7c60)
[ 86.623447] ---[ end trace 4573c3c0e0761290 ]---
>+	}
>+
> 	return psci_to_linux_errno(err);
> }
>
>--
>2.17.1
>
Thanks,
Lina