Re: [PATCH RFC v4 12/15] arm64: psci: Ignore DENIED CPUs

From: Jonathan Cameron
Date: Thu Apr 11 2024 - 07:35:45 EST


On Wed, 31 Jan 2024 16:50:38 +0000
Russell King (Oracle) <rmk+kernel@xxxxxxxxxxxxxxx> wrote:

> From: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx>
>
> When a CPU is marked as disabled, but online capable in the MADT, PSCI
> applies some firmware policy to control when it can be brought online.
> PSCI returns DENIED to a CPU_ON request if this is not currently
> permitted. The OS can learn the current policy from the _STA enabled bit.
>
> Handle the PSCI DENIED return code gracefully instead of printing an
> error.
>
> See https://developer.arm.com/documentation/den0022/f/?lang=en page 58.
>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx>
> [ morse: Rewrote commit message ]
> Signed-off-by: James Morse <james.morse@xxxxxxx>
> Tested-by: Miguel Luis <miguel.luis@xxxxxxxxxx>
> Tested-by: Vishnu Pajjuri <vishnu@xxxxxxxxxxxxxxxxxxxxxx>
> Tested-by: Jianyong Wu <jianyong.wu@xxxxxxx>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> Signed-off-by: Russell King (Oracle) <rmk+kernel@xxxxxxxxxxxxxxx>
> ---

This change to return failure from __cpu_up in non error cases exposes
an possible issue with cpu_up() in kernel/cpu.c in that it brings the numa node
before we try (and fail) to bring up CPUs that may be denied.

We could try offlining the numa node on error, or just register it later.

Currently I'm testing this change which I think is harmless for cases that don't
fail the cpu_up()

For the cpu hotplug path note the node only comes online wiht the cpu online, not
the earlier hotplug. Reasonable given there is nothing online in the node before
that point.

diff --git a/kernel/cpu.c b/kernel/cpu.c
index 537099bf5d02..a4730396ccea 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1742,10 +1742,6 @@ static int cpu_up(unsigned int cpu, enum cpuhp_state target)
return -EINVAL;
}

- err = try_online_node(cpu_to_node(cpu));
- if (err)
- return err;
-
cpu_maps_update_begin();

if (cpu_hotplug_disabled) {
@@ -1760,7 +1756,10 @@ static int cpu_up(unsigned int cpu, enum cpuhp_state target)
err = _cpu_up(cpu, 0, target);
out:
cpu_maps_update_done();
- return err;
+ if (err)
+ return err;
+
+ return try_online_node(cpu_to_node(cpu));
}

There is a kicker in the remove path where currently check_cpu_on_node()
checks for_each_present_cpu() whereas to work for us we need to use
for_each_online_cpu() or the node is never removed.

Now my current view is that we should only show
nodes in /sys/bus/nodes/devices/ if there is a CPU online (assuming no other
reasons the node should be online such as memory).
That's easy enough to make work but all I'm really learning is that the semantics
of what is an online form a node point of view is not consistent.

Fixing this will create a minor change on x86 but does anyone really care
about what happens in the offline path wrt to 'when' the node disappears?
I think the corner case is.

1. Add 2 CPUs (A, B) in a CPU only node.
2. Online CPU A - this brings /sys/bus/devices/nodeX online
3. Remove CPU A - no effect because the check for try_remove is on presence and CPU B is
still present.
4. Online CPU B - no change.
5. Offline CPU B - no change.
4. Remove CPU B - /sys/bus/device/nodeX offline (disappears)

To make it work on arm64 where we never make CPUs not present.

1. Add 2 CPUs (A, B) in a CPU only node.
2. Online CPU A - this brings /sys/bus/devices/nodeX online
3. Remove CPU A - /sys/bus/devices/nodeX offline (disappears)
4. Online CPU B - this brings /sys/bus/device/nodeX online
5. Offline CPU B - no change (node updates only happen in hotplug code)
6. Remove CPU B - /sys/bus/device/nodeX offline (disappears).

Step 5 may seem weird but I think we can't mess with nodes there because
userspace may well rely on them still being around for some reason
(it's a much more common situation).

My assumption is that step3 removing the node isn't going to hurt anyone?

If no one shouts, i'll go ahead with rolling a v5 patch set where this is done.


Jonathan





> Changes since RFC v2
> * Add specification reference
> * Use EPERM rather than EPROBE_DEFER
> Changes since RFC v3:
> * Use EPERM everywhere
> * Drop unnecessary changes to drivers/firmware/psci/psci.c
> ---
> arch/arm64/kernel/psci.c | 2 +-
> arch/arm64/kernel/smp.c | 3 ++-
> 2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kernel/psci.c b/arch/arm64/kernel/psci.c
> index 29a8e444db83..fabd732d0a2d 100644
> --- a/arch/arm64/kernel/psci.c
> +++ b/arch/arm64/kernel/psci.c
> @@ -40,7 +40,7 @@ static int cpu_psci_cpu_boot(unsigned int cpu)
> {
> phys_addr_t pa_secondary_entry = __pa_symbol(secondary_entry);
> int err = psci_ops.cpu_on(cpu_logical_map(cpu), pa_secondary_entry);
> - if (err)
> + if (err && err != -EPERM)
> pr_err("failed to boot CPU%d (%d)\n", cpu, err);
>
> return err;
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 4ced34f62dab..dc0e0b3ec2d4 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -132,7 +132,8 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
> /* Now bring the CPU into our world */
> ret = boot_secondary(cpu, idle);
> if (ret) {
> - pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
> + if (ret != -EPERM)
> + pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
> return ret;
> }
>