Re: [PATCH v5 03/18] ACPI: processor: Register deferred CPUs from acpi_processor_get_info()

From: Jonathan Cameron
Date: Mon Apr 15 2024 - 06:52:23 EST


On Fri, 12 Apr 2024 20:30:40 +0200
"Rafael J. Wysocki" <rafael@xxxxxxxxxx> wrote:

> On Fri, Apr 12, 2024 at 4:38 PM Jonathan Cameron
> <Jonathan.Cameron@xxxxxxxxxx> wrote:
> >
> > From: James Morse <james.morse@xxxxxxx>
> >
> > The arm64 specific arch_register_cpu() call may defer CPU registration
> > until the ACPI interpreter is available and the _STA method can
> > be evaluated.
> >
> > If this occurs, then a second attempt is made in
> > acpi_processor_get_info(). Note that the arm64 specific call has
> > not yet been added so for now this will never be successfully
> > called.
> >
> > Systems can still be booted with 'acpi=off', or not include an
> > ACPI description at all as in these cases arch_register_cpu()
> > will not have deferred registration when first called.
> >
> > This moves the CPU register logic back to a subsys_initcall(),
> > while the memory nodes will have been registered earlier.
> > Note this is where the call was prior to the cleanup series so
> > there should be no side effects of moving it back again for this
> > specific case.
> >
> > [PATCH 00/21] Initial cleanups for vCPU HP.
> > https://lore.kernel.org/all/ZVyz%2FVe5pPu8AWoA@xxxxxxxxxxxxxxxxxxxxx/
> >
> > e.g. 5b95f94c3b9f ("x86/topology: Switch over to GENERIC_CPU_DEVICES")
> >
> > Signed-off-by: James Morse <james.morse@xxxxxxx>
> > Reviewed-by: Gavin Shan <gshan@xxxxxxxxxx>
> > Tested-by: Miguel Luis <miguel.luis@xxxxxxxxxx>
> > Tested-by: Vishnu Pajjuri <vishnu@xxxxxxxxxxxxxxxxxxxxxx>
> > Tested-by: Jianyong Wu <jianyong.wu@xxxxxxx>
> > Signed-off-by: Russell King (Oracle) <rmk+kernel@xxxxxxxxxxxxxxx>
> > Co-developed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> > Signed-off-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> > ---
> > v5: Update commit message to make it clear this is moving the
> > init back to where it was until very recently.
> >
> > No longer change the condition in the earlier registration point
> > as that will be handled by the arm64 registration routine
> > deferring until called again here.
> > ---
> > drivers/acpi/acpi_processor.c | 12 ++++++++++++
> > 1 file changed, 12 insertions(+)
> >
> > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > index 93e029403d05..c78398cdd060 100644
> > --- a/drivers/acpi/acpi_processor.c
> > +++ b/drivers/acpi/acpi_processor.c
> > @@ -317,6 +317,18 @@ static int acpi_processor_get_info(struct acpi_device *device)
> >
> >          c = &per_cpu(cpu_devices, pr->id);
> >          ACPI_COMPANION_SET(&c->dev, device);
> > +        /*
> > +         * Register CPUs that are present. get_cpu_device() is used to skip
> > +         * duplicate CPU descriptions from firmware.
> > +         */
> > +        if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id) &&
> > +            !get_cpu_device(pr->id)) {
> > +                int ret = arch_register_cpu(pr->id);
> > +
> > +                if (ret)
> > +                        return ret;
> > +        }
> > +
> >          /*
> >           * Extra Processor objects may be enumerated on MP systems with
> >           * less than the max # of CPUs. They should be ignored _iff
> > --
>
> I am still unsure why there need to be two paths calling
> arch_register_cpu() in acpi_processor_get_info().

I replied further down the thread, but the key point was to maintain
the strong distinction between what is done in a real hotplug path and
one where onlining is all that happens. We can relax that, but it runs
contrary to the careful dance that was needed to get any agreement
on the ARM architecture aspects of this.

>
> Just below the comment partially pulled into the patch context above,
> there is this code:
>
>         if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
>                 int ret = acpi_processor_hotadd_init(pr);
>
>                 if (ret)
>                         return ret;
>         }
>
> For the sake of the argument, fold acpi_processor_hotadd_init() into
> it and drop the redundant _STA check from it:

If we combine these, the _STA check is necessary because we will call this
path for delayed onlining of ARM64 CPUs (if the earlier call to
arch_register_cpu() from the registration code returned -EPROBE_DEFER).
That's the only way we know that a given CPU is online capable but firmware
is saying we can't bring it online yet (it may be vHP later).
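
To make that flow concrete, here is a rough userspace model of the two
attempts. Everything in it is made up for illustration (the toy_* names,
the interpreter flag, the per-CPU _STA array), and returning -ENODEV for
a CPU that _STA reports as not enabled is my assumption for the sketch;
it is not the kernel code, just the shape of it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Kernel-internal errno value; userspace errno.h does not define it. */
#define TOY_EPROBE_DEFER 517
#define TOY_NR_CPUS 4

static bool toy_interpreter_ready;          /* "ACPI interpreter is up" */
static bool toy_sta_enabled[TOY_NR_CPUS];   /* stand-in for the _STA result */
static bool toy_registered[TOY_NR_CPUS];

/* Hypothetical arm64-style hook: defer until _STA can be evaluated. */
static int toy_arch_register_cpu(int cpu)
{
        assert(cpu >= 0 && cpu < TOY_NR_CPUS);

        if (!toy_interpreter_ready)
                return -TOY_EPROBE_DEFER;   /* too early, try again later */
        if (!toy_sta_enabled[cpu])
                return -ENODEV;             /* firmware says "not yet" */

        toy_registered[cpu] = true;
        return 0;
}

/* Second attempt, modelled on the check in acpi_processor_get_info(). */
static int toy_get_info(int cpu)
{
        assert(cpu >= 0 && cpu < TOY_NR_CPUS);

        if (toy_registered[cpu])
                return 0;                   /* first attempt already did it */
        return toy_arch_register_cpu(cpu);
}
```

The early call defers, and only the later call can tell an enabled CPU
from one that firmware is holding back, which is why the _STA check
cannot simply be dropped from the combined path.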

>
>         if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
>                 if (invalid_phys_cpuid(pr->phys_id))
>                         return -ENODEV;
>
>                 cpu_maps_update_begin();
>                 cpus_write_lock();
>
>                 ret = acpi_map_cpu(pr->handle, pr->phys_id, pr->acpi_id, &pr->id);

I read that call as
  acpi_map_cpu_for_physical_cpu_hotplug()
but we could make it the equivalent of
  acpi_map_cpu_for_whatever_cpu_hotplug()
(I'm not proposing those names though ;)

in which case it is fine to just stub it out on ARM64.

>                 if (ret) {
>                         cpus_write_unlock();
>                         cpu_maps_update_done();
>                         return ret;
>                 }
>                 ret = arch_register_cpu(pr->id);
>                 if (ret) {
>                         acpi_unmap_cpu(pr->id);
>
>                         cpus_write_unlock();
>                         cpu_maps_update_done();
>                         return ret;
>                 }
>                 pr_info("CPU%d has been hot-added\n", pr->id);
>                 pr->flags.need_hotplug_init = 1;

This one needs more careful handling because we are calling this
for non-hotplug cases on arm64, in which case we end up setting this
for initially online CPUs. Thus if we offline and online them
again via sysfs (/sys/bus/cpu/devices/cpuX/online), it goes through the
hotplug path and should not.

So I need a way to detect if we are hotplugging the cpu or not.
Is there a standard way to do this? I haven't figured out how
to use flags in drivers to communicate this state.

>
>                 cpus_write_unlock();
>                 cpu_maps_update_done();
>         }
>
> so I'm not sure why this cannot be combined with the new code.
>
> Say acpi_map_cpu() / acpi_unmap_cpu() are turned into arch calls.
> What's the difference then? The locking, which should be fine if I'm
> not mistaken and need_hotplug_init that needs to be set if this code
> runs after the processor driver has loaded AFAICS.

That's the bit that I'm currently finding a challenge. Is there a clean
way to detect that?
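
For reference, the behaviour I'm after looks roughly like this toy
userspace model. The names are hypothetical, and toy_driver_loaded in
particular stands in for exactly the piece of state I don't know how to
get at cleanly in the kernel, so treat it as a sketch of the intent only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_processor {
        int id;
        bool need_hotplug_init;     /* mirrors pr->flags.need_hotplug_init */
};

/*
 * Hypothetical: true once the processor driver has loaded. How to learn
 * this cleanly in the real code is the open question.
 */
static bool toy_driver_loaded;

/* Combined registration path, used for boot-time and hot-added CPUs. */
static void toy_register(struct toy_processor *pr)
{
        assert(pr != NULL);

        /* Only a CPU that shows up after the driver is up is a hot-add. */
        pr->need_hotplug_init = toy_driver_loaded;
}
```

With that gating, a CPU registered at boot never gets the flag set, so a
later offline/online via sysfs does not go down the hotplug path.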

Jonathan