Re: [PATCH v5 03/18] ACPI: processor: Register deferred CPUs from acpi_processor_get_info()

From: Rafael J. Wysocki
Date: Mon Apr 15 2024 - 07:53:01 EST


On Mon, Apr 15, 2024 at 12:52 PM Jonathan Cameron
<Jonathan.Cameron@xxxxxxxxxx> wrote:
>
> On Fri, 12 Apr 2024 20:30:40 +0200
> "Rafael J. Wysocki" <rafael@xxxxxxxxxx> wrote:
>
> > On Fri, Apr 12, 2024 at 4:38 PM Jonathan Cameron
> > <Jonathan.Cameron@xxxxxxxxxx> wrote:
> > >
> > > From: James Morse <james.morse@xxxxxxx>
> > >
> > > The arm64 specific arch_register_cpu() call may defer CPU registration
> > > until the ACPI interpreter is available and the _STA method can
> > > be evaluated.
> > >
> > > If this occurs, then a second attempt is made in
> > > acpi_processor_get_info(). Note that the arm64 specific call has
> > > not yet been added so for now this will never be successfully
> > > called.
> > >
> > > Systems can still be booted with 'acpi=off', or not include an
> > > ACPI description at all as in these cases arch_register_cpu()
> > > will not have deferred registration when first called.
> > >
> > > This moves the CPU register logic back to a subsys_initcall(),
> > > while the memory nodes will have been registered earlier.
> > > Note this is where the call was prior to the cleanup series so
> > > there should be no side effects of moving it back again for this
> > > specific case.
> > >
> > > [PATCH 00/21] Initial cleanups for vCPU HP.
> > > https://lore.kernel.org/all/ZVyz%2FVe5pPu8AWoA@xxxxxxxxxxxxxxxxxxxxx/
> > >
> > > e.g. 5b95f94c3b9f ("x86/topology: Switch over to GENERIC_CPU_DEVICES")
> > >
> > > Signed-off-by: James Morse <james.morse@xxxxxxx>
> > > Reviewed-by: Gavin Shan <gshan@xxxxxxxxxx>
> > > Tested-by: Miguel Luis <miguel.luis@xxxxxxxxxx>
> > > Tested-by: Vishnu Pajjuri <vishnu@xxxxxxxxxxxxxxxxxxxxxx>
> > > Tested-by: Jianyong Wu <jianyong.wu@xxxxxxx>
> > > Signed-off-by: Russell King (Oracle) <rmk+kernel@xxxxxxxxxxxxxxx>
> > > Co-developed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> > > Signed-off-by: Joanthan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> > > ---
> > > v5: Update commit message to make it clear this is moving the
> > > init back to where it was until very recently.
> > >
> > > No longer change the condition in the earlier registration point
> > > as that will be handled by the arm64 registration routine
> > > deferring until called again here.
> > > ---
> > > drivers/acpi/acpi_processor.c | 12 ++++++++++++
> > > 1 file changed, 12 insertions(+)
> > >
> > > diff --git a/drivers/acpi/acpi_processor.c b/drivers/acpi/acpi_processor.c
> > > index 93e029403d05..c78398cdd060 100644
> > > --- a/drivers/acpi/acpi_processor.c
> > > +++ b/drivers/acpi/acpi_processor.c
> > > @@ -317,6 +317,18 @@ static int acpi_processor_get_info(struct acpi_device *device)
> > >
> > > c = &per_cpu(cpu_devices, pr->id);
> > > ACPI_COMPANION_SET(&c->dev, device);
> > > + /*
> > > + * Register CPUs that are present. get_cpu_device() is used to skip
> > > + * duplicate CPU descriptions from firmware.
> > > + */
> > > + if (!invalid_logical_cpuid(pr->id) && cpu_present(pr->id) &&
> > > + !get_cpu_device(pr->id)) {
> > > + int ret = arch_register_cpu(pr->id);
> > > +
> > > + if (ret)
> > > + return ret;
> > > + }
> > > +
> > > /*
> > > * Extra Processor objects may be enumerated on MP systems with
> > > * less than the max # of CPUs. They should be ignored _iff
> > > --
> >
> > I am still unsure why there need to be two paths calling
> > arch_register_cpu() in acpi_processor_get_info().
>
> I replied further down the thread, but the key point was to maintain
> the strong distinction between 'what' was done in a real hotplug
> path vs one where onlining was all. We can relax that but it goes
> contrary to the careful dance that was needed to get any agreement
> to the ARM architecture aspects of this.

This seems to go beyond technical territory.

As a general rule, we tend to combine code paths that look similar
instead of making them separate on purpose. Especially with a little
to no explanation of the technical reason.

> >
> > Just below the comment partially pulled into the patch context above,
> > there is this code:
> >
> > if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
> > int ret = acpi_processor_hotadd_init(pr);
> >
> > if (ret)
> > return ret;
> > }
> >
> > For the sake of the argument, fold acpi_processor_hotadd_init() into
> > it and drop the redundant _STA check from it:
>
> If we combine these, the _STA check is necessary because we will call this
> path for delayed onlining of ARM64 CPUs (if the earlier registration code
> call or arch_register_cpu() returned -EPROBE defer). That's the only way
> we know that a given CPU is online capable but firmware is saying we can't
> bring it online yet (it may be be vHP later).

Ignoring the above as per the other message.

> >
> > if (invalid_logical_cpuid(pr->id) || !cpu_present(pr->id)) {
> > if (invalid_phys_cpuid(pr->phys_id))
> > return -ENODEV;
> >
> > cpu_maps_update_begin();
> > cpus_write_lock();
> >
> > ret = acpi_map_cpu(pr->handle, pr->phys_id, pr->acpi_id, &pr->id);
>
> I read that call as
> acpi_map_cpu_for_physical_cpu_hotplug()
> but we could make it equivalent of.
> acpi_map_cpu_for_whatever_cpu_hotplug()

Yes.

> (I'm not proposing those names though ;)

Sure.

> in which case it is fine to just stub it out on ARM64.
> > if (ret) {
> > cpus_write_unlock();
> > cpu_maps_update_done();
> > return ret;
> > }
> > ret = arch_register_cpu(pr->id);
> > if (ret) {
> > acpi_unmap_cpu(pr->id);
> >
> > cpus_write_unlock();
> > cpu_maps_update_done();
> > return ret;
> > }
> > pr_info("CPU%d has been hot-added\n", pr->id);
> > pr->flags.need_hotplug_init = 1;
> This one needs more careful handling because we are calling this
> for non hotplug cases on arm64 in which case we end up setting this
> for initially online CPUs - thus if we offline and online them
> again via sysfs /sys/bus/cpu/device/cpuX/online it goes through the
> hotplug path and should not.
>
> So I need a way to detect if we are hotplugging the cpu or not.
> Is there a standard way to do this?

If you mean physical hot-add, I don't think so.

What exactly do you need to do differently in the two cases?

> I haven't figured out how to use flags in drivers to communicate this state.
>
> >
> > cpus_write_unlock();
> > cpu_maps_update_done();
> > }
> >
> > so I'm not sure why this cannot be combined with the new code.
> >
> > Say acpi_map_cpu) / acpi_unmap_cpu() are turned into arch calls.
> > What's the difference then? The locking, which should be fine if I'm
> > not mistaken and need_hotplug_init that needs to be set if this code
> > runs after the processor driver has loaded AFAICS.
>
> That's the bit that I'm currently finding a challenge. Is there a clean
> way to detect that?

If you mean the need_hotplug_init flag, I'm currently a bit struggling
to convince myself that it is really needed.