Re: [PATCH] platform/x86/intel-uncore-freq: fix inconsistent state on init failure
From: srinivas pandruvada
Date: Mon Apr 14 2025 - 12:07:09 EST
On Mon, 2025-04-14 at 13:41 +0300, Ilpo Järvinen wrote:
> On Mon, 14 Apr 2025, shouyeliu wrote:
>
> > When uncore_event_cpu_online() fails to initialize a control CPU (e.g.,
> > due to memory allocation failure or uncore_freq_add_entry() errors),
> > the code leaves stale entries in uncore_cpu_mask after that online CPU
> > will not try to call uncore_freq_add_entry, resulting in no sys
> > interface.
>
> Please add () after any name that refers to a C function (you're not even
> being consistent here as you had it in some cases but not here).
>
> Please try to split the very long sentence a bit and make it more obvious
> what causes what as the current wording is a bit vague, did you mean:
> uncore_event_cpu_online() will not call uncore_freq_add_entry() for
> another CPU that is being onlined or something along those lines?
>
> Will this change work/matter? Documentation/core-api/cpu_hotplug.rst says
> about cpuhp_setup_state():
>
> "If a callback fails for CPU N then the teardown callback for CPU
> 0 .. N-1 is invoked to rollback the operation. The state setup fails,
> the callbacks for the state are not installed and in case of dynamic
> allocation the allocated state is freed."
>
Yes, cpuhp_setup_state() will fail, which will result in cleanup.
So any failure of uncore_event_cpu_online() will result in no sysfs
entries.
I think the intention here is to keep the sysfs entries, which will not
happen with this patch.
To confirm on a 6.14 kernel, I forced a failure on CPU 10:
[595799.696873] intel_uncore_init
[595799.700102] uncore_event_cpu_online cpu:0
[595799.704240] uncore_event_cpu_online cpu:1
[595799.708360] uncore_event_cpu_online cpu:2
[595799.712505] uncore_event_cpu_online cpu:3
[595799.716633] uncore_event_cpu_online cpu:4
[595799.720755] uncore_event_cpu_online cpu:5
[595799.724953] uncore_event_cpu_online cpu:6
[595799.729158] uncore_event_cpu_online cpu:7
[595799.733409] uncore_event_cpu_online cpu:8
[595799.737674] uncore_event_cpu_online cpu:9
[595799.741954] uncore_event_cpu_online cpu:10
[595799.746134] Force CPU 10 to fail online
[595799.750182] uncore_event_cpu_offline cpu:0
[595799.754508] uncore_event_cpu_offline cpu:1
[595799.758834] uncore_event_cpu_offline cpu:2
[595799.763238] uncore_event_cpu_offline cpu:3
[595799.767558] uncore_event_cpu_offline cpu:4
[595799.771832] uncore_event_cpu_offline cpu:5
[595799.776178] uncore_event_cpu_offline cpu:6
[595799.780506] uncore_event_cpu_offline cpu:7
[595799.784862] uncore_event_cpu_offline cpu:8
[595799.789247] uncore_event_cpu_offline cpu:9
[595799.793540] intel_uncore_init cpuhp_setup_state failed
[595799.798776] intel_uncore_init failed
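For anyone who wants to see the rollback behaviour without a kernel
build, below is a minimal userspace sketch of the semantics quoted from
cpu_hotplug.rst. It is not the driver code: the sim_* names, the 64-bit
mask standing in for uncore_cpu_mask, and the assumption that every CPU
sits on its own die (so each one becomes a control CPU) are all made up
for illustration.

```c
#include <assert.h>
#include <errno.h>

#define SIM_MAX_CPUS 64

/* Hypothetical stand-ins for driver state; none of this is kernel code. */
static unsigned long long sim_control_mask;	/* plays the role of uncore_cpu_mask */
static int sim_fail_cpu = -1;			/* CPU whose online callback is forced to fail */

/* Online callback: mark the CPU as a control CPU unless it is the forced failure. */
static int sim_cpu_online(unsigned int cpu)
{
	if ((int)cpu == sim_fail_cpu)
		return -ENOMEM;

	sim_control_mask |= 1ULL << cpu;
	return 0;
}

/* Teardown callback: drop the CPU from the control mask. */
static void sim_cpu_offline(unsigned int cpu)
{
	sim_control_mask &= ~(1ULL << cpu);
}

/*
 * Models the documented rollback: if the callback fails for CPU N,
 * the teardown callback runs for CPUs 0 .. N-1 and the setup fails.
 */
static int sim_setup_state(unsigned int ncpus)
{
	unsigned int cpu, i;
	int ret;

	assert(ncpus <= SIM_MAX_CPUS);

	for (cpu = 0; cpu < ncpus; cpu++) {
		ret = sim_cpu_online(cpu);
		if (ret) {
			for (i = 0; i < cpu; i++)
				sim_cpu_offline(i);
			return ret;
		}
	}

	return 0;
}
```

With sim_fail_cpu set to 10 and 16 CPUs, sim_setup_state(16) returns
-ENOMEM and the mask ends up empty, which matches the log above: CPUs
0-9 are torn down and nothing stays registered.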
Thanks,
Srinivas
> >
>
> Fixes tag?
>
> > Signed-off-by: shouyeliu <shouyeliu@xxxxxxxxx>
>
> The correct format for tags is documented in
> Documentation/process/5.Posting.rst:
>
> tag: Full Name <email address>
>
> > ---
> >  .../x86/intel/uncore-frequency/uncore-frequency.c | 12 ++++++++----
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/platform/x86/intel/uncore-frequency/uncore-frequency.c b/drivers/platform/x86/intel/uncore-frequency/uncore-frequency.c
> > index 40bbf8e45fa4..1de0a4a9d6cd 100644
> > --- a/drivers/platform/x86/intel/uncore-frequency/uncore-frequency.c
> > +++ b/drivers/platform/x86/intel/uncore-frequency/uncore-frequency.c
> > @@ -146,15 +146,13 @@ static int uncore_event_cpu_online(unsigned int cpu)
> >  {
> >  	struct uncore_data *data;
> >  	int target;
> > +	int ret;
> >
> >  	/* Check if there is an online cpu in the package for uncore MSR */
> >  	target = cpumask_any_and(&uncore_cpu_mask, topology_die_cpumask(cpu));
> >  	if (target < nr_cpu_ids)
> >  		return 0;
> >
> > -	/* Use this CPU on this die as a control CPU */
> > -	cpumask_set_cpu(cpu, &uncore_cpu_mask);
> > -
> >  	data = uncore_get_instance(cpu);
> >  	if (!data)
> >  		return 0;
> > @@ -163,7 +161,13 @@ static int uncore_event_cpu_online(unsigned int cpu)
> >  	data->die_id = topology_die_id(cpu);
> >  	data->domain_id = UNCORE_DOMAIN_ID_INVALID;
> >
> > -	return uncore_freq_add_entry(data, cpu);
> > +	ret = uncore_freq_add_entry(data, cpu);
> > +	if (!ret) {
> > +		/* Use this CPU on this die as a control CPU */
> > +		cpumask_set_cpu(cpu, &uncore_cpu_mask);
> > +	}
> > +
> > +	return ret;
>
> Please reverse the logic such that you return early on error, which is
> the usual error handling pattern.
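
A rough sketch of the early-return shape being asked for here, written
as a standalone userspace snippet rather than the actual driver change.
demo_mask and demo_add_entry() are hypothetical stand-ins for
uncore_cpu_mask and uncore_freq_add_entry(); the stub failing on odd CPU
numbers is only there to exercise the error path.

```c
#include <assert.h>
#include <errno.h>

static unsigned long long demo_mask;	/* stands in for uncore_cpu_mask */

/* Hypothetical stub for uncore_freq_add_entry(); fails for odd CPUs. */
static int demo_add_entry(unsigned int cpu)
{
	return (cpu & 1) ? -ENOMEM : 0;
}

static int demo_cpu_online(unsigned int cpu)
{
	int ret;

	assert(cpu < 64);	/* keep the shift below well defined */

	ret = demo_add_entry(cpu);
	if (ret)
		return ret;	/* error path leaves the mask untouched */

	/* Use this CPU as a control CPU only after the entry was added */
	demo_mask |= 1ULL << cpu;

	return 0;
}
```

The success path then reads straight down with no extra nesting, and a
failing CPU never shows up in the mask.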
>
> >  }
> >
> >  static int uncore_event_cpu_offline(unsigned int cpu)
> >
>