Anju T Sudhakar <anju@xxxxxxxxxxxxxxxxxx> writes:
Call trace observed during boot:What's the actual oops?
[c000000ff38ffb80] c0000000002ddfac perf_pmu_migrate_context+0xac/0x470Why do we need a bool to track this? Can't we just check the data
[c000000ff38ffc40] c00000000011385c ppc_core_imc_cpu_offline+0x1ac/0x1e0
[c000000ff38ffc90] c000000000125758 cpuhp_invoke_callback+0x198/0x5d0
[c000000ff38ffd00] c00000000012782c cpuhp_thread_fun+0x8c/0x3d0
[c000000ff38ffd60] c0000000001678d0 smpboot_thread_fn+0x290/0x2a0
[c000000ff38ffdc0] c00000000015ee78 kthread+0x168/0x1b0
[c000000ff38ffe30] c00000000000b368 ret_from_kernel_thread+0x5c/0x74
While registering the cpuhoplug callbacks for core-imc, if we fails
in the cpuhotplug online path for any random core (either because opal call to
initialize the core-imc counters fails or because memory allocation fails for
that core), ppc_core_imc_cpu_offline() will get invoked for other cpus who
successfully returned from cpuhotplug online path.
But in the ppc_core_imc_cpu_offline() path we are trying to migrate the event
context, when core-imc counters are not even initialized. Thus creating the
above stack dump.
Add a check to see if core-imc counters are enabled or not in the cpuhotplug
offline path before migrating the context to handle this failing scenario.
structure we're deinitialising has been initialised?
Doesn't this also mean we won't cleanup the initialisation for any CPUs
that have been initialised?
cheers
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 8812624..08139f9 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -30,6 +30,7 @@ static struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
static cpumask_t nest_imc_cpumask;
struct imc_pmu_ref *nest_imc_refc;
static int nest_pmus;
+static bool core_imc_enabled;
/* Core IMC data structures and variables */
@@ -607,6 +608,19 @@ static int ppc_core_imc_cpu_offline(unsigned int cpu)
if (!cpumask_test_and_clear_cpu(cpu, &core_imc_cpumask))
return 0;
+ /*
+ * See if core imc counters are enabled or not.
+ *
+ * Suppose we reach here from core_imc_cpumask_init(),
+ * since we failed at the cpuhotplug online path for any random
+ * core (either because opal call to initialize the core-imc counters
+ * failed or because memory allocation failed).
+ * We need to check whether core imc counters are enabled or not before
+ * migrating the event context from cpus in the other cores.
+ */
+ if (!core_imc_enabled)
+ return 0;
+
/* Find any online cpu in that core except the current "cpu" */
ncpu = cpumask_any_but(cpu_sibling_mask(cpu), cpu);
@@ -1299,6 +1313,7 @@ int init_imc_pmu(struct device_node *parent, struct imc_pmu *pmu_ptr, int pmu_id
return ret;
}
+ core_imc_enabled = true;
break;
case IMC_DOMAIN_THREAD:
ret = thread_imc_cpu_init();
--
2.7.4