Re: [PATCH] powerpc/perf: Fix IMC initialization crash

From: Michael Ellerman
Date: Wed Oct 11 2017 - 00:12:09 EST


Anju T Sudhakar <anju@xxxxxxxxxxxxxxxxxx> writes:

> Call trace observed with latest firmware, and upstream kernel.
>
> [ 14.499938] NIP [c0000000000f318c] init_imc_pmu+0x8c/0xcf0
> [ 14.499973] LR [c0000000000f33f8] init_imc_pmu+0x2f8/0xcf0
> [ 14.500007] Call Trace:
> [ 14.500027] [c000003fed18f710] [c0000000000f33c8] init_imc_pmu+0x2c8/0xcf0 (unreliable)
> [ 14.500080] [c000003fed18f800] [c0000000000b5ec0] opal_imc_counters_probe+0x300/0x400
> [ 14.500132] [c000003fed18f900] [c000000000807ef4] platform_drv_probe+0x64/0x110
> [ 14.500185] [c000003fed18f980] [c000000000804b58] driver_probe_device+0x3d8/0x580
> [ 14.500236] [c000003fed18fa10] [c000000000804e4c] __driver_attach+0x14c/0x1a0
> [ 14.500302] [c000003fed18fa90] [c00000000080156c] bus_for_each_dev+0x8c/0xf0
> [ 14.500353] [c000003fed18fae0] [c000000000803fa4] driver_attach+0x34/0x50
> [ 14.500397] [c000003fed18fb00] [c000000000803688] bus_add_driver+0x298/0x350
> [ 14.500449] [c000003fed18fb90] [c00000000080605c] driver_register+0x9c/0x180
> [ 14.500500] [c000003fed18fc00] [c000000000807dec] __platform_driver_register+0x5c/0x70
> [ 14.500552] [c000003fed18fc20] [c00000000101cee0] opal_imc_driver_init+0x2c/0x40
> [ 14.500603] [c000003fed18fc40] [c00000000000d084] do_one_initcall+0x64/0x1d0
> [ 14.500654] [c000003fed18fd00] [c00000000100434c] kernel_init_freeable+0x280/0x374
> [ 14.500705] [c000003fed18fdc0] [c00000000000d314] kernel_init+0x24/0x160
> [ 14.500750] [c000003fed18fe30] [c00000000000b4e8] ret_from_kernel_thread+0x5c/0x74
> [ 14.500799] Instruction dump:
> [ 14.500827] 4082024c 2f890002 419e054c 2e890003 41960094 2e890001 3ba0ffea 419602d8
> [ 14.500884] 419e0290 2f890003 419e02a8 e93e0118 <e8690018> 2fa30000 419e0010 4827ba41
> [ 14.500945] ---[ end trace 27b734ad26f1add4 ]---
> [ 15.908719]
> [ 16.908869] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007
> [ 16.908869]
> [ 18.125813] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000007]
>
> While registering nest imc at init, the cpu-hotplug callback nest_pmu_cpumask_init()
> makes an OPAL call to stop the engine. If the OPAL call fails,
> imc_common_cpuhp_mem_free() is invoked to clean up memory and the cpuhotplug setup.
>
> But when cleaning up the attribute group, we were dereferencing the attribute
> element array without checking whether the backing element is NULL. This
> causes the kernel panic.
>
> Factor out the memory freeing part from imc_common_cpuhp_mem_free() to handle
> the failing case gracefully.
>
> Signed-off-by: Anju T Sudhakar <anju@xxxxxxxxxxxxxxxxxx>
> Reported-by: Pridhiviraj Paidipeddi <ppaidipe@xxxxxxxxxxxxxxxxxx>
> ---
> arch/powerpc/perf/imc-pmu.c | 23 ++++++++++++++++-------
> 1 file changed, 16 insertions(+), 7 deletions(-)

It's the week before rc5, so I'd really like just the absolute minimal
fix. There's sufficient code movement here that I can't even immediately
see where the bug fix is.
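
For illustration only, and not the actual patch: going by the description
quoted above, a minimal fix would presumably amount to a NULL check before
the attribute group is dereferenced in the cleanup path. The sketch below is
plain user-space C. The names struct attr_group, struct pmu, EVENT_ATTR,
NR_GROUPS and pmu_free_attr_groups() are hypothetical stand-ins, not the
real definitions in arch/powerpc/perf/imc-pmu.c.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures, for illustration only. */
struct attr_group {
	void **attrs;			/* dynamically allocated attribute array */
};

#define EVENT_ATTR	0		/* hypothetical index of the dynamic group */
#define NR_GROUPS	2		/* hypothetical number of group slots */

struct pmu {
	struct attr_group **attr_groups;
};

/*
 * Free the dynamically allocated attribute group, tolerating the case
 * where init failed before the group was ever allocated.
 * Returns 1 if an event group was present and freed, 0 otherwise.
 */
static int pmu_free_attr_groups(struct pmu *pmu)
{
	struct attr_group *grp;
	int freed = 0;

	if (!pmu || !pmu->attr_groups)
		return 0;

	grp = pmu->attr_groups[EVENT_ATTR];
	if (grp) {			/* the check the description says was missing */
		free(grp->attrs);	/* free(NULL) is a no-op */
		free(grp);
		freed = 1;
	}

	free(pmu->attr_groups);
	pmu->attr_groups = NULL;	/* guard against a second cleanup call */
	return freed;
}
```

Calling pmu_free_attr_groups() on a pmu whose group slot was never filled
returns 0 rather than faulting, which is the failure path the trace above
seems to hit. A change of roughly that size is much easier to review this
late in the cycle than one that also moves code around.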

cheers