Re: [PATCH 1/5] perf/arm-cci: Fix CPU hotplug race avoidance

From: Corentin Labbe
Date: Tue Feb 05 2019 - 05:10:53 EST


On Mon, Feb 04, 2019 at 05:09:04PM +0000, Robin Murphy wrote:
> The arm-cci probe logic faces a cyclic dependency wherein it has to pick
> a valid CPU to associate with before registering the PMU device, has to
> have the PMU state initialised before handling hotplug events in case it
> must be migrated, but has to have the hotplug notifier registered before
> the chosen CPU may go offline lest things get out of sync. The present
> code has tried to solve the races by using get_cpu() to pick the current
> CPU and prevent it from disappearing while the other two registrations
> are performed, but that results in taking mutexes with preemption
> disabled, which makes certain configurations very unhappy:
>
> [ 1.983337] BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:2004
> [ 1.983340] in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: swapper/0
> [ 1.983342] Preemption disabled at:
> [ 1.983353] [<ffffff80089801f4>] cci_pmu_probe+0x1dc/0x488
> [ 1.983360] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.18.20-rt8-yocto-preempt-rt #1
> [ 1.983362] Hardware name: ZynqMP ZCU102 Rev1.0 (DT)
> [ 1.983364] Call trace:
> [ 1.983369] dump_backtrace+0x0/0x158
> [ 1.983372] show_stack+0x24/0x30
> [ 1.983378] dump_stack+0x80/0xa4
> [ 1.983383] ___might_sleep+0x138/0x160
> [ 1.983386] __might_sleep+0x58/0x90
> [ 1.983391] __rt_mutex_lock_state+0x30/0xc0
> [ 1.983395] _mutex_lock+0x24/0x30
> [ 1.983400] perf_pmu_register+0x2c/0x388
> [ 1.983404] cci_pmu_probe+0x2bc/0x488
> [ 1.983409] platform_drv_probe+0x58/0xa8
>
> However, we don't actually mind being preempted or migrated at this
> point; all that really matters is that whichever CPU we pick does not
> get offlined before we're done. Thus, do the robust thing and instead
> take the lock to inhibit CPU hotplug for the duration. This also
> revealed an additional race in assigning the global pointer too late
> relative to the hotplug notifier, so that gets fixed in the process.
>
> Reported-by: "Li, Meng" <Meng.Li@xxxxxxxxxxxxx>
> Signed-off-by: Robin Murphy <robin.murphy@xxxxxxx>
> ---
> drivers/perf/arm-cci.c | 22 ++++++++++++----------
> 1 file changed, 12 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/perf/arm-cci.c b/drivers/perf/arm-cci.c
> index 1bfeb160c5b1..f6d9df07ec9b 100644
> --- a/drivers/perf/arm-cci.c
> +++ b/drivers/perf/arm-cci.c
> @@ -1692,21 +1692,23 @@ static int cci_pmu_probe(struct platform_device *pdev)
> raw_spin_lock_init(&cci_pmu->hw_events.pmu_lock);
> mutex_init(&cci_pmu->reserve_mutex);
> atomic_set(&cci_pmu->active_events, 0);
> - cci_pmu->cpu = get_cpu();
> +
> + cpus_read_lock();
> + cci_pmu->cpu = smp_processor_id();
>
> ret = cci_pmu_init(cci_pmu, pdev);
> - if (ret) {
> - put_cpu();
> - return ret;
> - }
> + if (ret)
> + goto out;
>
> - cpuhp_setup_state_nocalls(CPUHP_AP_PERF_ARM_CCI_ONLINE,
> - "perf/arm/cci:online", NULL,
> - cci_pmu_offline_cpu);
> - put_cpu();
> g_cci_pmu = cci_pmu;
> + cpuhp_setup_state_nocalls_cpuslocked(CPUHP_AP_PERF_ARM_CCI_ONLINE,
> + "perf/arm/cci:online", NULL,
> + cci_pmu_offline_cpu);
> +
> pr_info("ARM %s PMU driver probed", cci_pmu->model->name);
> - return 0;
> +out:
> + cpus_read_unlock();
> + return ret;
> }
>
> static int cci_pmu_remove(struct platform_device *pdev)
> --
> 2.20.1.dirty

Hello

Thanks, this patch fix my issue that I has reported here:
https://lkml.org/lkml/2017/12/29/139
https://lkml.org/lkml/2018/11/12/1901

Tested-by: Corentin Labbe <clabbe.montjoie@xxxxxxxxx>
Tested-on: sun8i-a83t-bananapi-m3

Regards