[tip:smp/hotplug] PCI: Use cpu_hotplug_disable() instead of get_online_cpus()

From: tip-bot for Thomas Gleixner
Date: Thu Apr 20 2017 - 07:30:43 EST


Commit-ID: b4d1673371196dd9aebdd2f61d946165c777b931
Gitweb: http://git.kernel.org/tip/b4d1673371196dd9aebdd2f61d946165c777b931
Author: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
AuthorDate: Tue, 18 Apr 2017 19:04:59 +0200
Committer: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
CommitDate: Thu, 20 Apr 2017 13:08:55 +0200

PCI: Use cpu_hotplug_disable() instead of get_online_cpus()

Converting the hotplug locking, i.e. get_online_cpus(), to a percpu rwsem
unearthed a circular lock dependency which was hidden from lockdep due to
the lockdep annotation of get_online_cpus() which prevents lockdep from
creating full dependency chains. There are several variants of this. An
example is:

Chain exists of:

cpu_hotplug_lock.rw_sem --> drm_global_mutex --> &item->mutex

       CPU0                    CPU1
       ----                    ----
  lock(&item->mutex);
                               lock(drm_global_mutex);
                               lock(&item->mutex);
  lock(cpu_hotplug_lock.rw_sem);

because there are dependencies through workqueues. The call chain is:

get_online_cpus
apply_workqueue_attrs
__alloc_workqueue_key
ttm_mem_global_init
ast_ttm_mem_global_init
drm_global_item_ref
ast_mm_init
ast_driver_load
drm_dev_register
drm_get_pci_dev
ast_pci_probe
local_pci_probe
work_for_cpu_fn
process_one_work
worker_thread

This is not a problem of get_online_cpus() recursion; it is a possible
deadlock which lockdep has not detected so far.

The cure is to use cpu_hotplug_disable() instead of get_online_cpus() to
protect the PCI probing.

There is a side effect to this: cpu_hotplug_disable() makes a concurrent
cpu hotplug attempt via the sysfs interfaces fail with -EBUSY, but PCI
probing usually happens during the boot process where no interaction is
possible. Any later invocations are infrequent enough and concurrent
hotplug attempts are so unlikely that the danger of user space visible
regressions is very close to zero. Anyway, that's preferable to a real
deadlock.
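
The -EBUSY side effect described above can be illustrated with a small
userspace model. This is only a sketch: every model_* name is hypothetical
and exists only for this example, the locking that the kernel uses around
its disable counter is left out, and the model is single-threaded. It shows
that a hotplug request arriving while probing has hotplug disabled is
refused instead of blocking, and that requests succeed again afterwards.

```c
#include <assert.h>
#include <errno.h>

/* Userspace model only: hypothetical names, not the kernel implementation. */
static int model_hotplug_disabled;	/* nesting count of disable requests */

static void model_cpu_hotplug_disable(void)
{
	model_hotplug_disabled++;
}

static void model_cpu_hotplug_enable(void)
{
	assert(model_hotplug_disabled > 0);	/* unbalanced enable is a bug */
	model_hotplug_disabled--;
}

/* Stands in for an offline request coming in through sysfs. */
static int model_cpu_down(unsigned int cpu)
{
	(void)cpu;
	if (model_hotplug_disabled)
		return -EBUSY;	/* refused; the caller does not block */
	return 0;		/* the model pretends the offline succeeded */
}

/* Mirrors the shape of the probe window in pci_call_probe(). */
static int model_probe_window(unsigned int cpu)
{
	int ret;

	model_cpu_hotplug_disable();
	ret = model_cpu_down(cpu);	/* hotplug attempt during probing */
	model_cpu_hotplug_enable();
	return ret;
}
```

In this model, model_cpu_down() returns 0 outside the probe window and
-EBUSY inside it. Because the refused caller never waits on a lock, no
lock dependency on the probing path is created.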

Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Acked-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Sebastian Siewior <bigeasy@xxxxxxxxxxxxx>
Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
Cc: linux-pci@xxxxxxxxxxxxxxx
Link: http://lkml.kernel.org/r/20170418170553.806707929@xxxxxxxxxxxxx
---
drivers/pci/pci-driver.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index afa7271..f00e4d9 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -349,13 +349,13 @@ static int pci_call_probe(struct pci_driver *drv, struct pci_dev *dev,
 	if (node >= 0 && node != numa_node_id()) {
 		int cpu;
 
-		get_online_cpus();
+		cpu_hotplug_disable();
 		cpu = cpumask_any_and(cpumask_of_node(node), cpu_online_mask);
 		if (cpu < nr_cpu_ids)
 			error = work_on_cpu(cpu, local_pci_probe, &ddi);
 		else
 			error = local_pci_probe(&ddi);
-		put_online_cpus();
+		cpu_hotplug_enable();
 	} else
 		error = local_pci_probe(&ddi);