[PATCH] PCI: hv: Fix a race condition when removing the device

From: longli
Date: Mon Apr 19 2021 - 15:26:41 EST


From: Long Li <longli@xxxxxxxxxxxxx>

On removing the device, any work item (hv_pci_devices_present() or
hv_pci_eject_device()) scheduled on workqueue hbus->wq may still be running
and race with hv_pci_remove().

This can happen because the host may send PCI_EJECT or PCI_BUS_RELATIONS(2)
and decide to rescind the channel immediately after that.

Fix this by flushing/stopping the workqueue of hbus before doing hbus remove.

Signed-off-by: Long Li <longli@xxxxxxxxxxxxx>
---
drivers/pci/controller/pci-hyperv.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index 27a17a1e4a7c..116815404313 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -3305,6 +3305,17 @@ static int hv_pci_remove(struct hv_device *hdev)

hbus = hv_get_drvdata(hdev);
if (hbus->state == hv_pcibus_installed) {
+ tasklet_disable(&hdev->channel->callback_event);
+ hbus->state = hv_pcibus_removing;
+ tasklet_enable(&hdev->channel->callback_event);
+
+ flush_workqueue(hbus->wq);
+ /*
+ * At this point, no work is running or can be scheduled
+ * on hbus-wq. We can't race with hv_pci_devices_present()
+ * or hv_pci_eject_device(), it's safe to proceed.
+ */
+
/* Remove the bus from PCI's point of view. */
pci_lock_rescan_remove();
pci_stop_root_bus(hbus->pci_bus);
--
2.27.0