[PATCH 5.10 28/98] qede: fix crash in rmmod qede while automatic debug collection

From: Sasha Levin
Date: Tue Aug 24 2021 - 13:07:08 EST


From: Prabhakar Kushwaha <pkushwaha@xxxxxxxxxxx>

[ Upstream commit 1159e25c137422bdc48ee96e3fb014bd942092c6 ]

A crash has been observed if rmmod is done while automatic debug
collection in progress. It is due to a race condition between
both of them.

To fix stop the sp_task during unload to avoid running qede_sp_task
even if they are schedule during removal process.

Signed-off-by: Alok Prasad <palok@xxxxxxxxxxx>
Signed-off-by: Shai Malin <smalin@xxxxxxxxxxx>
Signed-off-by: Ariel Elior <aelior@xxxxxxxxxxx>
Signed-off-by: Prabhakar Kushwaha <pkushwaha@xxxxxxxxxxx>
Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>
---
drivers/net/ethernet/qlogic/qede/qede.h | 1 +
drivers/net/ethernet/qlogic/qede/qede_main.c | 8 ++++++++
2 files changed, 9 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
index 3efc5899f656..f313fd730331 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -494,6 +494,7 @@ struct qede_fastpath {
#define QEDE_SP_HW_ERR 4
#define QEDE_SP_ARFS_CONFIG 5
#define QEDE_SP_AER 7
+#define QEDE_SP_DISABLE 8

#ifdef CONFIG_RFS_ACCEL
int qede_rx_flow_steer(struct net_device *dev, const struct sk_buff *skb,
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 05e3a3b60269..d9a3c811ac8b 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -1006,6 +1006,13 @@ static void qede_sp_task(struct work_struct *work)
struct qede_dev *edev = container_of(work, struct qede_dev,
sp_task.work);

+ /* Disable execution of this deferred work once
+ * qede removal is in progress, this stop any future
+ * scheduling of sp_task.
+ */
+ if (test_bit(QEDE_SP_DISABLE, &edev->sp_flags))
+ return;
+
/* The locking scheme depends on the specific flag:
* In case of QEDE_SP_RECOVERY, acquiring the RTNL lock is required to
* ensure that ongoing flows are ended and new ones are not started.
@@ -1297,6 +1304,7 @@ static void __qede_remove(struct pci_dev *pdev, enum qede_remove_mode mode)
qede_rdma_dev_remove(edev, (mode == QEDE_REMOVE_RECOVERY));

if (mode != QEDE_REMOVE_RECOVERY) {
+ set_bit(QEDE_SP_DISABLE, &edev->sp_flags);
unregister_netdev(ndev);

cancel_delayed_work_sync(&edev->sp_task);
--
2.30.2