Re: [PATCH] net: enetc: fix the deadlock of enetc_mdio_lock
From: Chang, Jianpeng (CN)
Date: Wed Sep 24 2025 - 21:24:56 EST
On 9/24/2025 4:39 PM, Wei Fang wrote:
CAUTION: This email comes from a non Wind River email account!
Do not click links or open attachments unless you recognize the sender and know the content is safe.
Hi Jianpeng,
This patch should target to net tree, please add the target tree to the
subject, for example, "[PATCH net]".
After applying the workaround for err050089, the LS1028A platform
experiences RCU stalls on RT kernel. This issue is caused by the
recursive acquisition of the read lock enetc_mdio_lock. Here list some
of the call stacks identified under the enetc_poll path that may lead to
a deadlock:
enetc_poll
-> enetc_lock_mdio
-> enetc_clean_rx_ring OR napi_complete_done
-> napi_gro_receive
-> enetc_start_xmit
-> enetc_lock_mdio
-> enetc_map_tx_buffs
-> enetc_unlock_mdio
-> enetc_unlock_mdio
After enetc_poll acquires the read lock, a higher-priority writer attempts
to acquire the lock, causing preemption. The writer detects that a
read lock is already held and is scheduled out. However, readers under
enetc_poll cannot acquire the read lock again because a writer is already
waiting, leading to a thread hang.
Currently, the deadlock is avoided by adjusting enetc_lock_mdio to prevent
recursive lock acquisition.
Fixes: fd5736bf9f23 ("enetc: Workaround for MDIO register access issue")
Thanks for catching this issue, we also found the similar issue on i.MX95, but
i.MX95 does not have the hardware issue, so we removed the lock for i.MX95,
see 86831a3f4cd4 ("net: enetc: remove ERR050089 workaround for i.MX95").
We also supposed that the LS1028A should have the same issue, but we did
not reproduce it on LS1028A and did not debug the issue further.
Claudiu previously suspected that this issue was introduced by 6d36ecdbc441
("net: enetc: take the MDIO lock only once per NAPI poll cycle"). Your patch
is basically the same as the one after reverting it. Therefore, the blamed
commit may be the commit 6d36ecdbc441 not the commit fd5736bf9f23,
please help check it.
Thank you for your response. I reproduced this issue while running an
iperf test on an LS1028A.
You are correct. The commit fd5736bf9f23 did not introduce the issue
directly. After 6d36ecdbc441
I find it somewhat puzzling that subsequent commits along the current
call path have also introduced
problematic changes. So I write fd5736bf9f23 here.
I will address the issue with the fix line and incorporate all the other
changes you pointed out in the v2 patch.
Thanks,
Jianpeng
Signed-off-by: Jianpeng Chang <jianpeng.chang.cn@xxxxxxxxxxxxx>
---
drivers/net/ethernet/freescale/enetc/enetc.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c
b/drivers/net/ethernet/freescale/enetc/enetc.c
index e4287725832e..164d2e9ec68c 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -1558,6 +1558,8 @@ static int enetc_clean_rx_ring(struct enetc_bdr
*rx_ring,
/* next descriptor to process */
i = rx_ring->next_to_clean;
+ enetc_lock_mdio();
+
while (likely(rx_frm_cnt < work_limit)) {
union enetc_rx_bd *rxbd;
struct sk_buff *skb;
@@ -1593,7 +1595,9 @@ static int enetc_clean_rx_ring(struct enetc_bdr
*rx_ring,
rx_byte_cnt += skb->len + ETH_HLEN;
rx_frm_cnt++;
+ enetc_unlock_mdio();
napi_gro_receive(napi, skb);
+ enetc_lock_mdio();
}
rx_ring->next_to_clean = i;
@@ -1601,6 +1605,7 @@ static int enetc_clean_rx_ring(struct enetc_bdr
*rx_ring,
rx_ring->stats.packets += rx_frm_cnt;
rx_ring->stats.bytes += rx_byte_cnt;
+ enetc_unlock_mdio();
Nit: please add a blank line before "return".
return rx_frm_cnt;
}
@@ -1910,6 +1915,8 @@ static int enetc_clean_rx_ring_xdp(struct enetc_bdr
*rx_ring,
/* next descriptor to process */
i = rx_ring->next_to_clean;
+ enetc_lock_mdio();
+
while (likely(rx_frm_cnt < work_limit)) {
union enetc_rx_bd *rxbd, *orig_rxbd;
struct xdp_buff xdp_buff;
@@ -1973,7 +1980,9 @@ static int enetc_clean_rx_ring_xdp(struct enetc_bdr
*rx_ring,
*/
enetc_bulk_flip_buff(rx_ring, orig_i, i);
+ enetc_unlock_mdio();
napi_gro_receive(napi, skb);
+ enetc_lock_mdio();
break;
case XDP_TX:
tx_ring = priv->xdp_tx_ring[rx_ring->index];
@@ -2038,6 +2047,7 @@ static int enetc_clean_rx_ring_xdp(struct enetc_bdr
*rx_ring,
enetc_refill_rx_ring(rx_ring, enetc_bd_unused(rx_ring) -
rx_ring->xdp.xdp_tx_in_flight);
+ enetc_unlock_mdio();
Nit: same as above.
return rx_frm_cnt;
}