Re: [v3 PATCH net] net: enetc: fix the deadlock of enetc_mdio_lock
From: Jianpeng Chang
Date: Mon Oct 13 2025 - 23:07:46 EST
On 2025/10/10 19:08, Vladimir Oltean wrote:
> On Fri, Oct 10, 2025 at 01:51:38PM +0300, Vladimir Oltean wrote:
>> On Fri, Oct 10, 2025 at 12:31:37PM +0300, Wei Fang wrote:
>>> After applying the workaround for err050089, the LS1028A platform
>>> experiences RCU stalls on the RT kernel. This issue is caused by
>>> recursive acquisition of the read lock enetc_mdio_lock. Here are some
>>> of the call stacks identified under the enetc_poll path that may lead
>>> to a deadlock:
>>>
>>> enetc_poll
>>> -> enetc_lock_mdio
>>> -> enetc_clean_rx_ring OR napi_complete_done
>>>    -> napi_gro_receive
>>>       -> enetc_start_xmit
>>>          -> enetc_lock_mdio
>>>          -> enetc_map_tx_buffs
>>>          -> enetc_unlock_mdio
>>> -> enetc_unlock_mdio
>>>
>>> After enetc_poll acquires the read lock, a higher-priority writer
>>> attempts to acquire the lock, causing preemption. The writer detects
>>> that a read lock is already held and is scheduled out. However,
>>> readers under enetc_poll cannot acquire the read lock again because a
>>> writer is already waiting, leading to a thread hang.
>>>
>>> Currently, the deadlock is avoided by adjusting enetc_lock_mdio to
>>> prevent recursive lock acquisition.
>>>
>>> Fixes: 6d36ecdbc441 ("net: enetc: take the MDIO lock only once per NAPI poll cycle")
>>> Signed-off-by: Jianpeng Chang <jianpeng.chang.cn@xxxxxxxxxxxxx>
>>>
>>> Hi Vladimir,
>>>
>>> Do you have any comments? This patch will cause a performance
>>> regression, but the RCU stalls are more severe.
>>>
>>> Acked-by: Wei Fang <wei.fang@xxxxxxx>
>>
>> I'm fine with the change in principle. It's my fault because I didn't
>> understand how rwlock writer starvation prevention is implemented; I
>> thought there would be no problem with reentrant readers.
>>
>> But I wonder if xdp_do_flush() shouldn't also be outside the
>> enetc_lock_mdio() section. Flushing XDP buffs with the XDP_REDIRECT
>> action might lead to enetc_xdp_xmit() being called, which also takes
>> the lock... Most of the time it will be fine, but when the batch fills
>> up it will be auto-flushed by bq_enqueue():
>>
>> 	if (unlikely(bq->count == DEV_MAP_BULK_SIZE))
>> 		bq_xmit_all(bq, 0);
>
> And I think the same concern exists for the xdp_do_redirect() calls.
Hi Vladimir, Wei,

If xdp_do_flush() and xdp_do_redirect() can potentially call
enetc_xdp_xmit(), we should move them outside of the enetc_lock_mdio()
section.

If there are no further comments, I will repost the patch with fixes for
xdp_do_flush() and xdp_do_redirect().
Thanks,
Jianpeng