Re: [PATCH net v3] net: macb: Relocate mog_init_rings() callback from macb_mac_link_up() to macb_open()

From: Alexander Dahl

Date: Thu Feb 19 2026 - 09:44:49 EST


Hello,

this change leads to a system lockup, see below.

On Mon, Dec 22, 2025 at 09:56:24AM +0800, Xiaolei Wang wrote:
> In the non-RT kernel, local_bh_disable() merely disables preemption,
> whereas it maps to an actual spin lock in the RT kernel. Consequently,
> when attempting to refill RX buffers via netdev_alloc_skb() in
> macb_mac_link_up(), a deadlock scenario arises as follows:
>
> WARNING: possible circular locking dependency detected
> 6.18.0-08691-g2061f18ad76e #39 Not tainted
> ------------------------------------------------------
> kworker/0:0/8 is trying to acquire lock:
> ffff00080369bbe0 (&bp->lock){+.+.}-{3:3}, at: macb_start_xmit+0x808/0xb7c
>
> but task is already holding lock:
> ffff000803698e58 (&queue->tx_ptr_lock){+...}-{3:3}, at: macb_start_xmit+0x148/0xb7c
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #3 (&queue->tx_ptr_lock){+...}-{3:3}:
> rt_spin_lock+0x50/0x1f0
> macb_start_xmit+0x148/0xb7c
> dev_hard_start_xmit+0x94/0x284
> sch_direct_xmit+0x8c/0x37c
> __dev_queue_xmit+0x708/0x1120
> neigh_resolve_output+0x148/0x28c
> ip6_finish_output2+0x2c0/0xb2c
> __ip6_finish_output+0x114/0x308
> ip6_output+0xc4/0x4a4
> mld_sendpack+0x220/0x68c
> mld_ifc_work+0x2a8/0x4f4
> process_one_work+0x20c/0x5f8
> worker_thread+0x1b0/0x35c
> kthread+0x144/0x200
> ret_from_fork+0x10/0x20
>
> -> #2 (_xmit_ETHER#2){+...}-{3:3}:
> rt_spin_lock+0x50/0x1f0
> sch_direct_xmit+0x11c/0x37c
> __dev_queue_xmit+0x708/0x1120
> neigh_resolve_output+0x148/0x28c
> ip6_finish_output2+0x2c0/0xb2c
> __ip6_finish_output+0x114/0x308
> ip6_output+0xc4/0x4a4
> mld_sendpack+0x220/0x68c
> mld_ifc_work+0x2a8/0x4f4
> process_one_work+0x20c/0x5f8
> worker_thread+0x1b0/0x35c
> kthread+0x144/0x200
> ret_from_fork+0x10/0x20
>
> -> #1 ((softirq_ctrl.lock)){+.+.}-{3:3}:
> lock_release+0x250/0x348
> __local_bh_enable_ip+0x7c/0x240
> __netdev_alloc_skb+0x1b4/0x1d8
> gem_rx_refill+0xdc/0x240
> gem_init_rings+0xb4/0x108
> macb_mac_link_up+0x9c/0x2b4
> phylink_resolve+0x170/0x614
> process_one_work+0x20c/0x5f8
> worker_thread+0x1b0/0x35c
> kthread+0x144/0x200
> ret_from_fork+0x10/0x20
>
> -> #0 (&bp->lock){+.+.}-{3:3}:
> __lock_acquire+0x15a8/0x2084
> lock_acquire+0x1cc/0x350
> rt_spin_lock+0x50/0x1f0
> macb_start_xmit+0x808/0xb7c
> dev_hard_start_xmit+0x94/0x284
> sch_direct_xmit+0x8c/0x37c
> __dev_queue_xmit+0x708/0x1120
> neigh_resolve_output+0x148/0x28c
> ip6_finish_output2+0x2c0/0xb2c
> __ip6_finish_output+0x114/0x308
> ip6_output+0xc4/0x4a4
> mld_sendpack+0x220/0x68c
> mld_ifc_work+0x2a8/0x4f4
> process_one_work+0x20c/0x5f8
> worker_thread+0x1b0/0x35c
> kthread+0x144/0x200
> ret_from_fork+0x10/0x20
>
> other info that might help us debug this:
>
> Chain exists of:
> &bp->lock --> _xmit_ETHER#2 --> &queue->tx_ptr_lock
>
> Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&queue->tx_ptr_lock);
>                                lock(_xmit_ETHER#2);
>                                lock(&queue->tx_ptr_lock);
>   lock(&bp->lock);
>
> *** DEADLOCK ***
>
> Call trace:
> show_stack+0x18/0x24 (C)
> dump_stack_lvl+0xa0/0xf0
> dump_stack+0x18/0x24
> print_circular_bug+0x28c/0x370
> check_noncircular+0x198/0x1ac
> __lock_acquire+0x15a8/0x2084
> lock_acquire+0x1cc/0x350
> rt_spin_lock+0x50/0x1f0
> macb_start_xmit+0x808/0xb7c
> dev_hard_start_xmit+0x94/0x284
> sch_direct_xmit+0x8c/0x37c
> __dev_queue_xmit+0x708/0x1120
> neigh_resolve_output+0x148/0x28c
> ip6_finish_output2+0x2c0/0xb2c
> __ip6_finish_output+0x114/0x308
> ip6_output+0xc4/0x4a4
> mld_sendpack+0x220/0x68c
> mld_ifc_work+0x2a8/0x4f4
> process_one_work+0x20c/0x5f8
> worker_thread+0x1b0/0x35c
> kthread+0x144/0x200
> ret_from_fork+0x10/0x20
>
> Notably, invoking the mog_init_rings() callback upon link establishment
> is unnecessary. Instead, we can exclusively call mog_init_rings() within
> the ndo_open() callback. This adjustment resolves the deadlock issue.
> Furthermore, since MACB_CAPS_MACB_IS_EMAC cases do not use mog_init_rings()
> when opening the network interface via at91ether_open(), moving
> mog_init_rings() to macb_open() also eliminates the MACB_CAPS_MACB_IS_EMAC
> check.
>
> Fixes: 633e98a711ac ("net: macb: use resolved link config in mac_link_up()")
> Cc: stable@xxxxxxxxxxxxxxx
> Suggested-by: Kevin Hao <kexin.hao@xxxxxxxxxxxxx>
> Signed-off-by: Xiaolei Wang <xiaolei.wang@xxxxxxxxxxxxx>
> ---
>
> V1: https://patchwork.kernel.org/project/netdevbpf/patch/20251128103647.351259-1-xiaolei.wang@xxxxxxxxxxxxx/
> V2: Update the correct lock dependency chain and add the Fix tag.
> V3: update commit log, Add full deadlock log added explanations: because MACB_CAPS_MACB_IS_EMAC cases do not
> use mog_init_rings(), we don't need the MACB_CAPS_MACB_IS_EMAC check when moving mog_init_rings() to macb_open().

After upgrading from 6.12.57-rt14 to 6.12.66-rt15 on a custom at91
sam9x60 based board with PREEMPT_RT patch, we noticed a complete
system lockup, which I bisected to this changeset.

After unplugging and replugging the Ethernet cable while PROFINET is
running, the system no longer responds to anything. The last message
in the kernel log is:

[ +8.621919] macb f802c000.ethernet eth0: Link is Up - 100Mbps/Full - flow control off

The heartbeat LED no longer blinks, there is no network communication,
and the serial console no longer responds.

Reverting that change locally prevents the system lockup for me, but
what is the proper course of action on the kernel side now? Send a
revert to stable? Send a revert to master? Please advise.

(I'm aware there were at least two more patches on netdev referencing
this change, but if I'm not mistaken, none of those made it to stable,
right?)

Greets
Alex

P.S.: adding linux-rt-users to Cc

>
> drivers/net/ethernet/cadence/macb_main.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
> index ca2386b83473..064fccdcf699 100644
> --- a/drivers/net/ethernet/cadence/macb_main.c
> +++ b/drivers/net/ethernet/cadence/macb_main.c
> @@ -744,7 +744,6 @@ static void macb_mac_link_up(struct phylink_config *config,
> /* Initialize rings & buffers as clearing MACB_BIT(TE) in link down
> * cleared the pipeline and control registers.
> */
> - bp->macbgem_ops.mog_init_rings(bp);
> macb_init_buffers(bp);
>
> for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue)
> @@ -2991,6 +2990,8 @@ static int macb_open(struct net_device *dev)
> goto pm_exit;
> }
>
> + bp->macbgem_ops.mog_init_rings(bp);
> +
> for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) {
> napi_enable(&queue->napi_rx);
> napi_enable(&queue->napi_tx);
> --
> 2.43.0
>
>