Re: [PATCH net] net/core: add xmit recursion limit to qdisc transmit path
From: Weiming Shi
Date: Tue Mar 03 2026 - 04:48:52 EST
On 26-03-03 05:30, Eric Dumazet wrote:
> On Tue, Mar 3, 2026 at 3:37 AM <bestswngs@xxxxxxxxx> wrote:
> >
> > From: Weiming Shi <bestswngs@xxxxxxxxx>
> >
> > __dev_queue_xmit() has two transmit code paths depending on whether the
> > device has a qdisc attached:
> >
> > 1. Qdisc path (q->enqueue): calls __dev_xmit_skb()
> > 2. No-qdisc path: calls dev_hard_start_xmit() directly
> >
> > Commit 745e20f1b626 ("net: add a recursion limit in xmit path") added
> > recursion protection to the no-qdisc path via dev_xmit_recursion()
> > check and dev_xmit_recursion_inc()/dec() tracking. However, the qdisc
> > path performs no recursion depth checking at all.
> >
> > This allows unbounded recursion through qdisc-attached devices. For
> > example, a bond interface in broadcast mode with gretap slaves whose
> > remote endpoints route back through the bond creates an infinite
> > transmit loop that exhausts the kernel stack:
>
> Non lltx drivers would deadlock in HARD_TX_LOCK().
>
> I would prefer we try to fix this issue at configuration time instead
> of adding yet another expensive operations in the fast path.
>
> Can you provide a test ?
>
> Thanks.

Thanks for the review. I have two follow-up questions:

1. For the configuration-time approach: the loop in this case is
   formed through the routing layer (the gretap remote endpoint routes
   back through the bond), not through direct upper/lower device
   links. Since routes can change dynamically after enslave, would
   a complete fix require checks in bond_enslave() as well as in the
   route-change and address-change paths? I want to make sure I
   understand the scope before going down that path.

2. As an alternative, would it be acceptable to move the recursion
   check into the bonding driver itself (e.g., bond_start_xmit() or
   bond_xmit_broadcast())? This would avoid touching the generic fast
   path entirely, and since bond is LLTX, there is no HARD_TX_LOCK()
   deadlock concern. It would narrowly target the driver that causes
   the fan-out recursion.

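
A rough userspace model of the shape option 2 might take, purely as an
illustration of a depth counter guarding a broadcast fan-out. Every
model_* name and the limit value are hypothetical, and none of this is
actual bonding driver code; a real check would presumably use a per-CPU
or per-task counter rather than a plain static.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical values chosen for the model, not taken from the kernel. */
#define MODEL_XMIT_RECURSION_LIMIT 4
#define MODEL_NUM_SLAVES 2

/* Stand-in for a per-CPU depth counter (single-threaded model). */
static int model_xmit_depth;

/* Counters that let a caller observe what the guard did. */
static int model_frames_sent;
static int model_frames_dropped;

static int model_bond_xmit_broadcast(void);

/*
 * Models a gretap slave whose remote endpoint routes back through the
 * bond: every frame it sends re-enters the bond transmit path.
 */
static int model_slave_xmit(void)
{
	model_frames_sent++;
	return model_bond_xmit_broadcast();
}

/*
 * Models a broadcast-mode transmit with the recursion check placed in
 * the driver: drop the frame once the nesting depth hits the limit,
 * otherwise fan out to every slave.
 */
static int model_bond_xmit_broadcast(void)
{
	int i, ret = 0;

	if (model_xmit_depth >= MODEL_XMIT_RECURSION_LIMIT) {
		model_frames_dropped++;
		return -1;
	}

	model_xmit_depth++;
	for (i = 0; i < MODEL_NUM_SLAVES; i++) {
		if (model_slave_xmit() < 0)
			ret = -1;
	}
	model_xmit_depth--;

	return ret;
}
```

In this model a single call to model_bond_xmit_broadcast() terminates
instead of recursing without bound: the depth counter returns to zero,
and the frames that would have nested past the limit are counted as
dropped. The fan-out still grows with the number of slaves at each
level below the limit, so the limit bounds stack depth, not total work.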
Happy to respin in either direction, or explore other approaches
you have in mind.