Re: [PATCH net V1 1/3] veth: enable dev_watchdog for detecting stalled TXQs

From: Jesper Dangaard Brouer
Date: Mon Oct 27 2025 - 12:24:11 EST

Next message: Chen-Yu Tsai: "Re: [PATCH 08/30] drm/sun4i: ui layer: Write attributes in one go"
Previous message: Dario Binacchi: "[PATCH v2 2/2] ARM: mxs_defconfig: enable sound drivers for imx28-amarula-rmm"
In reply to: Toke Høiland-Jørgensen: "Re: [PATCH net V1 1/3] veth: enable dev_watchdog for detecting stalled TXQs"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 27/10/2025 15.09, Toke Høiland-Jørgensen wrote:

Jesper Dangaard Brouer <hawk@xxxxxxxxxx> writes:

On 24/10/2025 15.39, Toke Høiland-Jørgensen wrote:

Jesper Dangaard Brouer <hawk@xxxxxxxxxx> writes:

The changes introduced in commit dc82a33297fc ("veth: apply qdisc
backpressure on full ptr_ring to reduce TX drops") have been found to cause
a race condition in production environments.

Under specific circumstances, observed exclusively on ARM64 (aarch64)
systems with Ampere Altra Max CPUs, a transmit queue (TXQ) can become
permanently stalled. This happens when the race condition leads to the TXQ
entering the QUEUE_STATE_DRV_XOFF state without a corresponding queue wake-up,
preventing the attached qdisc from dequeueing packets and causing the
network link to halt.

As a first step towards resolving this issue, this patch introduces a
failsafe mechanism. It enables the net device watchdog by setting a timeout
value and implements the .ndo_tx_timeout callback.

If a TXQ stalls, the watchdog will trigger the veth_tx_timeout() function,
which logs a warning and calls netif_tx_wake_queue() to unstall the queue
and allow traffic to resume.

The log message will look like this:

veth42: NETDEV WATCHDOG: CPU: 34: transmit queue 0 timed out 5393 ms
veth42: veth backpressure stalled(n:1) TXQ(0) re-enable

This provides a necessary recovery mechanism while the underlying race
condition is investigated further. Subsequent patches will address the root
cause and add more robust state handling in ndo_open/ndo_stop.

Fixes: dc82a33297fc ("veth: apply qdisc backpressure on full ptr_ring to reduce TX drops")
Signed-off-by: Jesper Dangaard Brouer <hawk@xxxxxxxxxx>
---
drivers/net/veth.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index a3046142cb8e..7b1a9805b270 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -959,8 +959,10 @@ static int veth_xdp_rcv(struct veth_rq *rq, int budget,
rq->stats.vs.xdp_packets += done;
u64_stats_update_end(&rq->stats.syncp);
- if (peer_txq && unlikely(netif_tx_queue_stopped(peer_txq)))
+ if (peer_txq && unlikely(netif_tx_queue_stopped(peer_txq))) {
+ txq_trans_cond_update(peer_txq);
netif_tx_wake_queue(peer_txq);
+ }

Hmm, seems a bit weird that this call to txq_trans_cond_update() is only
in veth_xdp_recv(). Shouldn't there (also?) be one in veth_xmit()?

The veth_xmit() call (indirectly) *do* update the txq_trans start
timestamp, but only for return code NET_RX_SUCCESS / NETDEV_TX_OK.
As .ndo_start_xmit = veth_xmit and netdev_start_xmit[1] will call
txq_trans_update on NETDEV_TX_OK.

Ah, right; didn't think of checking the caller, thanks for the pointer :)

This call to txq_trans_cond_update() isn't strictly necessary, as
veth_xmit() call will update it later, and the netif_tx_stop_queue()
call also updates trans_start.

I primarily added it because other drivers that use BQL have their
helper functions update txq_trans. As I see the veth implementation as
a simplified BQL, that we hopefully can extend to become more dynamic
like BQL.

Do you prefer that I remove this? (call to txq_trans_cond_update)

Hmm, don't we need it for the XDP path? I.e., if there's no traffic
other than XDP_REDIRECT traffic, ndo_start_xmit() will not get called,
so we need some way other to keep the watchdog from firing, I think?

Yes, perhaps you are right. Even-though the stop call
netif_tx_stop_queue() also updates the txq_trans start, then with XDP
redirect something else can keep the ptr_ring full. The
netif_tx_wake_queue() call doesn't update txq_trans itself (it depend on
a successful netstack packet). So, without this txq_trans update it can
get very old (out-of-date) if we starve normal network stack packets.
I'm not 100% sure this will trigger a watchdog even, as the queue
stopped bit should have been cleared. It is might worth keeping to
avoid it gets too much out-of-date due to XDP traffic.

--Jesper

Next message: Chen-Yu Tsai: "Re: [PATCH 08/30] drm/sun4i: ui layer: Write attributes in one go"
Previous message: Dario Binacchi: "[PATCH v2 2/2] ARM: mxs_defconfig: enable sound drivers for imx28-amarula-rmm"
In reply to: Toke Høiland-Jørgensen: "Re: [PATCH net V1 1/3] veth: enable dev_watchdog for detecting stalled TXQs"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]