Re: [RFC PATCH net-next 0/3] net: macb: candidate fixes for silent TX stall on BCM2712/RP1

From: Lukasz Raczylo

Date: Sat Apr 25 2026 - 17:48:46 EST


A follow-up runtime data point on this series.

Fleet state at 2026-04-25 21:46 UTC:

* Patched uptime (since staggered rollout 2026-04-24 18:10-19:20 UTC):
- shortest: 26h 26m (last master upgraded)
- longest: 27h 34m (canary)
- cumulative across 24 nodes: ~651 node-hours

* macb-attributable event counts (from an out-of-band userspace
watchdog; its [tx-stall] detector polls
/sys/class/net/end0/statistics/tx_packets and the qdisc backlog
once per second, and would have run ip link down/up had any
node's TX path frozen):
- RECOVER trigger=tx-stall (actual stalls caught): 0
- partial [tx-stall] markers (transient 1 s freezes): 0

* Separately: 40 RECOVER events with trigger=ping fired across the
fleet in this window, attributable to a brief upstream network
outage (gateway / switch event); every node lost ping to the
gateway, VIP, and NAS within seconds of the others, then
recovered. These are unrelated to the macb hang this series
targets; distinguishing them from a real TX stall is exactly what
the trigger= tag in the watchdog log is for.
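For anyone wanting to reproduce this kind of check, here is a minimal
sketch of the detector logic described above, in Python. It is not
the actual watchdog. The STALL_SECONDS threshold that separates a
partial marker from a full RECOVER is a hypothetical value (the
threshold is not stated here), the log line format beyond the
"RECOVER trigger=" and "[tx-stall]" tokens is made up, and reading
the qdisc backlog (e.g. by parsing `tc -s qdisc show dev end0`) is
left out, so samples are passed in directly. Sample, detect and
count_triggers are illustrative names.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List

# HYPOTHETICAL: consecutive frozen 1 s samples before the watchdog treats
# the freeze as a real stall and recovers. Not a value from the post.
STALL_SECONDS = 5


@dataclass
class Sample:
    tx_packets: int  # /sys/class/net/end0/statistics/tx_packets
    backlog: int     # qdisc backlog (packets) sampled at the same instant


def detect(samples: List[Sample], stall_seconds: int = STALL_SECONDS) -> List[str]:
    """Walk 1 s samples and return watchdog log lines (made-up format).

    An interval counts as frozen when tx_packets did not advance while the
    qdisc still had packets queued.
    """
    log: List[str] = []
    frozen = 0
    for prev, cur in zip(samples, samples[1:]):
        if cur.tx_packets == prev.tx_packets and cur.backlog > 0:
            frozen += 1
            if frozen == stall_seconds:
                # The real watchdog would bounce the link (ip link down/up) here.
                log.append("RECOVER trigger=tx-stall")
        else:
            if 0 < frozen < stall_seconds:
                # TX resumed on its own before the threshold: transient freeze.
                log.append(f"[tx-stall] partial frozen={frozen}s")
            frozen = 0
    if 0 < frozen < stall_seconds:
        log.append(f"[tx-stall] partial frozen={frozen}s")
    return log


TRIGGER_RE = re.compile(r"\bRECOVER\s+trigger=(\S+)")


def count_triggers(lines: Iterable[str]) -> Counter:
    """Count RECOVER events by their trigger= tag (tx-stall vs ping, ...)."""
    counts: Counter = Counter()
    for line in lines:
        m = TRIGGER_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


if __name__ == "__main__":
    blip = [Sample(100, 0), Sample(110, 0), Sample(110, 3), Sample(120, 0)]
    hang = [Sample(100, 0)] + [Sample(100, 5)] * 6
    log = detect(blip) + detect(hang) + ["RECOVER trigger=ping"] * 2
    for line in log:
        print(line)
    print(count_triggers(log))
```

Requiring a non-empty qdisc backlog is what keeps an idle link
(tx_packets flat simply because nothing is queued) from being
counted as a stall, and tagging each RECOVER with its trigger is
what lets ping-triggered recoveries be filtered out of the macb
stall count.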

At the pre-patch rate referenced in the cover letter (50 stalls in
95 node-hours during our 2026-04-24 14:00-18:10 UTC reference
window, ~0.53 per node-hour), the projected stall count for
651 node-hours is roughly 340 (651 * 50 / 95 = 342.6); observed is 0.
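A minimal sketch of the projection arithmetic, in Python. It also
computes two quantities that follow if stalls are modelled as a
Poisson process with a constant per-node-hour rate (an assumption:
pre-patch stalls may well have been bursty or load-dependent, so
treat these as indicative only): the chance of seeing zero stalls
if the pre-patch rate still applied, and an upper 95% bound on the
patched rate given zero observed. Function names are illustrative.

```python
import math


def projected_count(ref_events: int, ref_node_hours: float, node_hours: float) -> float:
    """Scale a reference event count to a new exposure at a constant rate."""
    return ref_events * node_hours / ref_node_hours


def poisson_p_zero(expected: float) -> float:
    """P(N = 0) for N ~ Poisson(expected)."""
    return math.exp(-expected)


def zero_event_upper_rate(node_hours: float, confidence: float = 0.95) -> float:
    """Upper confidence bound on a Poisson rate after observing zero events.

    At 95% this is -ln(0.05) / node_hours, roughly 3 / node_hours
    (the "rule of three").
    """
    return -math.log(1.0 - confidence) / node_hours


if __name__ == "__main__":
    expected = projected_count(50, 95, 651)
    print(f"pre-patch rate  : {50 / 95:.3f} stalls per node-hour")
    print(f"projected       : {expected:.1f} stalls in 651 node-hours")
    print(f"P(0 | pre-patch): {poisson_p_zero(expected):.1e}")
    print(f"95% upper bound : {zero_event_upper_rate(651):.4f} stalls per node-hour")
```

The projected count printed is 342.6. Under the Poisson assumption,
zero observed stalls puts the patched rate below roughly 3/651, i.e.
about 0.005 per node-hour at 95% confidence, versus ~0.53 pre-patch.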

The same monitoring keeps running; will reply again after a full
week of uptime, or sooner if anything changes.

--
Lukasz Raczylo