Re: [PATCH net v4 0/2] stmmac crash/stall fixes when under memory pressure
From: Maxime Chevallier
Date: Fri Apr 03 2026 - 02:15:36 EST
Hi Russell
On 02/04/2026 19:16, Russell King (Oracle) wrote:
> On Tue, Mar 31, 2026 at 09:19:27PM -0700, Sam Edwards wrote:
>> Hi netdev,
>>
>> This is v4 of my series containing a pair of bugfixes for the stmmac driver's
>> receive pipeline. These issues occur when stmmac_rx_refill() does not (fully)
>> succeed, which happens more frequently when free memory is low.
>>
>> The first patch closes Bugzilla bug #221010 [1], where stmmac_rx() can circle
>> around to a still-dirty descriptor (with a NULL buffer pointer), mistake it for
>> a filled descriptor (due to OWN=0), and attempt to dereference the buffer.
>>
>> In testing that patch, I discovered a second issue: starvation of available RX
>> buffers causes the NIC to stop sending interrupts; if the driver stops polling,
>> it will wait indefinitely for an interrupt that will never come. (Note: the
>> first patch makes this issue more prominent -- mostly because it lets the
>> system survive long enough to exhibit it -- but doesn't *cause* it.) The second
>> patch addresses that problem as well.
>>
>> Both patches are minimal, appropriate for stable, and designated to `net`. My
>> focus is on small, obviously-correct, easy-to-explain changes: I'll follow up
>> with another patch/series (something like [2]) for `net-next` that fixes the
>> ring in a more robust way.
>>
>> The tx and zc paths seem to have similar low-memory bugs, to be addressed in
>> separate series.
>
> I've tested this on my Jetson Xavier platform. One of the issues I've
> had is that running iperf3 results in the receive side stalling because
> it runs out of descriptors. However, despite the receive ring
> eventually being re-filled and the hardware appropriately prodded, it
> steadfastly refuses to restart, despite the descriptors having been
> updated.
>
> What I can see is there's 40 packets in the internal FIFOs via the
> PRXQ[13:0] field of the ETH_MTLRXQxDR register.
>
> With your patches applied:
>
> root@tegra-ubuntu:~# iperf3 -c 192.168.248.1 -R
> Connecting to host 192.168.248.1, port 5201
> Reverse mode, remote host 192.168.248.1 is sending
> [ 5] local 192.168.248.174 port 43728 connected to 192.168.248.1 port 5201
> [ ID] Interval Transfer Bitrate
> [ 5] 0.00-1.00 sec 30.3 MBytes 254 Mbits/sec
> [ 5] 1.00-2.00 sec 0.00 Bytes 0.00 bits/sec
> [ 5] 2.00-3.00 sec 0.00 Bytes 0.00 bits/sec
> [ 5] 3.00-4.00 sec 0.00 Bytes 0.00 bits/sec
> [ 5] 4.00-5.00 sec 0.00 Bytes 0.00 bits/sec
> [ 5] 5.00-6.00 sec 0.00 Bytes 0.00 bits/sec
> ...
Ah !!
I have been struggling with that problem this week too. I stumbled upon
it while trying to test your TSO series, and at firts I thought it was
because of the TSO patches, but turns out it's not, I reproduce it on
net-next.
The main problem for me is that it's not always reproducible, it may or
may not show up when I run iperf3 after a fresh restart.
This is on socfpga (dwmac1000), so it seems the problem exists across IP
versions.
I've been on and off trying to make progress on that during the week,
but without success so far...
Maxime