Re: [mellanox/mlx5-next RFC 1/1] net/mlx5: RX, Fix refcount warning on frag page release

From: Dragos Tatulea

Date: Sat Jun 27 2026 - 03:48:58 EST

On 26.06.26 20:02, Nabil S. Alramli wrote:
> On 6/26/26 09:12, Dragos Tatulea wrote:
>>
>>
>> [...]
>>> ```
>>> ret = atomic_long_sub_return(nr, pp_ref_count);
>>> WARN_ON(ret < 0);
>>> ```
>>>
>>> The actual stack trace looks like this:
>>>
>>> ```
>>> WARNING: CPU: 37 PID: 447795 at include/net/page_pool/helpers.h:277 mlx5e_page_release_fragmented.isra.0+0x51/0x60 [mlx5_core]
>>> Tainted: [S]=CPU_OUT_OF_SPEC, [O]=OOT_MODULE
>>> Hardware name: *
>>> RIP: 0010:mlx5e_page_release_fragmented.isra.0+0x51/0x60 [mlx5_core]
>>> RSP: 0018:ffffc90019814d98 EFLAGS: 00010293
>>> RAX: 000000000000003f RBX: ffff88c0993d0a10 RCX: ffffea02424592c0
>>> RDX: 0000000000000001 RSI: ffffea02424592c0 RDI: ffff88c090e20000
>>> RBP: 000000000000000a R08: 0000000000001409 R09: 0000000000000006
>>> R10: 0000000000000000 R11: ffff88c095fbc040 R12: 000000000000141f
>>> R13: 0000000000000009 R14: ffff88c090e20000 R15: 0000000000000001
>>> FS: 00007f34149fa6c0(0000) GS:ffff89200fa40000(0000) knlGS:0000000000000000
>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> CR2: 00007ed0265eb000 CR3: 0000005091cbe000 CR4: 0000000000350ef0
>>> Call Trace:
>>> <IRQ>
>>> mlx5e_free_rx_wqes+0x7b/0xa0 [mlx5_core]
>>> mlx5e_post_rx_wqes+0x1ac/0x5a0 [mlx5_core]
>>> mlx5e_napi_poll+0x5e5/0x6f0 [mlx5_core]
>>> __napi_poll+0x2b/0x1a0
>>> net_rx_action+0x30e/0x370
>>> ? sched_clock+0x9/0x10
>>> ? sched_clock_cpu+0xf/0x170
>>> handle_softirqs+0xe2/0x2a0
>>> common_interrupt+0x85/0xa0
>>> </IRQ>
>>> <TASK>
>>> asm_common_interrupt+0x26/0x40
>>> RIP: 0010:page_counter_uncharge+0x34/0x90
>>> RSP: 0018:ffffc900e728bb00 EFLAGS: 00000213
>>> RAX: ffff88aff4762000 RBX: ffff88aff4762100 RCX: 0000000000000304
>>> RDX: 0000000000000001 RSI: 00000000004e9e1a RDI: ffff88aff4762100
>>> RBP: 0000000000000001 R08: ffff891ea0560048 R09: 00007ffffffff000
>>> R10: 0000000000001000 R11: ffff891ae8061b00 R12: ffffffffffffffff
>>> R13: ffff89107fcfd4c0 R14: ffff891ae8061b00 R15: ffff892002fe1400
>>> uncharge_batch+0x40/0xd0
>>> ```
>>>
>> Can you provide more data on how you reproduced this? This helps to
>> narrow down the bug. Reproduction steps would be ideal.
>>
>
> I don't have clear steps to reproduce it, we just have seen it randomly on
> some servers that were under memory pressure. I will try to look into it more
> and find a way to reliably reproduce it. I agree that would be ideal to find a
> proper fix.
>
What NIC is this?
What MTU is being used?
Is strided rq enabled (ethtool --show-priv-flags).
Is XDP/AF_XDP used? If yes, can you provide more details?
Is HW-GRO on?

Based on those answers we can review the code path and see if there
is a case where the accounting for the fragments is not done correctly

Also, is buf_alloc_err growing during these memory pressure?

>>> The fix is to use an atomic page fragment counter, so it will always match
>>> the number of references held in the page_pool.
>>>
>> This is not the right fix. The mlx5 page frag counter is not atomic
>> on purpose because all changes to it happen only within the NAPI
>> context.
>>
>
> That was a question that I had, is it ever possible for frag_page->frags to be
> incremented / set outside of NAPI context? I tried to answer that by looking
> at code and by tracing it but could not get a clear picture. If it's not
> possible then I agree, this is not the right fix.
>
If that happens it is probably a bug.

Thanks,
Dragos