Re: [mellanox/mlx5-next RFC 1/1] net/mlx5: RX, Fix refcount warning on frag page release

From: Dragos Tatulea

Date: Fri Jun 26 2026 - 09:13:55 EST




On 25.06.26 19:40, Nabil S. Alramli wrote:
> Under memory pressure, mlx5 driver has WARNING during fragmented page
> release. This happens because there is a discrepency between what mlx5
> thinks the page fragment counter is vs what the page_pool actually says it
> is.
>
The mlx5 frag counter is not the same as pp_ref_count. The page gets
split into 64 parts during page allocation. The frag counter tracks how
many of those frags have been used.

> The cause of the issue is page allocations on concurrent cpus, which
> increment the non-atomic u16 page counter mlx5e_frag_page.frags, while at
> the same time the page reference counter net_iov.pp_ref_count is atomically
> incremented. That sometimes leads to a difference in the counts and
> therefore triggers the warning in page_pool_unref_netmem:
>
page_pool page allocations must not happen in parallel on different CPUs.
Each queue has its own page_pool and allocation happens within the NAPI of
that queue which sticks to a single CPU. The release path does support
releasing on another CPU (release to ring).

How did you encounter this scenario of having parallel allocations on
different CPUs from the same page_pool?

> ```
> ret = atomic_long_sub_return(nr, pp_ref_count);
> WARN_ON(ret < 0);
> ```
>
> The actual stack trace looks like this:
>
> ```
> WARNING: CPU: 37 PID: 447795 at include/net/page_pool/helpers.h:277 mlx5e_page_release_fragmented.isra.0+0x51/0x60 [mlx5_core]
> Tainted: [S]=CPU_OUT_OF_SPEC, [O]=OOT_MODULE
> Hardware name: *
> RIP: 0010:mlx5e_page_release_fragmented.isra.0+0x51/0x60 [mlx5_core]
> RSP: 0018:ffffc90019814d98 EFLAGS: 00010293
> RAX: 000000000000003f RBX: ffff88c0993d0a10 RCX: ffffea02424592c0
> RDX: 0000000000000001 RSI: ffffea02424592c0 RDI: ffff88c090e20000
> RBP: 000000000000000a R08: 0000000000001409 R09: 0000000000000006
> R10: 0000000000000000 R11: ffff88c095fbc040 R12: 000000000000141f
> R13: 0000000000000009 R14: ffff88c090e20000 R15: 0000000000000001
> FS: 00007f34149fa6c0(0000) GS:ffff89200fa40000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007ed0265eb000 CR3: 0000005091cbe000 CR4: 0000000000350ef0
> Call Trace:
> <IRQ>
> mlx5e_free_rx_wqes+0x7b/0xa0 [mlx5_core]
> mlx5e_post_rx_wqes+0x1ac/0x5a0 [mlx5_core]
> mlx5e_napi_poll+0x5e5/0x6f0 [mlx5_core]
> __napi_poll+0x2b/0x1a0
> net_rx_action+0x30e/0x370
> ? sched_clock+0x9/0x10
> ? sched_clock_cpu+0xf/0x170
> handle_softirqs+0xe2/0x2a0
> common_interrupt+0x85/0xa0
> </IRQ>
> <TASK>
> asm_common_interrupt+0x26/0x40
> RIP: 0010:page_counter_uncharge+0x34/0x90
> RSP: 0018:ffffc900e728bb00 EFLAGS: 00000213
> RAX: ffff88aff4762000 RBX: ffff88aff4762100 RCX: 0000000000000304
> RDX: 0000000000000001 RSI: 00000000004e9e1a RDI: ffff88aff4762100
> RBP: 0000000000000001 R08: ffff891ea0560048 R09: 00007ffffffff000
> R10: 0000000000001000 R11: ffff891ae8061b00 R12: ffffffffffffffff
> R13: ffff89107fcfd4c0 R14: ffff891ae8061b00 R15: ffff892002fe1400
> uncharge_batch+0x40/0xd0
> ```
>
Can you provide more data on how you reproduced this? This helps to
narrow down the bug. Reproduction steps would be ideal.

> The fix is to use an atomic page fragment counter, so it will always match
> the number of references held in the page_pool.
>
This is not the right fix. The mlx5 page frag counter is not atomic
on purpose because all changes to it happen only within the NAPI
context.

Thanks,
Dragos