Re: [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools

From: Pavel Begunkov

Date: Mon Nov 10 2025 - 07:37:05 EST

On 11/7/25 13:35, Dragos Tatulea wrote:

On Thu, Nov 06, 2025 at 05:18:33PM -0800, Jakub Kicinski wrote:

On Thu, 6 Nov 2025 17:25:43 +0000 Dragos Tatulea wrote:

On Wed, Nov 05, 2025 at 06:56:46PM -0800, Mina Almasry wrote:

On Wed, Nov 5, 2025 at 6:22 PM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:

Increasing cache sizes to the max seems very hacky at best.
The underlying implementation uses genpool and doesn't even
bother to do batching.

OK, my bad. I tried to think through downsides of arbitrarily
increasing the ring size in a ZC scenario where the underlying memory
is pre-pinned and allocated anyway, and I couldn't think of any, but I
won't argue the point any further.

I see a similar issue with io_uring as well: for a 9K MTU with a 4K ring
size, about 1% of allocations fail during a simple zcrx test.

mlx5 calculates 16K pages, and the io_uring zcrx buffer matches exactly
that size (16K * 4K). Increasing the buffer doesn't help because the
pool size is still what the driver asked for (and is also capped by the
page pool's internal limit). Even worse: eventually ENOSPC is returned
to the application. But maybe this error has a different fix.

Hm, yes, did you trace it all the way to where it comes from?
Page pool itself does not have any ENOSPC AFAICT. If the cache
is full, we free the page back to the provider via .release_netmem.

Yes, I did. It happens in io_cqe_cache_refill() when there are no more
CQEs:
https://elixir.bootlin.com/linux/v6.17.7/source/io_uring/io_uring.c#L775

-ENOSPC here means io_uring's CQ got full. It's non-fatal: the user
is expected to process completions and reissue the request. Still,
it's best avoided for performance reasons, e.g. by making the CQ
bigger, as you already noted.

--
Pavel Begunkov