Re: [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools
From: Pavel Begunkov
Date: Mon Nov 10 2025 - 07:37:05 EST
On 11/7/25 13:35, Dragos Tatulea wrote:
> On Thu, Nov 06, 2025 at 05:18:33PM -0800, Jakub Kicinski wrote:
>> On Thu, 6 Nov 2025 17:25:43 +0000 Dragos Tatulea wrote:
>>> On Wed, Nov 05, 2025 at 06:56:46PM -0800, Mina Almasry wrote:
>>>> On Wed, Nov 5, 2025 at 6:22 PM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
>>>>> Increasing cache sizes to the max seems very hacky at best.
>>>>> The underlying implementation uses genpool and doesn't even
>>>>> bother to do batching.
>>>> OK, my bad. I tried to think through downsides of arbitrarily
>>>> increasing the ring size in a ZC scenario where the underlying memory
>>>> is pre-pinned and allocated anyway, and I couldn't think of any, but I
>>>> won't argue the point any further.
>>> I see a similar issue with io_uring as well: for a 9K MTU with 4K ring
>>> size there are ~1% allocation errors during a simple zcrx test.
>>> mlx5 calculates 16K pages and the io_uring zcrx buffer matches exactly
>>> that size (16K * 4K). Increasing the buffer doesn't help because the
>>> pool size is still what the driver asked for (+ also the
>>> internal pool limit). Even worse: eventually ENOSPC is returned to the
>>> application. But maybe this error has a different fix.
>> Hm, yes, did you trace it all the way to where it comes from?
>> page pool itself does not have any ENOSPC AFAICT. If the cache
>> is full we free the page back to the provider via .release_netmem
> Yes I did. It happens in io_cqe_cache_refill() when there are no more
> CQEs:
> https://elixir.bootlin.com/linux/v6.17.7/source/io_uring/io_uring.c#L775
-ENOSPC here means io_uring's CQ got full. It's non-fatal, the user
is expected to process completions and reissue the request. And it's
best to avoid that for performance reasons, e.g. by making the CQ
bigger as you already noted.
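
To illustrate the "process completions and reissue" pattern, here is a
minimal user-space toy model, not kernel or liburing code. Every name in
it (toy_cq, toy_cq_post, toy_cq_drain, toy_run, TOY_CQ_MAX) is made up
for the example, and it collapses producer and consumer into one thread,
which the real thing does not do. It only shows that a bounded completion
ring fails a post with -ENOSPC once it is full, that draining and retrying
recovers, and that a larger ring avoids the stalls altogether.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the kernel or liburing. */
#define TOY_CQ_MAX 64

struct toy_cq {
	unsigned int head;	/* advanced by the consumer */
	unsigned int tail;	/* advanced by the producer */
	unsigned int entries;	/* ring capacity, must be a power of two */
	int cqes[TOY_CQ_MAX];
};

static void toy_cq_init(struct toy_cq *cq, unsigned int entries)
{
	assert(entries && entries <= TOY_CQ_MAX &&
	       !(entries & (entries - 1)));
	cq->head = 0;
	cq->tail = 0;
	cq->entries = entries;
}

/* Producer: post one completion, or return -ENOSPC if the ring is full. */
static int toy_cq_post(struct toy_cq *cq, int res)
{
	if (cq->tail - cq->head == cq->entries)
		return -ENOSPC;
	cq->cqes[cq->tail & (cq->entries - 1)] = res;
	cq->tail++;
	return 0;
}

/* Consumer: mark everything pending as processed, return the count. */
static unsigned int toy_cq_drain(struct toy_cq *cq)
{
	unsigned int n = cq->tail - cq->head;

	cq->head = cq->tail;
	return n;
}

/*
 * Post `total` completions. On -ENOSPC, drain the ring and reissue the
 * post. Returns how many times the ring was found full.
 */
static unsigned int toy_run(struct toy_cq *cq, unsigned int total)
{
	unsigned int stalls = 0;
	unsigned int i;

	for (i = 0; i < total; i++) {
		if (toy_cq_post(cq, (int)i) == -ENOSPC) {
			int ret;

			stalls++;
			toy_cq_drain(cq);
			ret = toy_cq_post(cq, (int)i);	/* reissue */
			assert(ret == 0);
			(void)ret;
		}
	}
	return stalls;
}
```

In this model, toy_run() with a 4-entry ring and 16 completions hits the
full condition 3 times, while a 16-entry ring hits it 0 times. For a real
ring, liburing lets the application request the CQ size at setup time with
IORING_SETUP_CQSIZE and the cq_entries field of struct io_uring_params.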
--
Pavel Begunkov