Re: [PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools

From: Dragos Tatulea

Date: Fri Nov 07 2025 - 08:35:53 EST


On Thu, Nov 06, 2025 at 05:18:33PM -0800, Jakub Kicinski wrote:
> On Thu, 6 Nov 2025 17:25:43 +0000 Dragos Tatulea wrote:
> > On Wed, Nov 05, 2025 at 06:56:46PM -0800, Mina Almasry wrote:
> > > On Wed, Nov 5, 2025 at 6:22 PM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
> > > > Increasing cache sizes to the max seems very hacky at best.
> > > > The underlying implementation uses genpool and doesn't even
> > > > bother to do batching.
> > >
> > > OK, my bad. I tried to think through downsides of arbitrarily
> > > increasing the ring size in a ZC scenario where the underlying memory
> > > is pre-pinned and allocated anyway, and I couldn't think of any, but I
> > > won't argue the point any further.
> > >
> > I see a similar issue with io_uring as well: for a 9K MTU with 4K ring
> > size there are ~1% allocation errors during a simple zcrx test.
> >
> > mlx5 calculates 16K pages and the io_uring zcrx buffer matches exactly
> > that size (16K * 4K). Increasing the buffer doesn't help because the
> > pool size is still what the driver asked for (+ also the
> > internal pool limit). Even worse: eventually ENOSPC is returned to the
> > application. But maybe this error has a different fix.
>
> Hm, yes, did you trace it all the way to where it comes from?
> page pool itself does not have any ENOSPC AFAICT. If the cache
> is full we free the page back to the provider via .release_netmem
>
Yes I did. It happens in io_cqe_cache_refill() when there are no more
CQEs:
https://elixir.bootlin.com/linux/v6.17.7/source/io_uring/io_uring.c#L775

Looking at the zcrx code, I see that the number of RQ entries and CQ
entries is 4K each, which matches the device ring size but not the
number of pages available in the buffer:
https://github.com/isilence/liburing/blob/zcrx/rx-buf-len/examples/zcrx.c#L410
https://github.com/isilence/liburing/blob/zcrx/rx-buf-len/examples/zcrx.c#L176

Doubling the CQs (or both RQ and CQ size) makes the ENOSPC go away.
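
To make the mismatch concrete, here is a rough sketch of the arithmetic
only. It assumes the 16K * 4K buffer mentioned earlier in the thread and
the 4K ring size; the helper names are made up for illustration and are
not part of liburing or the zcrx example. The result would be what goes
into io_uring_params.cq_entries together with IORING_SETUP_CQSIZE:

```c
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next power of two, capped at 2^31.
 * io_uring rounds ring sizes up to a power of two anyway. */
uint32_t roundup_pow2_u32(uint32_t n)
{
	uint32_t p = 1;

	while (p < n && p < (1u << 31))
		p <<= 1;
	return p;
}

/* Number of page-sized buffers in a zcrx area (hypothetical helper). */
uint32_t zcrx_area_pages(size_t area_bytes, size_t page_size)
{
	return (uint32_t)(area_bytes / page_size);
}

/* Hypothetical sizing policy: scale the CQ relative to the device ring
 * size. factor == 2 corresponds to the "doubling" experiment. */
uint32_t pick_cq_entries(uint32_t ring_entries, uint32_t factor)
{
	uint64_t want = (uint64_t)ring_entries * factor;

	if (want > (1u << 31))
		want = 1u << 31;
	return roundup_pow2_u32((uint32_t)want);
}
```

With these numbers zcrx_area_pages(16384 * 4096, 4096) gives 16384 pages
against 4096 CQ entries, and pick_cq_entries(4096, 2) gives 8192. Whether
a factor of 2 is enough in general, or the CQ should be sized from the
page count instead, is not something I have verified.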

> > Adapting the pool size to the io_uring buffer size works very well. The
> > allocation errors are gone and performance is improved.
> >
> > AFAIU, a page_pool with underlying pre-allocated memory is not really a
> > cache. So it is useful to be able to adapt to the capacity reserved by
> > the application.
> >
> > Maybe one could argue that the zcrx example from liburing could also be
> > improved. But one thing is sure: aligning the buffer size to the
> > page_pool size calculated by the driver based on ring size and MTU
> > is a hassle. If the application provides a large enough buffer, things
> > should "just work".
>
> Yes, there should be no ENOSPC. I think io_uring is more thorough
> in handling the corner cases so what you're describing is more of
> a concern..
>
Is this error something that io_uring should fix or is this similar to
EAGAIN where the application has to retry?

> Keep in mind that we expect multiple page pools from one provider.
> We want the pages to flow back to the MP level so other PPs can grab
> them.
>
Oh, right, I forgot... Though currently this can only happen for devmem,
right?

Still, this is an additional reason to give more control to the MP
over the page_pool config, right?

[...]

Thanks,
Dragos