[PATCH net v1 2/2] gve: use max allowed ring size for ZC page_pools

From: Mina Almasry

Date: Wed Nov 05 2025 - 15:08:09 EST


NCCL workloads with NCCL_P2P_PXN_LEVEL=2 or 1 are very slow with the
current gve devmem TCP configuration.

Root-causing showed that this particular workload produces a very
bursty pattern of devmem allocations and frees, exhausting the page_pool
ring buffer. As a result, sock_devmem_dontneed takes up to 5ms to free a
batch of 128 netmems, as each free fails to find an available entry in
the pp->ring and falls all the way back to the (slow) gen_pool.
Likewise, gve_alloc_buffer runs into bursts of successive allocations
which also miss the pp->ring (not dontneed'd yet, presumably), each
allocation taking up to 100us and slowing down the napi poll loop.

From there, I suspect, the slowness of the napi poll loop results in
the rx buffers not being processed in time, and in packet drops
detected by tcpdump. The total sum of all this badness leaves this
workload running at around 0.5 GB/s, when expected perf is around 12
GB/s.

This entire behavior can be avoided by increasing the pp->ring size to
the maximum allowed, 16384. This makes the pp able to absorb the bursty
alloc/free pattern of this particular workload. AFAICT there should be
no negative side effect of arbitrarily increasing the pp->ring size in
this manner for ZC configs - the memory is preallocated and pinned by
the memory provider anyway.
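To illustrate the effect (purely an illustrative userspace sketch, not
the kernel code paths - the struct and function names below are made up
for the model), consider a fast-path ring of limited capacity backed by
a slow allocator. A burst of frees larger than the ring spills to the
slow path, and the subsequent allocation burst then misses the ring too:

```c
/* Toy model of a page_pool's fast-path ring cache. Frees that don't
 * fit in the ring fall back to a slow gen_pool-style free; allocs
 * that find the ring empty fall back to the slow path as well.
 * Names and sizes here are invented for this sketch.
 */
#define SMALL_RING 4
#define BIG_RING   16

struct pool_model {
	unsigned int ring_size;   /* capacity of the fast-path ring */
	unsigned int cached;      /* buffers currently in the ring */
	unsigned int slow_frees;  /* frees that fell back to the slow path */
	unsigned int slow_allocs; /* allocs that missed the ring */
};

static void model_free(struct pool_model *p)
{
	if (p->cached < p->ring_size)
		p->cached++;		/* fast path: recycle into the ring */
	else
		p->slow_frees++;	/* ring full: slow free */
}

static void model_alloc(struct pool_model *p)
{
	if (p->cached)
		p->cached--;		/* fast path: pop from the ring */
	else
		p->slow_allocs++;	/* ring empty: slow alloc */
}

/* Run a burst of n frees followed by a burst of n allocs. */
static void model_burst(struct pool_model *p, unsigned int n)
{
	for (unsigned int i = 0; i < n; i++)
		model_free(p);
	for (unsigned int i = 0; i < n; i++)
		model_alloc(p);
}
```

With a ring sized for the burst, every free and alloc stays on the fast
path; with a small ring, both sides of the burst hit the slow path.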

Tested by running AllToAll PXN=2 workload. Before:

Avg bus bandwidth : 0.434191

After:

Avg bus bandwidth : 12.5494

Note that there is more we can do to optimize this path, such as bulk
netmem dontneeds, bulk netmem pp refills, and possibly taking a page
from the io_uring zcrx playbook and replacing the gen_pool with a
simpler fixed-size array based allocator, but this seems sufficient to
fix these critical workloads.
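The fixed-size array idea mentioned above could look roughly like the
following (a hypothetical userspace sketch, not the io_uring zcrx or
gve code - names are invented). Since all buffers are preallocated, a
plain stack of free indices gives O(1) alloc and free with no searching:

```c
/* Hypothetical sketch of a fixed-size array based allocator: a LIFO
 * stack of free buffer indices over a preallocated pool, as a simpler
 * alternative to a gen_pool. Names and sizes are made up.
 */
#define NBUFS 8

struct fixed_alloc {
	unsigned int free_idx[NBUFS]; /* stack of free buffer indices */
	unsigned int nr_free;         /* number of entries on the stack */
};

static void fixed_alloc_init(struct fixed_alloc *fa)
{
	fa->nr_free = NBUFS;
	for (unsigned int i = 0; i < NBUFS; i++)
		fa->free_idx[i] = i;	/* all buffers start free */
}

/* Pop a free buffer index, or return -1 if the pool is exhausted. */
static int fixed_alloc_get(struct fixed_alloc *fa)
{
	if (!fa->nr_free)
		return -1;
	return fa->free_idx[--fa->nr_free];
}

/* Push a buffer index back onto the free stack. */
static void fixed_alloc_put(struct fixed_alloc *fa, unsigned int idx)
{
	fa->free_idx[fa->nr_free++] = idx;
}
```

LIFO reuse also tends to hand back recently-touched buffers, which is
friendlier to caches than scanning a bitmap or tree for a free slot.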

With thanks to Willem and Eric for helping root-cause this.

Cc: ziweixiao@xxxxxxxxxx
Fixes: 62d7f40503bc ("gve: support unreadable netmem")
Reported-by: Vedant Mathur <vedantmathur@xxxxxxxxxx>
Signed-off-by: Mina Almasry <almasrymina@xxxxxxxxxx>
---
drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c b/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
index 0e2b703c673a..f63ffdd3b3ba 100644
--- a/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
+++ b/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
@@ -8,6 +8,8 @@
 #include "gve.h"
 #include "gve_utils.h"
 
+#include <net/netdev_queues.h>
+
 int gve_buf_ref_cnt(struct gve_rx_buf_state_dqo *bs)
 {
 	return page_count(bs->page_info.page) - bs->page_info.pagecnt_bias;
@@ -263,6 +265,8 @@ struct page_pool *gve_rx_create_page_pool(struct gve_priv *priv,
 	if (priv->header_split_enabled) {
 		pp.flags |= PP_FLAG_ALLOW_UNREADABLE_NETMEM;
 		pp.queue_idx = rx->q_num;
+		if (netif_rxq_has_unreadable_mp(priv->dev, rx->q_num))
+			pp.pool_size = PAGE_POOL_MAX_RING_SIZE;
 	}
 
 	return page_pool_create(&pp);
--
2.51.2.1026.g39e6a42477-goog