Re: [PATCH net-next] page_pool: Clamp ring size to 32K

From: Jesper Dangaard Brouer
Date: Mon Aug 07 2023 - 07:45:17 EST




On 07/08/2023 05.49, Ratheesh Kannoth wrote:
https://lore.kernel.org/netdev/20230804133512.4dbbbc16@xxxxxxxxxx/T/
Capping the recycle ring to 32k instead of returning the error.


Page pool (PP) is just a cache of pages. The driver octeontx2 (in link)
is creating an excessively large cache of pages. The driver's RX
descriptor ring size should be independent of the PP ptr_ring size, as
the latter is just a cache that grows as a function of the in-flight
packet workload; it functions as a "shock absorber".

32768 pages of 4 KiB each is 128 MiB, and this will be per RX-queue.

The RX-desc ring (obviously) pins down these pages (immediately), but
the PP ring starts empty. As the workload varies, the "shock absorber"
effect will let more pages into the system, which will then travel
through the PP ptr_ring. As all pages originating from the same PP
instance will get recycled, the in-flight pages in the "system" (PP
ptr_ring) will grow over time.

The PP design has the problem that it never releases or reduces pages
in this shock absorber "closed" system. (Cc. PP people/devel): we should
consider implementing a MM shrinker callback (include/linux/shrinker.h).

Are the systems using driver octeontx2 ready to handle 128 MiB of memory
per RX-queue getting pinned down over time? (This could lead to some
strange, hard to debug situations if the memory is not sufficient.)

--Jesper

Suggested-by: Jakub Kicinski <kuba@xxxxxxxxxx>
Signed-off-by: Ratheesh Kannoth <rkannoth@xxxxxxxxxxx>
---
net/core/page_pool.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 5d615a169718..404f835a94be 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -182,9 +182,9 @@ static int page_pool_init(struct page_pool *pool,
 	if (pool->p.pool_size)
 		ring_qsize = pool->p.pool_size;
 
-	/* Sanity limit mem that can be pinned down */
+	/* Clamp to 32K */
 	if (ring_qsize > 32768)
-		return -E2BIG;
+		ring_qsize = 32768;
 
 	/* DMA direction is either DMA_FROM_DEVICE or DMA_BIDIRECTIONAL.
 	 * DMA_BIDIRECTIONAL is for allowing page used for DMA sending,