Re: [PATCH net-next v10 3/4] page_pool: support unlimited number of inflight pages
From: Yunsheng Lin
Date: Wed Mar 05 2025 - 07:19:53 EST
On 2025/3/4 17:25, Jesper Dangaard Brouer wrote:
>> }
>> @@ -514,10 +518,42 @@ static struct page_pool_item *page_pool_fast_item_alloc(struct page_pool *pool)
>> return llist_entry(first, struct page_pool_item, lentry);
>> }
>> +static struct page_pool_item *page_pool_slow_item_alloc(struct page_pool *pool)
>> +{
>> + if (unlikely(!pool->slow_items.block ||
>> + pool->slow_items.next_to_use >= ITEMS_PER_PAGE)) {
>> + struct page_pool_item_block *block;
>> + struct page *page;
>> +
>> + page = alloc_pages_node(pool->p.nid, GFP_ATOMIC | __GFP_NOWARN |
>> + __GFP_ZERO, 0);
>> + if (!page) {
>> + alloc_stat_inc(pool, item_slow_failed);
>> + return NULL;
>> + }
>
> I'm missing a counter that I can use to monitor the rate of page
> allocations for these "item" blocks.
> In production I want to have a metric that shows me a sudden influx of
> events that cause code to hit this "item_slow_alloc" case
> (inflight_slow_alloc).
It seems the 'item_fast_empty' stat added in patch 2 is the metric you
mention above? As we consume those slow_items sequentially, the number of
pages allocated for slow_items can be calculated as 'item_fast_empty' /
ITEMS_PER_PAGE, see the sketch below.
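For reference, a minimal sketch of that derivation. It assumes
'item_fast_empty' is exported in pool->alloc_stats alongside the other
alloc stats (the field name follows patch 2; the helper itself is made
up for illustration):

	/* Rough estimate of pages allocated for slow_items: every
	 * fast-path miss ('item_fast_empty') consumes one slow item,
	 * and slow items are handed out sequentially, so one page is
	 * allocated per ITEMS_PER_PAGE misses.
	 */
	static u64 page_pool_slow_item_pages(const struct page_pool *pool)
	{
		return pool->alloc_stats.item_fast_empty / ITEMS_PER_PAGE;
	}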
>
> BTW should this be called "inflight_block" instead of "item_block"?
>
>
>> +
>> + block = page_address(page);
>> + block->pp = pool;
>> + block->flags |= PAGE_POOL_SLOW_ITEM_BLOCK_BIT;
>> + refcount_set(&block->ref, ITEMS_PER_PAGE);
>> + pool->slow_items.block = block;
>> + pool->slow_items.next_to_use = 0;
>> +
>> + spin_lock_bh(&pool->item_lock);
>> + list_add(&block->list, &pool->item_blocks);
>> + spin_unlock_bh(&pool->item_lock);
>> + }
>> +
>> + return &pool->slow_items.block->items[pool->slow_items.next_to_use++];
>> +}
...
>> +static void __page_pool_slow_item_free(struct page_pool *pool,
>> + struct page_pool_item_block *block)
>> +{
>> + spin_lock_bh(&pool->item_lock);
>> + list_del(&block->list);
>> + spin_unlock_bh(&pool->item_lock);
>> +
>> + put_page(virt_to_page(block));
>
> Here again I'm missing a counter that I can use to monitor the rate of
> page free events.
>
> In production I want a metric (e.g. inflight_slow_free_block) that,
> together with "item_slow_alloc" (perhaps named
> inflight_slow_alloc_block), shows me if this code path is creating
> churn, so that I can correlate it with/explain some other influx event
> on the system.
>
> BTW subtracting these (alloc - free) counters gives us the memory used.
If I understand it correctly, 'item_fast_empty' is something like the
'alloc' counter mentioned above; let's discuss the 'free' side below.
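To make the (alloc - free) arithmetic concrete, a sketch with
hypothetical counter names (neither counter exists in this patch; they
would be bumped in page_pool_slow_item_alloc() and
__page_pool_slow_item_free() respectively):

	/* Hypothetical counters (not in this patch):
	 *   slow_block_alloc - bumped once per block page allocated in
	 *                      page_pool_slow_item_alloc()
	 *   slow_block_free  - bumped once per block page freed in
	 *                      __page_pool_slow_item_free()
	 * Memory currently pinned by slow item block pages:
	 */
	u64 slow_block_mem = (slow_block_alloc - slow_block_free) * PAGE_SIZE;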
>
>> +}
>> +
...
>> }
>> +static int page_pool_nl_fill_item_mem_info(struct page_pool *pool,
>> + struct sk_buff *rsp)
>> +{
>> + struct page_pool_item_block *block;
>> + size_t resident = 0, used = 0;
>> + int err;
>> +
>> + spin_lock_bh(&pool->item_lock);
>> +
>> + list_for_each_entry(block, &pool->item_blocks, list) {
>> + resident += PAGE_SIZE;
>> +
>> + if (block->flags & PAGE_POOL_SLOW_ITEM_BLOCK_BIT)
>> + used += (PAGE_SIZE - sizeof(struct page_pool_item) *
>> + refcount_read(&block->ref));
>> + else
>> + used += PAGE_SIZE;
>> + }
>> +
>> + spin_unlock_bh(&pool->item_lock);
>
> Holding a BH spin_lock can easily create production issues.
The above not only gives us the total pages used for page_pool_item,
but also gives us the fragmentation info for those pages.
So it seems the BH spin_lock is needed if we want the fragmentation info?
And the 'free' mentioned above can be calculated as 'alloc' - 'memory used'.
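As a worked example of what the loop above reports (the item size is
illustrative, not taken from the patch): with 4K pages and a 32-byte
struct page_pool_item, a slow block whose refcount reads 64 contributes
4096 bytes to 'resident' but only 4096 - 32 * 64 = 2048 bytes to 'used',
so the used/resident ratio directly exposes the fragmentation.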
> I worry how long it will take to traverse these lists.
I wouldn't worry about that, as there are not supposed to be many pages
on those lists; if there are, it is something we should fix by
increasing the size of fast_item or by defragmenting the slow_item
pages.
>
> We (Cc Yan) are currently hunting down a number of real production
> issues caused by different cases of control-path code that takes a _bh
> lock to read data when querying the kernel, hurting the data-path
> processing.
I am not sure the above is really control-path here; I would rather
treat it as debug-path?
>
> If we had the stats counters, then this would be less work, right?
It depends on whether we want the fragmentation info or not, as
mentioned above.