Re: [PATCH RFC v4 1/3] page_pool: fix timing for checking and disabling napi_local

From: Yunsheng Lin
Date: Thu Dec 05 2024 - 06:43:46 EST


On 2024/12/5 9:28, Jakub Kicinski wrote:
> On Wed, 4 Dec 2024 19:01:14 +0800 Yunsheng Lin wrote:
>>> I don't think this is in the right place.
>>> Why not inside page_pool_disable_direct_recycling() ?
>>
>> It is in page_pool_destroy() mostly because:
>> 1. Only call synchronize_rcu() when there is inflight pages, which should
>> be an unlikely case, and synchronize_rcu() might need to be called at
>> least for the case of pool->p.napi not being NULL if it is called inside
>> page_pool_disable_direct_recycling().
>
> Right, my point was that page_pool_disable_direct_recycling()
> is an exported function, its callers also need to be protected.

It depends on what the caller is trying to protect by calling
page_pool_disable_direct_recycling().

It seems the use case for the only user of the API, the bnxt driver,
is about reusing the same NAPI for different page_pool instances.

According to the steps in netdev_rx_queue.c:
1. allocate new queue memory & create page_pool
2. stop old rx queue.
3. start new rx queue with new page_pool
4. free old queue memory + destroy page_pool.

page_pool_disable_direct_recycling() is called in step 2. I am not
sure how napi_enable() & napi_disable() are called in the above flow,
but the use-after-free problem this patch is trying to fix does not
seem to apply to the above flow.

There doesn't seem to be any concurrent access problem if napi->list_owner
is set to -1 before napi_disable() returns and napi_enable() for the
new queue is called after page_pool_disable_direct_recycling() is called
in step 2.
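
To make the ordering argument concrete, below is a rough userspace model
of the four steps, not kernel code. Every model_* name is hypothetical
and only stands in for the real object or call. It assumes that the old
NAPI's list_owner is already -1 when direct recycling is disabled, and
that the poll loop for the new queue only starts after step 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel objects, not the real definitions. */
struct model_napi {
	int list_owner;			/* CPU running the poll loop, -1 when quiesced */
};

struct model_pool {
	struct model_napi *napi;	/* NULL once direct recycling is disabled */
	bool destroyed;
};

/* Direct recycling is only allowed from the CPU that owns the NAPI context. */
static bool model_can_direct_recycle(const struct model_pool *pool, int cpu)
{
	const struct model_napi *napi = pool->napi;

	return napi && napi->list_owner == cpu;
}

/* Assumption: list_owner is -1 by the time the disable call returns. */
static void model_napi_disable(struct model_napi *napi)
{
	napi->list_owner = -1;
}

/* Models napi_enable() followed by the poll loop starting on 'cpu'. */
static void model_napi_start_poll(struct model_napi *napi, int cpu)
{
	napi->list_owner = cpu;
}

static void model_disable_direct_recycling(struct model_pool *pool)
{
	if (!pool->napi)
		return;

	/* The NAPI is expected to be quiesced before the pool is unlinked. */
	assert(pool->napi->list_owner == -1);
	pool->napi = NULL;
}

/*
 * Walks the four steps. Returns true when, after the restart, only the
 * new pool can direct-recycle on 'cpu' and the old pool cannot.
 */
static bool model_queue_restart(struct model_napi *napi,
				struct model_pool *old_pool,
				struct model_pool *new_pool, int cpu)
{
	bool old_recycles;

	/* 1. allocate new queue memory & create page_pool */
	new_pool->napi = napi;
	new_pool->destroyed = false;

	/* 2. stop old rx queue */
	model_napi_disable(napi);
	model_disable_direct_recycling(old_pool);

	/* 3. start new rx queue with new page_pool */
	model_napi_start_poll(napi, cpu);
	old_recycles = model_can_direct_recycle(old_pool, cpu);

	/* 4. free old queue memory + destroy page_pool */
	old_pool->destroyed = true;

	return !old_recycles && model_can_direct_recycle(new_pool, cpu);
}
```

In this model the old pool loses its napi pointer in step 2, before the
NAPI becomes active again in step 3, so the shared NAPI can never make
the old pool look eligible for direct recycling.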