Re: [PATCH RFC v4 1/3] page_pool: fix timing for checking and disabling napi_local

From: Jakub Kicinski
Date: Thu Dec 05 2024 - 19:42:43 EST


On Thu, 5 Dec 2024 19:43:25 +0800 Yunsheng Lin wrote:
> It depends on what is the callers is trying to protect by calling
> page_pool_disable_direct_recycling().
>
> It seems the use case for the only user of the API in bnxt driver
> is about reuseing the same NAPI for different page_pool instances.
>
> According to the steps in netdev_rx_queue.c:
> 1. allocate new queue memory & create page_pool
> 2. stop old rx queue.
> 3. start new rx queue with new page_pool
> 4. free old queue memory + destroy page_pool.
>
> The page_pool_disable_direct_recycling() is called in step 2, I am
> not sure how napi_enable() & napi_disable() are called in the above
> flow, but it seems there is no use-after-free problem this patch is
> trying to fix for the above flow.
>
> It doesn't seems to have any concurrent access problem if napi->list_owner
> is set to -1 before napi_disable() returns and the napi_enable() for the
> new queue is called after page_pool_disable_direct_recycling() is called
> in step 2.

The fix is presupposing there is long delay between fetching of
the NAPI pointer and its access. The concern is that NAPI gets
restarted in step 3 after we already READ_ONCE()'ed the pointer,
then we access it and judge it to be running on the same core.
Then we put the page into the fast cache which will never get
flushed.