Re: [PATCH RFC v4 1/3] page_pool: fix timing for checking and disabling napi_local

From: Yunsheng Lin
Date: Sat Dec 07 2024 - 00:52:34 EST

Next message: Furong Xu: "Re: [PATCH net] net: stmmac: fix TSO DMA API usage causing oops"
Previous message: Bagas Sanjaya: "Re: [PATCH] MAINTAINERS: Remove Albert Ou from riscv"
In reply to: Jakub Kicinski: "Re: [PATCH RFC v4 1/3] page_pool: fix timing for checking and disabling napi_local"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 12/7/2024 12:09 AM, Jakub Kicinski wrote:

...

It seems napi_disable() is called before netdev_rx_queue_restart(), and
napi_enable() and ____napi_schedule() are called after it, as no napi API
is called in the bnxt driver's implementation of 'netdev_queue_mgmt_ops'?

If yes, napi->list_owner is set to -1 before step 1 and is only set to
a valid CPU in step 6, as below:
1. napi_disable()
2. allocate new queue memory & create new page_pool.
3. stop old rx queue.
4. start new rx queue with new page_pool.
5. free old queue memory + destroy old page_pool.
6. napi_enable() & ____napi_schedule()
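
As a rough illustration of the claim above, here is a user-space sketch
(not kernel code) that tracks napi->list_owner across the six steps.
'struct napi_model', 'run_restart_sequence()' and 'sched_cpu' are
hypothetical names, and the sketch simply assumes that steps 1-5 never
write list_owner and that only step 6 does:

```c
#include <assert.h>

/* Hypothetical stand-in for the single napi_struct field discussed here. */
struct napi_model {
	int list_owner;	/* -1 means no CPU currently owns this napi */
};

enum { NO_OWNER = -1, NUM_STEPS = 6 };

/*
 * Walk steps 1..6 and record list_owner after each one.
 * 'sched_cpu' is the CPU assumed to schedule the napi in step 6.
 */
static void run_restart_sequence(struct napi_model *napi, int sched_cpu,
				 int owner_after[NUM_STEPS])
{
	/* Assumed precondition: napi_complete_done() already reset the owner. */
	napi->list_owner = NO_OWNER;

	for (int step = 1; step <= NUM_STEPS; step++) {
		/*
		 * Assumption: steps 1-5 do not touch list_owner; only
		 * step 6 (napi_enable() & ____napi_schedule()) sets it.
		 */
		if (step == 6)
			napi->list_owner = sched_cpu;
		owner_after[step - 1] = napi->list_owner;
	}
}
```

Under these assumptions owner_after[0..4] stay at -1 and only
owner_after[5] holds a valid CPU id. If step 6 could really run before
step 5, the assumption in the loop would no longer hold.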

And there are at least three flows involved here:
flow 1: calling napi_complete_done(), which sets napi->list_owner to -1.
flow 2: calling netdev_rx_queue_restart().
flow 3: calling skb_defer_free_flush() with the page belonging to the old
page_pool.

The only case I can think of in which page_pool_napi_local() returns true
in flow 3 is when flow 1 and flow 3 both run in the softirq context of the
same CPU and flow 3 runs before flow 1.

It seems impossible for page_pool_napi_local() to return true between
step 1 and step 6: either flow 1 and flow 3 both run in the softirq
context of the same CPU, in which case flow 3 always sees the updated
napi->list_owner, or napi->list_owner != the CPU running flow 3. That
seems like an implicit assumption for the case of napi scheduling between
different CPUs too.
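
To make the argument concrete, below is a simplified user-space model of
the check that flow 3 relies on. It only models the comparison of
napi->list_owner against the current CPU in softirq context; the real
page_pool_napi_local() may check more than this, and 'struct pool_model',
'napi_local_model()' and its parameters are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct napi_model {
	int list_owner;			/* -1 means no CPU owns this napi */
};

struct pool_model {
	const struct napi_model *napi;	/* NULL if no napi is attached */
};

/*
 * Direct recycling is only considered safe when the caller runs in
 * softirq context on the CPU that currently owns the napi instance.
 * 'in_softirq' and 'this_cpu' are passed in explicitly because this
 * model has no real execution context to query.
 */
static bool napi_local_model(const struct pool_model *pool,
			     bool in_softirq, int this_cpu)
{
	if (!in_softirq)
		return false;

	return pool->napi != NULL && pool->napi->list_owner == this_cpu;
}
```

With list_owner at -1, the model returns false for every valid CPU id,
which is the situation described between step 1 and step 6. It says
nothing about memory ordering between CPUs, so it does not by itself
prove the implicit assumption mentioned above.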

Also, the old page_pool is destroyed in step 5. I am not sure it is
necessary to call page_pool_disable_direct_recycling() in step 3 if
page_pool_destroy() already does the synchronize_rcu() in step 5, before
napi is enabled.

If not, maybe I am missing something here.

Yes, I believe you got the steps 5 and 6 backwards.

Maybe, but I do not yet see how step 6 could be called before step 5.
It seems two drivers implement 'netdev_queue_mgmt_ops' now, and only
bnxt calls page_pool_disable_direct_recycling(). Its implementation
doesn't call any napi related API, see bnxt_queue_mgmt_ops:
https://elixir.bootlin.com/linux/v6.13-rc1/source/drivers/net/ethernet/broadcom/bnxt/bnxt.c#L15539

And netdev_rx_queue_restart() seems to call the above ops without
calling any napi related API:
https://elixir.bootlin.com/linux/v6.12.3/source/net/core/netdev_rx_queue.c#L9
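
For reference, my reading of that call order can be sketched in user
space as below. This is not the kernel code: 'struct queue_ops_model',
'restart_model()' and the logging helpers are hypothetical, the callback
signatures are trimmed down, and error unwinding is left out. The order
mirrors steps 2-5 from the list earlier in this mail:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CALLS 8

/* Hypothetical, trimmed-down stand-in for the four queue callbacks. */
struct queue_ops_model {
	int  (*mem_alloc)(void *mem);
	int  (*stop)(void *mem);
	int  (*start)(void *mem);
	void (*mem_free)(void *mem);
};

/* Names of the callbacks, in the order they were invoked. */
static const char *call_log[MAX_CALLS];
static size_t call_count;

static void log_call(const char *name)
{
	if (call_count < MAX_CALLS)
		call_log[call_count++] = name;
}

static int model_mem_alloc(void *mem) { (void)mem; log_call("mem_alloc"); return 0; }
static int model_stop(void *mem) { (void)mem; log_call("stop"); return 0; }
static int model_start(void *mem) { (void)mem; log_call("start"); return 0; }
static void model_mem_free(void *mem) { (void)mem; log_call("mem_free"); }

static const struct queue_ops_model model_ops = {
	.mem_alloc = model_mem_alloc,
	.stop      = model_stop,
	.start     = model_start,
	.mem_free  = model_mem_free,
};

/* Assumed order of the restart; note that no napi call appears here. */
static int restart_model(const struct queue_ops_model *ops,
			 void *old_mem, void *new_mem)
{
	int err;

	err = ops->mem_alloc(new_mem);	/* step 2: new memory + new page_pool */
	if (err)
		return err;
	err = ops->stop(old_mem);	/* step 3: stop old rx queue */
	if (err)
		return err;
	err = ops->start(new_mem);	/* step 4: start new rx queue */
	if (err)
		return err;
	ops->mem_free(old_mem);		/* step 5: free old memory + old page_pool */
	return 0;
}
```

Calling restart_model(&model_ops, &old_mem, &new_mem) logs mem_alloc,
stop, start, mem_free in that order. If a driver's start or stop
callback did call napi_enable() or napi_disable() internally, the
ordering against step 5 would look different from what I described.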

The napi related API seems to be called only in bnxt_open_nic() and
bnxt_close_nic() in the bnxt driver, and they don't seem to be directly
related to the queue_mgmt_ops.

+cc the relevant author and maintainer to see if they can clarify, as I
am not really familiar with the queue mgmt related sequence.
