Re: [PATCH net-next v7 2/8] page_pool: fix timing for checking and disabling napi_local

From: Yunsheng Lin
Date: Sat Jan 25 2025 - 09:21:51 EST


On 1/25/2025 1:13 AM, Toke Høiland-Jørgensen wrote:
> Yunsheng Lin <linyunsheng@xxxxxxxxxx> writes:

>>> So I really don't see a way for this race to happen with correct usage
>>> of the page_pool and NAPI APIs, which means there's no reason to make
>>> the change you are proposing here.

>> I looked at one driver setting pp->napi: the bnxt driver does not seem to
>> call page_pool_disable_direct_recycling() when unloading, see
>> bnxt_half_close_nic() below; page_pool_disable_direct_recycling() only
>> seems to be called for the new queue_mgmt API:

>> /* rtnl_lock held, this call can only be made after a previous successful
>>  * call to bnxt_half_open_nic().
>>  */
>> void bnxt_half_close_nic(struct bnxt *bp)
>> {
>> 	bnxt_hwrm_resource_free(bp, false, true);
>> 	bnxt_del_napi(bp);		/* <-- napi del and rcu sync */
>> 	bnxt_free_skbs(bp);
>> 	bnxt_free_mem(bp, true);	/* <-- calls page_pool_destroy() */
>> 	clear_bit(BNXT_STATE_HALF_OPEN, &bp->state);
>> }

>> Even if page_pool_disable_direct_recycling() were called between
>> bnxt_del_napi() and bnxt_free_mem(), the timing window would still exist,
>> as the rcu sync needs to happen after page_pool_disable_direct_recycling().
>> It seems some refactoring is needed in the bnxt driver so that it can reuse
>> the rcu sync from the NAPI API, in order to avoid having to do an rcu sync
>> in page_pool_destroy().

> Well, I would consider that usage buggy. A page pool object is created
> with a reference to the napi struct; so the page pool should also be
> destroyed (clearing its reference) before the napi memory is freed. I
> guess this is not really documented anywhere, but it's pretty standard
> practice to free objects in the opposite order of their creation.

I am not so familiar with the rules around the NAPI creation API, but in the
bnxt driver the page_pool already takes its reference to the 'struct napi'
before netif_napi_add() is called, see below:

static int __bnxt_open_nic(struct bnxt *bp, bool irq_re_init, bool link_re_init)
{
	.......
	rc = bnxt_alloc_mem(bp, irq_re_init);	/* <-- creates the page_pool */
	if (rc) {
		netdev_err(bp->dev, "bnxt_alloc_mem err: %x\n", rc);
		goto open_err_free_mem;
	}

	if (irq_re_init) {
		bnxt_init_napi(bp);		/* <-- netif_napi_add() */
		rc = bnxt_request_irq(bp);
		if (rc) {
			netdev_err(bp->dev, "bnxt_request_irq err: %x\n", rc);
			goto open_err_irq;
		}
	}

	.....
}
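
To make the refactoring mentioned above more concrete, below is a minimal,
untested sketch of the teardown ordering I have in mind. Note that
bnxt_disable_pp_recycling() is a hypothetical helper (it does not exist in
the driver today) and the ring/field accesses are simplified; the point is
only that page_pool_disable_direct_recycling() would run before
bnxt_del_napi(), so the rcu sync done for the NAPI removal also covers the
napi_local check:

/* Sketch only, not a tested patch.  bnxt_disable_pp_recycling() is a
 * hypothetical helper that walks the rx rings and disables direct
 * recycling for each page_pool before the NAPI instances are deleted.
 */
static void bnxt_disable_pp_recycling(struct bnxt *bp)
{
	int i;

	for (i = 0; i < bp->rx_nr_rings; i++) {
		struct bnxt_rx_ring_info *rxr = &bp->rx_ring[i];

		if (rxr->page_pool)
			page_pool_disable_direct_recycling(rxr->page_pool);
	}
}

void bnxt_half_close_nic(struct bnxt *bp)
{
	bnxt_hwrm_resource_free(bp, false, true);
	bnxt_disable_pp_recycling(bp);	/* stop direct recycling first */
	bnxt_del_napi(bp);		/* napi del + rcu sync now covers it */
	bnxt_free_skbs(bp);
	bnxt_free_mem(bp, true);	/* page_pool_destroy() */
	clear_bit(BNXT_STATE_HALF_OPEN, &bp->state);
}

With an ordering like that, page_pool_destroy() would not need its own rcu
sync for this driver; the open question is whether every driver setting
pp->napi can be refactored this way.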


> So no, I don't think this is something that should be fixed on the page
> pool side (and certainly not by adding another synchronize_rcu() call
> per queue!); rather, we should fix the drivers that get this wrong (and
> probably document the requirement a bit better).

Even if the timing problem of checking and disabling napi_local should not
be fixed on the page_pool side, do we have a common understanding that the
DMA API misuse problem does need to be fixed on the page_pool side?
If yes, do we have a common understanding that some mechanism like
synchronize_rcu() may still be needed on the page_pool side?

If no, I am not sure there is any better way to fix the DMA API misuse
problem after all the previous discussion.

If yes, it may be better to focus on discussing how to avoid calling the rcu
sync for each queue, as mentioned in [1]; see the sketch at the end of this
mail.

1. https://lore.kernel.org/all/22de6033-744e-486e-bbd9-8950249cd018@xxxxxxxxxx/
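
For reference, a minimal sketch of the kind of page_pool-side mechanism
being debated (not the actual upstream code; only the part relevant to the
napi_local race is shown, everything else is omitted). Doing the sync here
means one synchronize_rcu() per page_pool, i.e. per queue, which is exactly
the cost that [1] is about:

/* Sketch only: other teardown steps in page_pool_destroy() are omitted. */
void page_pool_destroy(struct page_pool *pool)
{
	/* Clear pool->p.napi / cpuid so new callers stop taking the
	 * lockless direct-recycling path.
	 */
	page_pool_disable_direct_recycling(pool);

	/* Wait out any concurrent softirq that may already have sampled
	 * pool->p.napi before it was cleared.
	 */
	synchronize_rcu();

	/* ... release in-flight pages and free the pool ... */
}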