Re: [PATCH] net: lan966x: fix page_pool error handling in lan966x_fdma_rx_alloc_page_pool()

From: Jakub Kicinski

Date: Fri Apr 03 2026 - 18:26:48 EST

This is an AI-generated review of your patch. The human sending this
email says "Let's make sure the callers can handle the error, too."
---
net: lan966x: fix page_pool error handling in lan966x_fdma_rx_alloc_page_pool()

This commit adds an IS_ERR check after page_pool_create() to return early
on failure. This prevents passing an error pointer to
xdp_rxq_info_reg_mem_model() which would cause a kernel oops when
dereferenced.

> diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
> index 7b6369e43451d..34bbcae2f0685 100644
> --- a/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
> +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_fdma.c
> @@ -92,6 +92,9 @@ static int lan966x_fdma_rx_alloc_page_pool(struct lan966x_rx *rx)
>
> 	rx->page_pool = page_pool_create(&pp_params);
>
> +	if (unlikely(IS_ERR(rx->page_pool)))
> +		return PTR_ERR(rx->page_pool);
> +

This isn't a bug in the patch itself, but since this patch correctly handles
errors from page_pool_create(), it makes the existing error recovery paths more
reachable.

If lan966x_fdma_rx_alloc_page_pool() succeeds here, it creates a new page pool
and updates port->xdp_rxq memory models to point to it. If the subsequent
fdma_alloc_coherent() call in lan966x_fdma_rx_alloc() fails:

lan966x_fdma.c:lan966x_fdma_rx_alloc() {
    if (lan966x_fdma_rx_alloc_page_pool(rx))
        return PTR_ERR(rx->page_pool);

    err = fdma_alloc_coherent(lan966x->dev, fdma);
    if (err)
        return err;
    ...
}

Does this return an error without destroying the newly created page pool?
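For illustration only, here is a minimal user-space sketch of the unwind
this question is hinting at. Every mock_* name below is a hypothetical
stand-in, not a kernel API, and the real driver would presumably also need
to deal with the xdp_rxq memory model registration, which this model leaves
out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; none of these are kernel APIs. */
struct mock_pool {
	bool live;
};

struct mock_rx {
	struct mock_pool *page_pool;
	bool ring_allocated;
};

static struct mock_pool pool_storage;
static int live_pools;		/* pools created and not yet destroyed */
static bool fail_ring_alloc;	/* test knob: force the ring allocation to fail */

/* Stands in for the page pool creation step. */
static int mock_alloc_page_pool(struct mock_rx *rx)
{
	pool_storage.live = true;
	rx->page_pool = &pool_storage;
	live_pools++;
	return 0;
}

/* Stands in for tearing the pool down again; catches double destroys. */
static void mock_destroy_page_pool(struct mock_rx *rx)
{
	assert(rx->page_pool && rx->page_pool->live);
	rx->page_pool->live = false;
	rx->page_pool = NULL;
	live_pools--;
}

/* Stands in for the coherent ring allocation that may fail. */
static int mock_alloc_ring(struct mock_rx *rx)
{
	if (fail_ring_alloc)
		return -ENOMEM;
	rx->ring_allocated = true;
	return 0;
}

/* Models the shape of lan966x_fdma_rx_alloc() with an added unwind step. */
static int mock_rx_alloc(struct mock_rx *rx)
{
	int err;

	err = mock_alloc_page_pool(rx);
	if (err)
		return err;

	err = mock_alloc_ring(rx);
	if (err)
		goto err_destroy_pool;

	return 0;

err_destroy_pool:
	/* Release the pool created above so the caller never sees it. */
	mock_destroy_page_pool(rx);
	return err;
}
```

In this model a failed ring allocation leaves live_pools at zero and
rx->page_pool cleared, so a caller that restores an older pool pointer has
nothing left to leak.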

During an MTU change in lan966x_fdma_reload(), this failure triggers a jump
to the restore: label, which reverts rx->page_pool to the old pool:

lan966x_fdma.c:lan966x_fdma_reload() {
    ...
    err = lan966x_fdma_rx_alloc(&lan966x->rx);
    if (err)
        goto restore;
    ...
restore:
    lan966x->rx.page_pool = page_pool;
    ...
}

Does this irrevocably leak the newly created page pool and leave the xdp_rxq
memory models pointing to the leaked pool?

Similarly, in lan966x_fdma_init(), if lan966x_fdma_tx_alloc() fails:

lan966x_fdma.c:lan966x_fdma_init() {
    ...
    err = lan966x_fdma_rx_alloc(&lan966x->rx);
    if (err)
        return err;

    err = lan966x_fdma_tx_alloc(&lan966x->tx);
    if (err) {
        fdma_free_coherent(lan966x->dev, &lan966x->rx.fdma);
        return err;
    }
    ...
}

Does the error handler explicitly free rx.fdma via fdma_free_coherent() but
forget to destroy rx.page_pool, resulting in a permanent memory leak during
probe failure?

Additionally, the error recovery path in lan966x_fdma_reload() appears to
resume hardware DMA into freed pages.

When lan966x_fdma_reload() runs, it first stops the hardware and calls
lan966x_fdma_rx_free_pages(), which returns all RX ring pages back to the
old page pool via page_pool_put_full_page(). If lan966x_fdma_rx_alloc()
subsequently fails, execution jumps to the restore: label:

lan966x_fdma.c:lan966x_fdma_reload() {
    ...
    lan966x_fdma_rx_free_pages(&lan966x->rx);
    ...
    err = lan966x_fdma_rx_alloc(&lan966x->rx);
    if (err)
        goto restore;
    ...
restore:
    lan966x->rx.page_pool = page_pool;
    memcpy(&lan966x->rx.fdma, &fdma_rx_old, sizeof(struct fdma));
    lan966x_fdma_rx_start(&lan966x->rx);
    ...
}

The restore block copies back the old FDMA descriptors, which point to the
pages that were just freed, and calls lan966x_fdma_rx_start() to re-enable
the hardware RX channel.

Since it never re-allocates or re-maps pages for these restored descriptors,
will the hardware actively resume DMAing incoming network packets into
physical addresses that the page pool considers free? If the page pool
returns these pages to the buddy allocator, could this overwrite memory
owned by unrelated kernel subsystems?
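One possible way to avoid the hazard, sketched here as a user-space model
rather than a proposed patch, is to allocate the replacement ring before
releasing any pages of the old one. All mock_* names and RING_SIZE are
hypothetical, and the model assumes the old and new rings can coexist
briefly, which may not hold for the real hardware or memory budget.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define RING_SIZE 4	/* hypothetical ring depth for the model */

/* Hypothetical model: each descriptor slot either owns a page or does not. */
struct mock_ring {
	bool page_mapped[RING_SIZE];
};

struct mock_dev {
	struct mock_ring rx;
	bool rx_running;
};

static bool fail_new_ring;	/* test knob: force the new allocation to fail */

/* Stands in for allocating a ring and populating it with pages. */
static int mock_ring_alloc(struct mock_ring *ring)
{
	if (fail_new_ring)
		return -ENOMEM;
	for (int i = 0; i < RING_SIZE; i++)
		ring->page_mapped[i] = true;
	return 0;
}

/* Pages may only be handed back while the RX channel is stopped. */
static void mock_ring_free_pages(struct mock_dev *dev)
{
	assert(!dev->rx_running);
	for (int i = 0; i < RING_SIZE; i++)
		dev->rx.page_mapped[i] = false;
}

/* True if every descriptor the hardware may use still owns its page. */
static bool mock_ring_safe_for_dma(const struct mock_ring *ring)
{
	for (int i = 0; i < RING_SIZE; i++)
		if (!ring->page_mapped[i])
			return false;
	return true;
}

/* Reload that frees the old pages only after the new ring exists. */
static int mock_reload(struct mock_dev *dev)
{
	struct mock_ring new_ring = { 0 };
	int err;

	dev->rx_running = false;	/* stop RX before touching the ring */

	err = mock_ring_alloc(&new_ring);
	if (err) {
		/* Old ring is untouched, so restarting RX on it is safe. */
		dev->rx_running = true;
		return err;
	}

	mock_ring_free_pages(dev);	/* old pages released only on success */
	dev->rx = new_ring;
	dev->rx_running = true;
	return 0;
}
```

With this ordering the failure path restarts RX on descriptors that still
own their pages, so no restore step has to re-populate anything.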

> 	for (int i = 0; i < lan966x->num_phys_ports; ++i) {
> 		struct lan966x_port *port;
>