Re: [PATCH] net: fec: handle page_pool_dev_alloc_pages error

From: Kevin Groeneveld
Date: Tue Dec 24 2024 - 15:20:36 EST


On 2024-12-24 07:56, Wei Fang wrote:
> Please simplify the log, it's too long.

I will simplify it.

>> @@ -1943,10 +1943,12 @@ static int fec_enet_rx_napi(struct napi_struct *napi, int budget)
>>          struct fec_enet_private *fep = netdev_priv(ndev);
>>          int done = 0;
>>
>> +        fep->rx_err_nomem = false;
>> +
>>          do {
>>                  done += fec_enet_rx(ndev, budget - done);
>>                  fec_enet_tx(ndev, budget);
>> -        } while ((done < budget) && fec_enet_collect_events(fep));
>> +        } while ((done < budget) && !fep->rx_err_nomem && fec_enet_collect_events(fep));
>
> Is the condition "!fep->rx_err_nomem" necessary here? If not, then there
> is no need to add this variable to fec_enet_private.

For my test case it often seems to loop forever without making any progress unless I add that condition.

> One situation I am concerned about is that when the issue occurs, the Rx
> rings are full. At the same time, because the 'done < budget' condition is
> met, the interrupt mode will be used to receive the packets. However,
> since the Rx rings are full, no Rx interrupt events will be generated. This
> means that the packets on the Rx rings may not be received by the CPU
> for a long time unless Tx interrupt events are generated.

These are the types of things I was worried might exist with my patch.
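
For reference, my rough reading of the tail of fec_enet_rx_napi() in fec_main.c (simplified and from memory, so possibly not exact) is something like the following; if that is right, the situation you describe would leave things parked here with interrupts re-enabled but no way for a new Rx event to fire:

        if (done < budget) {
                napi_complete_done(napi, done);
                /* Re-enable interrupts.  If the Rx ring is still full and was
                 * not replenished, the MAC has no empty descriptor to receive
                 * into, so no new Rx event will be raised from here on.
                 */
                writel(FEC_DEFAULT_IMASK, fep->hwp + FEC_IMASK);
        }

        return done;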

> Another approach is to discard the packets when the issue occurs, as
> shown below. Note that the following modification has not been verified.
>
> -static void fec_enet_update_cbd(struct fec_enet_priv_rx_q *rxq,
> +static int fec_enet_update_cbd(struct fec_enet_priv_rx_q *rxq,
>                                 struct bufdesc *bdp, int index)
>  {
>          struct page *new_page;
>          dma_addr_t phys_addr;
>
>          new_page = page_pool_dev_alloc_pages(rxq->page_pool);
> -        WARN_ON(!new_page);
> +        if (unlikely(!new_page))
> +                return -ENOMEM;
> +
>          rxq->rx_skb_info[index].page = new_page;
>
>          rxq->rx_skb_info[index].offset = FEC_ENET_XDP_HEADROOM;
>          phys_addr = page_pool_get_dma_addr(new_page) + FEC_ENET_XDP_HEADROOM;
>          bdp->cbd_bufaddr = cpu_to_fec32(phys_addr);
> +
> +        return 0;
>  }
>
>  static u32
> @@ -1771,7 +1775,10 @@ fec_enet_rx_queue(struct net_device *ndev, int budget, u16 queue_id)
>                                          pkt_len,
>                                          DMA_FROM_DEVICE);
>                  prefetch(page_address(page));
> -                fec_enet_update_cbd(rxq, bdp, index);
> +                if (fec_enet_update_cbd(rxq, bdp, index)) {
> +                        ndev->stats.rx_dropped++;
> +                        goto rx_processing_done;
> +                }
>
>                  if (xdp_prog) {
>                          xdp_buff_clear_frags_flag(&xdp);

Thanks for the suggestion. I had considered something similar, but I was not sure it was safe to just jump to rx_processing_done at that point in the code (my rough reading of that path is sketched below). I will try your patch, and if it seems to work okay, I will submit a new version.
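
From memory of fec_main.c (simplified, leaving out the extended-descriptor and write-barrier handling, and not checked against the current tree), the rx_processing_done path essentially just hands the descriptor back to the hardware:

        rx_processing_done:
                /* Clear the status flags for this buffer */
                status &= ~BD_ENET_RX_STATS;

                /* Mark the buffer empty so the controller can reuse it */
                status |= BD_ENET_RX_EMPTY;
                bdp->cbd_sc = cpu_to_fec16(status);

If that reading is correct, bailing out there with cbd_bufaddr still pointing at the old page should just drop the current packet while keeping the ring usable.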

I probably will not have time to work on this further until the new year.

Kevin