RE: [PATCH] net: fec: handle page_pool_dev_alloc_pages error

From: Wei Fang
Date: Tue Dec 24 2024 - 20:39:14 EST


> I will simplify it.
>
> >> @@ -1943,10 +1943,12 @@ static int fec_enet_rx_napi(struct napi_struct
> >> *napi, int budget)
> >> struct fec_enet_private *fep = netdev_priv(ndev);
> >> int done = 0;
> >>
> >> + fep->rx_err_nomem = false;
> >> +
> >> do {
> >> done += fec_enet_rx(ndev, budget - done);
> >> fec_enet_tx(ndev, budget);
> >> - } while ((done < budget) && fec_enet_collect_events(fep));
> >> + } while ((done < budget) && !fep->rx_err_nomem &&
> >> fec_enet_collect_events(fep));
> >
> > Is the condition "!fep->rx_err_nomem" necessary here? If not, then there
> > is no need to add this variable to fec_enet_private.
>
> For my test case it often seems to loop forever without making any
> progress unless I add that condition.
>
> > One situation I am concerned about is that when the issue occurs, the Rx
> > rings are full. At the same time, because the 'done < budget' condition is
> > met, the interrupt mode will be used to receive the packets. However,
> > since the Rx rings are full, no Rx interrupt events will be generated. This
> > means that the packets on the Rx rings may not be received by the CPU
> > for a long time unless Tx interrupt events are generated.
>
> These are the types of things I was worried might exist with my patch.
>
> > Another approach is to discard the packets when the issue occurs, as
> > shown below. Note that the following modification has not been verified.
> >
> > -static void fec_enet_update_cbd(struct fec_enet_priv_rx_q *rxq,
> > +static int fec_enet_update_cbd(struct fec_enet_priv_rx_q *rxq,
> > struct bufdesc *bdp, int index)
> > {
> > struct page *new_page;
> > dma_addr_t phys_addr;
> >
> > new_page = page_pool_dev_alloc_pages(rxq->page_pool);
> > - WARN_ON(!new_page);
> > + if (unlikely(!new_page))
> > + return -ENOMEM;
> > +
> > rxq->rx_skb_info[index].page = new_page;
> >
> > rxq->rx_skb_info[index].offset = FEC_ENET_XDP_HEADROOM;
> > phys_addr = page_pool_get_dma_addr(new_page) +
> FEC_ENET_XDP_HEADROOM;
> > bdp->cbd_bufaddr = cpu_to_fec32(phys_addr);
> > +
> > + return 0;
> > }
> >
> > static u32
> > @@ -1771,7 +1775,10 @@ fec_enet_rx_queue(struct net_device *ndev, int
> budget, u16 queue_id)
> > pkt_len,
> > DMA_FROM_DEVICE);
> > prefetch(page_address(page));
> > - fec_enet_update_cbd(rxq, bdp, index);
> > + if (fec_enet_update_cbd(rxq, bdp, index)) {
> > + ndev->stats.rx_dropped++;
> > + goto rx_processing_done;
> > + }
> >
> > if (xdp_prog) {
> > xdp_buff_clear_frags_flag(&xdp);
>
> Thanks for the suggestion. I had considered something similar but I was
> not sure it was safe to just jump to rx_processing_done at that point in
> the code. I will try your patch and if it seems to work okay I will
> submit a new version.

Okay, but this is just an example, the official modification still needs to be
improved, such as moving 'goto rx_processing_done' statement before
dma_sync_single_for_cpu().

>
> I probably will not have time to work on this further until the new year.
>

Ok, thanks a lot for helping fix this.