Re: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool

From: Ilias Apalodimas
Date: Thu Jul 04 2019 - 06:31:15 EST


Hi Jesper, Ivan,

> On Wed, 3 Jul 2019 12:37:50 +0200
> Jose Abreu <Jose.Abreu@xxxxxxxxxxxx> wrote:
>
> > @@ -3547,6 +3456,9 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
> >
> > napi_gro_receive(&ch->rx_napi, skb);
> >
> > + page_pool_recycle_direct(rx_q->page_pool, buf->page);
>
> This doesn't look correct.
>
> The page_pool DMA mapping cannot be "kept" when page traveling into the
> network stack attached to an SKB. (Ilias and I have a long term plan[1]
> to allow this, but you cannot do it ATM).
>
> You will have to call:
> page_pool_release_page(rx_q->page_pool, buf->page);
>
> This will do a DMA-unmap, and you will likely lose your performance
> gain :-(
>
>
> > + buf->page = NULL;
> > +
> > priv->dev->stats.rx_packets++;
> > priv->dev->stats.rx_bytes += frame_len;
> > }
>
> Also remember that the page_pool requires your driver to do the DMA-sync
> operation. I see a dma_sync_single_for_cpu(), but I didn't see a
> dma_sync_single_for_device() (well, I noticed one getting removed).
> (For some HW Ilias tells me that the dma_sync_single_for_device can be
> elided, so maybe this can still be correct for you).

In our case (and in the page_pool API in general) you have to track buffers when
both .ndo_xdp_xmit() and XDP_TX are used.
So the lifetime of a packet might be:

1. page pool allocs packet. The API doesn't sync, but I *think* you don't have
to do so explicitly, since the CPU won't touch that buffer until the NAPI
handler kicks in. In the NAPI handler you need to dma_sync_single_for_cpu() and
process the packet.
2a) no XDP is required, so the packet is unmapped and freed
2b) .ndo_xdp_xmit is called, so the buffer needs to be mapped/unmapped
2c) XDP_TX is called. In that case we re-use an Rx buffer, so we need to
dma_sync_single_for_device()
2a and 2b won't cause any issues.
In 2c the buffer will be recycled and fed back to the device with a *correct*
sync (for_device) and all those buffers are allocated as DMA_BIDIRECTIONAL.
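
The cases above can be sketched as a small decision table. This is only a toy
userspace model of the reasoning in this mail, not driver code: the enum, the
struct and rx_dma_plan() are made-up names for illustration, and the flags just
record which DMA operation each case is assumed to need.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; this models the cases in the mail, not a real driver. */
enum rx_verdict {
	RX_PASS_TO_STACK,	/* 2a: no XDP, buffer is unmapped and freed */
	RX_NDO_XDP_XMIT,	/* 2b: buffer goes out through .ndo_xdp_xmit() */
	RX_XDP_TX,		/* 2c: Rx buffer is re-used for XDP_TX */
};

struct dma_plan {
	bool sync_for_cpu;	/* dma_sync_single_for_cpu() in the NAPI handler */
	bool map_for_tx;	/* a Tx mapping is created for the buffer */
	bool unmap;		/* a mapping is torn down at some point */
	bool sync_for_device;	/* dma_sync_single_for_device() before re-use */
	bool recycled;		/* buffer is fed back to the device */
};

/* Return the DMA operations assumed to be needed for one received buffer. */
struct dma_plan rx_dma_plan(enum rx_verdict v)
{
	/* Step 1: the CPU always syncs before it touches the packet. */
	struct dma_plan p = { .sync_for_cpu = true };

	assert(v <= RX_XDP_TX);

	switch (v) {
	case RX_PASS_TO_STACK:
		p.unmap = true;
		break;
	case RX_NDO_XDP_XMIT:
		p.map_for_tx = true;
		p.unmap = true;
		break;
	case RX_XDP_TX:
		/* Only this path hands the same buffer back to the device. */
		p.sync_for_device = true;
		p.recycled = true;
		break;
	}
	return p;
}
```

In this model XDP_TX is the only case that sets sync_for_device, which is the
basis for the guess that the sync can be skipped at initial allocation.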

So bottom line, I *think* we can skip the dma_sync_single_for_device() on the
initial allocation *only*. If I am terribly wrong please let me know :)

Thanks
/Ilias
>
>
> [1] https://github.com/xdp-project/xdp-project/blob/master/areas/mem/page_pool02_SKB_return_callback.org
> --
> Best regards,
> Jesper Dangaard Brouer
> MSc.CS, Principal Kernel Engineer at Red Hat
> LinkedIn: http://www.linkedin.com/in/brouer