RE: Re: [PATCH net-next v2] octeontx2-pf: Add support for page pool

From: Ratheesh Kannoth
Date: Fri May 19 2023 - 01:33:08 EST


> -----Original Message-----
> From: Sunil Kovvuri Goutham <sgoutham@xxxxxxxxxxx>
> Sent: Friday, May 19, 2023 10:50 AM
> To: Yunsheng Lin <linyunsheng@xxxxxxxxxx>; Ratheesh Kannoth
> <rkannoth@xxxxxxxxxxx>; netdev@xxxxxxxxxxxxxxx; linux-
> kernel@xxxxxxxxxxxxxxx
> Cc: davem@xxxxxxxxxxxxx; edumazet@xxxxxxxxxx; kuba@xxxxxxxxxx;
> pabeni@xxxxxxxxxx; Subbaraya Sundeep Bhatta <sbhatta@xxxxxxxxxxx>;
> Geethasowjanya Akula <gakula@xxxxxxxxxxx>; Srujana Challa
> <schalla@xxxxxxxxxxx>; Hariprasad Kelam <hkelam@xxxxxxxxxxx>
> Subject: RE: [EXT] Re: [PATCH net-next v2] octeontx2-pf: Add support for
> page pool
>
> >
> > ----------------------------------------------------------------------
> > On 2023/5/19 9:52, Ratheesh Kannoth wrote:
> > >> ----------------------------------------------------------------------
> > >> On 2023/5/18 13:51, Ratheesh Kannoth wrote:
> > >>> Page pool for each rx queue enhance rx side performance by
> > >>> reclaiming buffers back to each queue specific pool. DMA mapping
> > >>> is done only for first allocation of buffers.
> > >>> As subsequent buffers allocation avoid DMA mapping, it results in
> > >>> performance improvement.
> > >>>
> > >>> Image        | Performance with Linux kernel Packet Generator
> > >>
> > >> Is there any more detailed info for the performance data?
> > >> 'kernel Packet Generator' means using pktgen module in the
> > >> net/core/pktgen.c? it seems pktgen is more for tx, is there any
> > >> obvious reason why the page pool optimization for rx have brought
> > >> about ten times improvement?
> > > We used packet generator for TX machine. Performance data is for RX
> > > DUT. I will remove Packet generator text from the commit message as
> > > it gives ambiguous information
> > > DUT Rx <------------------------- TX (Linux machine with packet generator)
> > > (page pool support)
> >
> > Thanks for clarifying.
> > DUT is for 'Device Under Test'?
Yes

> > what does DUT do after it receive a packet? XDP DROP?
We did not use any XDP program to drop the packets. The stack drops them because there are no listeners for these packets.


> > >
> > >>
> > >>> ------------ | -----------------------------------------------
> > >>> Vanilla      | 3Mpps
> > >>>              |
> > >>> with this    | 42Mpps
> > >>> change       |
> > >>> -------------------------------------------------------------
> > >>>
> > >>
> > >> ...
> > >>
> > >>>  static int __otx2_alloc_rbuf(struct otx2_nic *pfvf, struct otx2_pool *pool,
> > >>>                               dma_addr_t *dma)
> > >>>  {
> > >>>          u8 *buf;
> > >>>
> > >>> +        if (pool->page_pool)
> > >>> +                return otx2_alloc_pool_buf(pfvf, pool, dma);
> > >>> +
> > >>>          buf = napi_alloc_frag_align(pool->rbsize, OTX2_ALIGN);
> > >>>          if (unlikely(!buf))
> > >>>                  return -ENOMEM;
> > >>
> > >> It seems the above is dead code when using 'select PAGE_POOL', as
> > >> PAGE_POOL config is always selected by the driver?
> > > _otx2_alloc_rbuf() is common code for RX and TX. For RX,
> > > pool->page_pool != NULL, so allocation is from page pool.
> > >
> >
> > Am I missing something here? 'buf' is dma-mapped with DMA_FROM_DEVICE,
> > can it be used for TX?
> >
> > Also, what does 'r' in _otx2_alloc_rbuf() mean?
> >
>
> HW takes care of cache coherency between device and CPU, hence
> DMA_ATTR_SKIP_CPU_SYNC was used. Direction of DMA doesn't matter
> here. Hence, instead of duplicating the same API, 'otx2_alloc_rbuf' was
> used for both Rx and Tx. 'r' stands for receive.
>
> Thanks,
> Sunil.
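
For anyone following the thread, the effect claimed in the commit message
(DMA mapping only on first allocation, later allocations served from the
per-queue pool) can be modelled in plain userspace C. This is only a rough
sketch under stated assumptions: sim_pool, sim_alloc(), sim_free() and
sim_run() are hypothetical names, malloc() stands in for "allocate and
DMA-map a buffer", and none of it is the driver code or the kernel
page_pool API. The 'recycle' flag plays the role of the
pool->page_pool != NULL check in the hunk quoted above.

```c
#include <stddef.h>
#include <stdlib.h>

#define SIM_BUF_SIZE    2048
#define SIM_CACHE_SLOTS 64

/* Hypothetical userspace model of one RX queue's buffer pool. */
struct sim_pool {
	int recycle;                   /* models pool->page_pool != NULL */
	void *cache[SIM_CACHE_SLOTS];  /* returned buffers, still "mapped" */
	int ncached;
	unsigned long map_calls;       /* count of simulated DMA mappings */
};

/* Allocate a buffer; a fresh "mapping" happens only on a cache miss. */
static void *sim_alloc(struct sim_pool *p)
{
	void *buf;

	if (p->recycle && p->ncached > 0)
		return p->cache[--p->ncached];  /* reuse, no new mapping */

	buf = malloc(SIM_BUF_SIZE);             /* stands in for alloc + map */
	if (buf)
		p->map_calls++;
	return buf;
}

/* Return a buffer; with recycling it goes back to the pool still mapped. */
static void sim_free(struct sim_pool *p, void *buf)
{
	if (p->recycle && p->ncached < SIM_CACHE_SLOTS) {
		p->cache[p->ncached++] = buf;
		return;
	}
	free(buf);                              /* stands in for unmap + free */
}

/* Receive and drop 'packets' buffers; return how many mappings were made. */
unsigned long sim_run(int recycle, int packets)
{
	struct sim_pool p = { .recycle = recycle };
	int i;

	for (i = 0; i < packets; i++) {
		void *buf = sim_alloc(&p);

		if (!buf)
			break;
		sim_free(&p, buf);
	}

	while (p.ncached > 0)                   /* drain the pool on teardown */
		free(p.cache[--p.ncached]);

	return p.map_calls;
}
```

In this model sim_run(1, 1000) performs a single mapping, while
sim_run(0, 1000) performs one per packet. It says nothing about the actual
Mpps numbers; it only shows where the per-packet mapping cost goes away.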

-Ratheesh