Re: [PATCH net v1 2/2] lan743x: boost performance: limit PCIe bandwidth requirement

From: Andrew Lunn
Date: Tue Dec 08 2020 - 17:52:42 EST


> That's a good question. I used perf to create a flame graph of what
> the cpu was doing when receiving data at high speed. It showed that
> __dma_page_dev_to_cpu took up most of the cpu time. Which is triggered
> by dma_unmap_single(9K, DMA_FROM_DEVICE).
>
> So I assumed that it's a PCIe dma bandwidth issue, but I could be wrong -
> I didn't do any PCIe bandwidth measurements.

Sometimes it is actually cache operations which take all the
time. This needs to invalidate the cache, so that when the memory is
then accessed, it get fetched from RAM. On SMP machines, cache
invalidation can be expensive, due to all the cross CPU operations.
I've actually got better performance by building a UP kernel on some
low core count ARM CPUs.

There are some tricks which can be played. Do you actually need all
9K? Does the descriptor tell you actually how much is used? You can
get a nice speed up if you just unmap 64 bytes for a TCP ACK, rather
than the full 9K.

Andrew