Re: [PATCH net v1 2/2] lan743x: boost performance: limit PCIe bandwidth requirement
From: Sven Van Asbroeck
Date: Tue Dec 08 2020 - 16:55:29 EST
Hi Jakub, thank you so much for reviewing this patchset !
On Tue, Dec 8, 2020 at 2:43 PM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
>
> > When the chip is working with the default 1500 byte MTU, a 9K
> > dma buffer goes from chip -> cpu per 1500 byte frame. This means
> > that to get 1G/s ethernet bandwidth, we need 6G/s PCIe bandwidth !
> >
> > Fix by limiting the rx ring dma buffer size to the current MTU
> > size.
>
> I'd guess this is a memory allocate issue, not a bandwidth thing.
> for 9K frames the driver needs to do order-2 allocations of 16K.
> For 1500 2K allocations are sufficient (which is < 1 page, hence
> a lot cheaper).
That's a good question. I used perf to create a flame graph of what
the cpu was doing when receiving data at high speed. It showed that
__dma_page_dev_to_cpu took up most of the cpu time. Which is triggered
by dma_unmap_single(9K, DMA_FROM_DEVICE).
So I assumed that it's a PCIe dma bandwidth issue, but I could be wrong -
I didn't do any PCIe bandwidth measurements.
>
> > Tested with iperf3 on a freescale imx6 + lan7430, both sides
> > set to mtu 1500 bytes.
> >
> > Before:
> > [ ID] Interval Transfer Bandwidth Retr
> > [ 4] 0.00-20.00 sec 483 MBytes 203 Mbits/sec 0
> > After:
> > [ ID] Interval Transfer Bandwidth Retr
> > [ 4] 0.00-20.00 sec 1.15 GBytes 496 Mbits/sec 0
> >
> > And with both sides set to MTU 9000 bytes:
> > Before:
> > [ ID] Interval Transfer Bandwidth Retr
> > [ 4] 0.00-20.00 sec 1.87 GBytes 803 Mbits/sec 27
> > After:
> > [ ID] Interval Transfer Bandwidth Retr
> > [ 4] 0.00-20.00 sec 1.98 GBytes 849 Mbits/sec 0
> >
> > Tested-by: Sven Van Asbroeck <thesven73@xxxxxxxxx> # lan7430
> > Signed-off-by: Sven Van Asbroeck <thesven73@xxxxxxxxx>
>
> This is a performance improvement, not a fix, it really needs to target
> net-next.
I thought it'd be cool if 'historic' kernels could benefit from this performance
improvement too, but yeah if it's against policy it should go into net-next.
What about the other patch in the patchset (ping-pong). Should it go into
net-next as well?
>
> > diff --git a/drivers/net/ethernet/microchip/lan743x_main.c b/drivers/net/ethernet/microchip/lan743x_main.c
> > index ebb5e0bc516b..2bded1c46784 100644
> > --- a/drivers/net/ethernet/microchip/lan743x_main.c
> > +++ b/drivers/net/ethernet/microchip/lan743x_main.c
> > @@ -1957,11 +1957,11 @@ static int lan743x_rx_next_index(struct lan743x_rx *rx, int index)
> >
> > static struct sk_buff *lan743x_rx_allocate_skb(struct lan743x_rx *rx)
> > {
> > - int length = 0;
> > + struct net_device *netdev = rx->adapter->netdev;
> >
> > - length = (LAN743X_MAX_FRAME_SIZE + ETH_HLEN + 4 + RX_HEAD_PADDING);
> > - return __netdev_alloc_skb(rx->adapter->netdev,
> > - length, GFP_ATOMIC | GFP_DMA);
> > + return __netdev_alloc_skb(netdev,
> > + netdev->mtu + ETH_HLEN + 4 + RX_HEAD_PADDING,
> > + GFP_ATOMIC | GFP_DMA);
> > }
> >
> > static int lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index,
> > @@ -1969,9 +1969,10 @@ static int lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index,
> > {
> > struct lan743x_rx_buffer_info *buffer_info;
> > struct lan743x_rx_descriptor *descriptor;
> > - int length = 0;
> > + struct net_device *netdev = rx->adapter->netdev;
> > + int length;
> >
> > - length = (LAN743X_MAX_FRAME_SIZE + ETH_HLEN + 4 + RX_HEAD_PADDING);
> > + length = netdev->mtu + ETH_HLEN + 4 + RX_HEAD_PADDING;
> > descriptor = &rx->ring_cpu_ptr[index];
> > buffer_info = &rx->buffer_info[index];
> > buffer_info->skb = skb;
> > @@ -2157,8 +2158,8 @@ static int lan743x_rx_process_packet(struct lan743x_rx *rx)
> > int index = first_index;
> >
> > /* multi buffer packet not supported */
> > - /* this should not happen since
> > - * buffers are allocated to be at least jumbo size
> > + /* this should not happen since buffers are allocated
> > + * to be at least the mtu size configured in the mac.
> > */
> >
> > /* clean up buffers */
> > @@ -2632,9 +2633,13 @@ static int lan743x_netdev_change_mtu(struct net_device *netdev, int new_mtu)
> > struct lan743x_adapter *adapter = netdev_priv(netdev);
> > int ret = 0;
> >
> > + if (netif_running(netdev))
> > + return -EBUSY;
>
> That may cause a regression to users of the driver who expect to be
> able to set the MTU when the device is running. You need to disable
> the NAPI, pause the device, swap the buffers for smaller / bigger ones
> and restart the device.
That's what I tried first, but I quickly ran into a spot of trouble:
restarting the device may fail (unlikely but possible). So when the user tries
to change the mtu and that errors out, they might end up with a stopped device.
Is that acceptable behaviour? If so, I'll add it to the patch.
>
> > ret = lan743x_mac_set_mtu(adapter, new_mtu);
> > if (!ret)
> > netdev->mtu = new_mtu;
> > +
> > return ret;
> > }
> >
>