Re: [RFC v1] hand off skb list to other cpu to submit to upperlayer

From: Stephen Hemminger
Date: Tue Feb 24 2009 - 21:12:16 EST


On Wed, 25 Feb 2009 09:27:49 +0800
"Zhang, Yanmin" <yanmin_zhang@xxxxxxxxxxxxxxx> wrote:

> Subject: hand off skb list to other cpu to submit to upper layer
> From: Zhang Yanmin <yanmin.zhang@xxxxxxxxxxxxxxx>
>
> Recently, I have been investigating an ip_forward performance issue with 10G IXGBE NICs.
> I run the test on 2 machines, each with two 10G NICs. The 1st machine sends
> packets with pktgen. The 2nd receives the packets on one NIC and forwards them out
> through the 2nd NIC. As the NICs support multi-queue, I bind the queues to different
> logical cpus on different physical cpus while considering cache sharing carefully.
>
> Compared with the sending speed on the 1st machine, the forwarding speed is not good,
> only about 60% of the sending speed. The IXGBE driver starts NAPI when an interrupt
> arrives. With ip_forward=1, the receiver collects a packet and forwards it out immediately,
> so although IXGBE collects packets with NAPI, the forwarding has a big impact on
> collection. As IXGBE runs very fast, it drops packets quickly. The receiving cpu would
> do better to do nothing but collect packets.
>
> Currently the kernel has the backlog to support a similar capability, but process_backlog
> still runs on the receiving cpu. I enhance the backlog by adding a new
> input_pkt_alien_queue to softnet_data. The receiving cpu collects packets, links them
> into an skb list, and delivers the list to the input_pkt_alien_queue of another cpu.
> process_backlog picks up the skb list from input_pkt_alien_queue when input_pkt_queue
> is empty, as sketched below.
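>
> A minimal sketch of that consumer side, assuming the new input_pkt_alien_queue
> field has been added to struct softnet_data as described; pull_alien_backlog is
> a hypothetical helper, not the actual patch code:
>
> #include <linux/netdevice.h>
> #include <linux/skbuff.h>
> #include <linux/spinlock.h>
>
> /* Once the local backlog runs dry, splice in whatever other cpus
>  * have queued for us.  The alien queue is written by other cpus,
>  * so its lock must be taken; input_pkt_queue is only touched from
>  * the local cpu here.
>  */
> static void pull_alien_backlog(struct softnet_data *sd)
> {
> 	unsigned long flags;
>
> 	if (!skb_queue_empty(&sd->input_pkt_queue) ||
> 	    skb_queue_empty(&sd->input_pkt_alien_queue))
> 		return;
>
> 	spin_lock_irqsave(&sd->input_pkt_alien_queue.lock, flags);
> 	skb_queue_splice_tail_init(&sd->input_pkt_alien_queue,
> 				   &sd->input_pkt_queue);
> 	spin_unlock_irqrestore(&sd->input_pkt_alien_queue.lock, flags);
> }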
>
> A NIC driver could use this capability with steps like the below in its NAPI RX
> cleanup function (see the sketch after this list):
> 1) Initialize a local variable struct sk_buff_head skb_head;
> 2) In the packet collection loop, just call netif_rx_queue or
> __skb_queue_tail(&skb_head, skb) to add the skb to the list;
> 3) Before exiting, call raise_netif_irq to submit the skb list to a specific cpu.
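>
> A sketch of those three steps; raise_netif_irq() is the new entry point from
> this patch, while my_adapter, my_hw_next_rx_skb() and target_cpu are hypothetical
> placeholders for a driver's own types and rx logic:
>
> static int example_napi_rx_clean(struct my_adapter *adapter, int budget)
> {
> 	struct sk_buff_head skb_head;	/* step 1: local skb list */
> 	struct sk_buff *skb;
> 	int done = 0;
>
> 	__skb_queue_head_init(&skb_head);
>
> 	/* step 2: only link the skbs; do not push them up the stack here */
> 	while (done < budget && (skb = my_hw_next_rx_skb(adapter)) != NULL) {
> 		__skb_queue_tail(&skb_head, skb);
> 		done++;
> 	}
>
> 	/* step 3: hand the whole list to another cpu in one shot */
> 	if (!skb_queue_empty(&skb_head))
> 		raise_netif_irq(adapter->target_cpu, &skb_head);
>
> 	return done;
> }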
>
> Enlarge /proc/sys/net/core/netdev_max_backlog and /proc/sys/net/core/netdev_budget before testing.
>
> I tested my patch on top of 2.6.28.5. The improvement is about 43%.
>
> Signed-off-by: Zhang Yanmin <yanmin.zhang@xxxxxxxxxxxxxxx>
>
> ---

You can't safely put packets on another CPU's queue without adding a spinlock.
And if you add the spinlock, you drop the performance back down for your
device and all the other devices. You will also end up reordering packets,
which hurts single-stream TCP performance.
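
For illustration, the producer side of such a handoff has to look roughly like
this (a sketch against the proposed input_pkt_alien_queue field; the lock below
is shared with every other producer targeting that cpu, which is exactly the
contention point):

#include <linux/netdevice.h>
#include <linux/percpu.h>
#include <linux/skbuff.h>

/* Hand a batch of skbs to another cpu's alien queue.  This is a
 * cross-cpu write, so a plain __skb_queue_* is not safe; the queue
 * lock must be taken.
 */
static void hand_off_to_cpu(int cpu, struct sk_buff_head *list)
{
	struct softnet_data *sd = &per_cpu(softnet_data, cpu);
	unsigned long flags;

	spin_lock_irqsave(&sd->input_pkt_alien_queue.lock, flags);
	skb_queue_splice_tail_init(list, &sd->input_pkt_alien_queue);
	spin_unlock_irqrestore(&sd->input_pkt_alien_queue.lock, flags);
}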

Is this all because the hardware doesn't do MSI-X, or are you testing only
a single flow?