Re: [PATCH RFC v4 net-next 0/5] virtio_net: enabling tx interrupts

From: Jason Wang
Date: Tue Dec 02 2014 - 04:52:04 EST

On Tue, Dec 2, 2014 at 5:43 PM, Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
On Tue, Dec 02, 2014 at 08:15:02AM +0008, Jason Wang wrote:
On Tue, Dec 2, 2014 at 11:15 AM, Jason Wang <jasowang@xxxxxxxxxx> wrote:
>On Mon, Dec 1, 2014 at 6:42 PM, Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
>>On Mon, Dec 01, 2014 at 06:17:03PM +0800, Jason Wang wrote:
>>> Hello:
>>> We used to orphan packets before transmission for virtio-net. This
>>> breaks socket accounting and can stop several functions from working, e.g.:
>>> - Byte Queue Limit depends on tx completion notification to work.
>>> - Packet Generator depends on tx completion notification for the last
>>> transmitted packet to complete.
>>> - TCP Small Queue depends on proper accounting of sk_wmem_alloc to work.
>>> This series tries to solve the issue by enabling tx interrupts. To
>>> minimize the performance impact of this, several optimizations were used:
>>> - On the guest side, virtqueue_enable_cb_delayed() was used to delay the
>>> interrupt until 3/4 of the pending packets were sent.
>>> - On the host side, interrupt coalescing was used to reduce tx interrupts.
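
For reference, the 3/4 threshold mentioned above can be sketched as a small standalone helper. This is a simplified model, not the kernel source: the function name delayed_used_event() and its parameters are hypothetical, and it only assumes free-running 16-bit ring indices.

```c
#include <stdint.h>

/*
 * Simplified model of a delayed callback threshold: request the next
 * interrupt only after about 3/4 of the currently outstanding buffers
 * have been consumed by the host.
 *
 * avail_idx:     index of the next buffer the guest will publish
 * last_used_idx: index of the next used buffer the guest will reclaim
 *
 * Both are free-running 16-bit counters, so unsigned subtraction gives
 * the number of outstanding buffers even across wrap-around.
 * (Hypothetical helper, for illustration only.)
 */
static uint16_t delayed_used_event(uint16_t avail_idx, uint16_t last_used_idx)
{
    uint16_t pending = (uint16_t)(avail_idx - last_used_idx);
    uint16_t bufs = (uint16_t)((uint32_t)pending * 3 / 4);

    /* The host should signal once its used index passes this value. */
    return (uint16_t)(last_used_idx + bufs);
}
```

With 16 packets outstanding and nothing reclaimed yet, the event index lands at 12, so roughly one interrupt covers twelve completions instead of one each.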
>>> Performance test results[1] (tx-frames 16 tx-usecs 16) show:
>>> - For guest receiving: no obvious regression in throughput was
>>> noticed. Higher cpu utilization was noticed in a few cases.
>>> - For guest transmission: a very large improvement in throughput for
>>> small packet transmission was noticed. This is expected since TSQ and
>>> other optimizations for small packet transmission work after tx interrupt.
>>> But it will use more cpu for large packets.
>>> - For TCP_RR: a regression (10% on transaction rate and cpu utilization)
>>> was found. Tx interrupt won't help but causes overhead in this case.
>>> Using more aggressive coalescing parameters may help to reduce the
>>> regression.
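
To make the tx-frames / tx-usecs parameters concrete, here is a minimal sketch of frame-count plus timeout coalescing. It is an assumption about the general technique, not the code from the posted patches; struct tx_coalesce and both functions are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-queue coalescing state, for illustration only. */
struct tx_coalesce {
    uint32_t max_frames; /* plays the role of tx-frames */
    uint32_t max_usecs;  /* plays the role of tx-usecs */
    uint32_t pending;    /* completions not yet signalled */
    uint64_t first_us;   /* time of the oldest unsignalled completion */
};

/*
 * Record one completed frame at time now_us.
 * Returns true if an interrupt should be raised now.
 */
static bool tx_coalesce_complete(struct tx_coalesce *c, uint64_t now_us)
{
    if (c->pending == 0)
        c->first_us = now_us;
    c->pending++;

    if (c->pending >= c->max_frames ||
        now_us - c->first_us >= c->max_usecs) {
        c->pending = 0;
        return true;
    }
    return false;
}

/*
 * Timer path: flush completions that have waited max_usecs without
 * reaching max_frames. Returns true if an interrupt should be raised.
 */
static bool tx_coalesce_timeout(struct tx_coalesce *c, uint64_t now_us)
{
    if (c->pending > 0 && now_us - c->first_us >= c->max_usecs) {
        c->pending = 0;
        return true;
    }
    return false;
}
```

With both limits at 16, a burst of 16 completions raises one interrupt, while a lone completion (the TCP_RR pattern) waits for the 16 usec timeout. Larger values trade interrupt count against completion latency.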
>>OK, you have posted coalescing patches - do they help any?
>Helps a lot.
>For RX, it saves about 5% - 10% cpu. (reduce 60%-90% tx intrs)
>For small packet TX, it increases 33% - 245% throughput. (reduce about 60%
>For TCP_RR, it increases the transaction rate by 3%-10%. (reduce 40%-80% tx intrs)
>>I'm not sure the regression is due to interrupts.
>>It would make sense for CPU but why would it
>>hurt transaction rate?
>Anyway, the guest needs to spend some cycles handling tx interrupts.
>And the transaction rate does increase if we coalesce more tx interrupts.
>>It's possible that we are deferring kicks too much due to BQL.
>>As an experiment: do we get any of it back if we do
>>- if (kick || netif_xmit_stopped(txq))
>>- virtqueue_kick(sq->vq);
>>+ virtqueue_kick(sq->vq);
>I will try, but during TCP_RR, at most 1 packet was pending,
>so I doubt BQL can help in this case.
Looks like this helps a lot in multiple sessions of TCP_RR.

So which is faster:
BQL + kick each packet, or
no BQL?

Quick manual tests (TCP_RR 64, TCP_STREAM 512) do not show obvious differences.

May need a complete benchmark to see.

How about moving the BQL patch out of this series?
Let's first converge on tx interrupts and then introduce it?
(e.g with kicking after queuing X bytes?)

Sounds good.
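
One possible reading of "kicking after queuing X bytes" is sketched below. Nothing in this thread specifies the mechanism, so the struct, the function, and the "more" flag (meaning more packets are about to follow) are all hypothetical and only illustrate the idea.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical byte-threshold kick state, for illustration only. */
struct kick_batch {
    uint32_t threshold_bytes; /* the "X bytes" from the discussion */
    uint32_t queued_bytes;    /* bytes queued since the last kick */
};

/*
 * Account for one queued packet of len bytes.
 * Returns true if the host should be kicked now: either enough bytes
 * have accumulated, or the caller says no further packets are coming
 * (more == false), so waiting would only add latency.
 */
static bool kick_batch_queue(struct kick_batch *b, uint32_t len, bool more)
{
    b->queued_bytes += len;

    if (!more || b->queued_bytes >= b->threshold_bytes) {
        b->queued_bytes = 0;
        return true;
    }
    return false;
}
```

For a TCP_RR style workload every packet arrives with more == false, so each one is kicked immediately. Only streaming workloads would batch up to the threshold.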

