On Tue, Apr 25, 2017 at 12:07:01PM +0800, Jason Wang wrote:
> On 2017/04/24 20:00, Michael S. Tsirkin wrote:
> > On Mon, Apr 24, 2017 at 07:54:18PM +0800, Jason Wang wrote:
> > > On 2017/04/24 07:28, Michael S. Tsirkin wrote:
> > > > On Tue, Apr 18, 2017 at 11:07:42AM +0800, Jason Wang wrote:
> > > > > On 2017/04/17 07:19, Michael S. Tsirkin wrote:
> > > > > > Applications that consume a batch of entries in one go
> > > > > > can benefit from the ability to return some of them back
> > > > > > into the ring.
> > > > > >
> > > > > > Add an API for that - assuming there's space. If there's no space
> > > > > > naturally we can't do this and have to drop entries, but this implies
> > > > > > the ring is full, so we'd likely drop some anyway.
> > > > > >
> > > > > > Signed-off-by: Michael S. Tsirkin <mst@xxxxxxxxxx>
> > > > > > ---
> > > > > > Jason, in my mind the biggest issue with your batching patchset is the
> > > > > > packet drops on disconnect. This API will help avoid that in the common
> > > > > > case.
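
The diff itself is not quoted in this thread. As a rough sketch of what
unconsume means here - a toy userspace model with made-up names, not the
posted kernel ptr_ring code - entries go back into the ring by walking the
consumer index backwards while slots are free, and anything that does not
fit is dropped:

#include <stdio.h>

#define RING_SIZE 8

/* Toy model of a pointer ring: a NULL slot is free, non-NULL is in use. */
struct toy_ring {
	void *queue[RING_SIZE];
	int producer;		/* next slot to produce into */
	int consumer;		/* next slot to consume from */
};

static int toy_ring_produce(struct toy_ring *r, void *ptr)
{
	if (r->queue[r->producer])
		return -1;	/* ring full */
	r->queue[r->producer] = ptr;
	r->producer = (r->producer + 1) % RING_SIZE;
	return 0;
}

static void *toy_ring_consume(struct toy_ring *r)
{
	void *ptr = r->queue[r->consumer];

	if (ptr) {
		r->queue[r->consumer] = NULL;
		r->consumer = (r->consumer + 1) % RING_SIZE;
	}
	return ptr;
}

/* Give n previously consumed entries back, so the next consume sees
 * them again in the original order.  Returns how many were dropped:
 * non-zero only when the ring is full, in which case we would likely
 * have been dropping packets anyway. */
static int toy_ring_unconsume(struct toy_ring *r, void **batch, int n)
{
	while (n--) {
		int head = (r->consumer + RING_SIZE - 1) % RING_SIZE;

		if (r->queue[head])
			return n + 1;	/* no space: drop the rest */
		r->queue[head] = batch[n];
		r->consumer = head;
	}
	return 0;
}

int main(void)
{
	struct toy_ring r = { 0 };
	int vals[3] = { 1, 2, 3 };
	void *batch[3];
	int i;

	for (i = 0; i < 3; i++)
		toy_ring_produce(&r, &vals[i]);
	for (i = 0; i < 3; i++)			/* consume a batch of 3 */
		batch[i] = toy_ring_consume(&r);
	toy_ring_unconsume(&r, batch, 3);	/* put them all back */
	for (i = 0; i < 3; i++)			/* prints 1 2 3 again */
		printf("%d\n", *(int *)toy_ring_consume(&r));
	return 0;
}

The "assuming there's space" caveat is visible in toy_ring_unconsume():
it can only fail when the ring is already full, i.e. exactly when packets
would likely be dropped regardless.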
> > > > >
> > > > > Ok, I will rebase the series on top of this. (Though I don't think we care
> > > > > about the packet loss.)
> > > >
> > > > E.g. I care - I often start sending packets to the VM before it's
> > > > fully booted. Several vhost resets might follow.
> > >
> > > Ok.
> > > > > > I would still prefer that we understand what's going on,
> > > > >
> > > > > I tried to reply in another thread - does it make sense?
> > > > >
> > > > > > and I would
> > > > > > like to know what's the smallest batch size that's still helpful.
> > > > >
> > > > > Yes, I've replied in another thread, the result is:
> > > > >
> > > > > no batching   1.88Mpps
> > > > > RX_BATCH=1    1.93Mpps
> > > > > RX_BATCH=4    2.11Mpps
> > > > > RX_BATCH=16   2.14Mpps
> > > > > RX_BATCH=64   2.25Mpps
> > > > > RX_BATCH=256  2.18Mpps
> > > >
> > > > Essentially 4 is enough, other stuff looks more like noise
> > > > to me. What about 2?
> > >
> > > The numbers are pretty stable, so probably not noise. Retested on top of
> > > batch zeroing:
> > >
> > > no    1.97Mpps
> > > 1     2.09Mpps
> > > 2     2.11Mpps
> > > 4     2.16Mpps
> > > 8     2.19Mpps
> > > 16    2.21Mpps
> > > 32    2.25Mpps
> > > 64    2.30Mpps
> > > 128   2.21Mpps
> > > 256   2.21Mpps
> > >
> > > 64 performs best.
> > >
> > > Thanks
> >
> > OK but it might be e.g. a function of the ring size, host cache size or
> > whatever. As we don't really understand the why, if we just optimize for
> > your setup we risk regressions in others. 64 entries is a lot, it
> > increases the queue size noticeably. Could this be part of the effect?
> > Could you try changing the queue size to see what happens?
>
> I increased tx_queue_len to 1100, but only see less than 1% improvement on
> the pps number (batch = 1) on my machine. If you care about the regression,
> we can probably leave the choice to the user through e.g. a module
> parameter. But I'm afraid we already have too many choices for them. Or I
> can test this with different CPU types.
>
> Thanks
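
For reference, the module-parameter route Jason mentions would only be a
few lines - a sketch, with the parameter name, default, and description
made up for illustration:

#include <linux/module.h>
#include <linux/moduleparam.h>

/* Hypothetical knob - name and default are illustrative only. */
static int rx_batch = 64;
module_param(rx_batch, int, 0444);	/* read-only via sysfs */
MODULE_PARM_DESC(rx_batch, "Number of packets to dequeue in one batch");

MODULE_LICENSE("GPL");

It is cheap to add and shows up under /sys/module/, but it is one more
knob users would have to understand - the "too many choices" worry above.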
I agree here. Let's keep it a constant. Testing on more machines would
be nice but not strictly required.
I just dislike not understanding why
it helps because it means we can easily break it by mistake. So my only
request really is that you wrap access to this internal buffer in an
API. Let's see - I think we need
struct vhost_net_buf
vhost_net_buf_get_ptr
vhost_net_buf_get_size
vhost_net_buf_is_empty
vhost_net_buf_peek
vhost_net_buf_consume
vhost_net_buf_produce
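
A minimal sketch of the shape such an API could take - the struct layout,
the VHOST_NET_BATCH constant, and the ring_consume_batch() bulk-dequeue
hook are assumptions for illustration, not the eventual implementation:

#include <stddef.h>

#define VHOST_NET_BATCH 64	/* assumed constant, per the numbers above */

/* Assumed bulk-dequeue primitive standing in for whatever the
 * underlying ring provides; fills buf and returns the count. */
int ring_consume_batch(void **buf, int n);

/* A batch of pointers pulled from the ring in one go and then
 * consumed one by one; head == tail means the cache is empty. */
struct vhost_net_buf {
	void *queue[VHOST_NET_BATCH];
	int tail;	/* one past the last filled entry */
	int head;	/* next entry to hand out */
};

static void *vhost_net_buf_get_ptr(struct vhost_net_buf *rxq)
{
	return rxq->queue[rxq->head];
}

static int vhost_net_buf_get_size(struct vhost_net_buf *rxq)
{
	return rxq->tail - rxq->head;
}

static int vhost_net_buf_is_empty(struct vhost_net_buf *rxq)
{
	return rxq->tail == rxq->head;
}

/* Refill the cache from the ring; returns how many entries we got. */
static int vhost_net_buf_produce(struct vhost_net_buf *rxq)
{
	rxq->head = 0;
	rxq->tail = ring_consume_batch(rxq->queue, VHOST_NET_BATCH);
	return rxq->tail;
}

static void *vhost_net_buf_peek(struct vhost_net_buf *rxq)
{
	if (vhost_net_buf_is_empty(rxq) && !vhost_net_buf_produce(rxq))
		return NULL;	/* nothing cached and ring was empty */
	return vhost_net_buf_get_ptr(rxq);
}

static void *vhost_net_buf_consume(struct vhost_net_buf *rxq)
{
	void *ptr = vhost_net_buf_peek(rxq);

	if (ptr)
		rxq->head++;
	return ptr;
}

The batch is pulled from the underlying ring in one go by
vhost_net_buf_produce() and handed out one pointer at a time, so callers
never touch the internal buffer directly - which is the break-it-by-mistake
risk the wrapper API is meant to remove.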