Re: [PATCH V4 0/3] basic busy polling support for vhost_net

From: Michael Rapoport
Date: Thu Mar 10 2016 - 01:48:48 EST


Hi Greg,

> Greg Kurz <gkurz@xxxxxxxxxxxxxxxxxx> wrote on 03/09/2016 09:26:45 PM:
> > On Fri, 4 Mar 2016 06:24:50 -0500
> > Jason Wang <jasowang@xxxxxxxxxx> wrote:
>
> > This series tries to add basic busy polling for vhost net. The idea is
> > simple: at the end of tx/rx processing, busy poll for a while for newly
> > added tx descriptors and for data on the rx socket. The maximum amount
> > of time (in us) that may be spent busy polling is specified via an ioctl.
> >
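For anyone skimming the thread, the loop described above boils down to
something like this (my simplified sketch, not the exact code from patch 3;
vhost_has_work() and vhost_vq_avail_empty() are the helpers patches 1 and 2
introduce, and busyloop_timeout is the per-virtqueue limit in us set via the
new ioctl):

    static void busy_poll_sketch(struct vhost_dev *dev,
                                 struct vhost_virtqueue *vq)
    {
            u64 endtime;

            preempt_disable();      /* keep local_clock() coherent */
            endtime = local_clock() +
                      vq->busyloop_timeout * NSEC_PER_USEC;
            while (local_clock() < endtime &&
                   !signal_pending(current) &&      /* exit on signals */
                   !vhost_has_work(dev) &&          /* pending vhost work */
                   vhost_vq_avail_empty(dev, vq))   /* no new tx descs */
                    cpu_relax_lowlatency();
            preempt_enable();
    }
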
> > Tests were done with:
> >
> > - 50 us as busy loop timeout
> > - Netperf 2.6
> > - Two machines with back to back connected mlx4
>
> Hi Jason,
>
> Could this also improve performance if both VMs are
> on the same host system?

I've experimented a little with Jason's patches, running guest-to-guest
netperf with both guests on the same host, and I saw improvements for that
case.

> > - Guest with 1 vcpus and 1 queue
> >
> > Results:
> > - Obvious improvements (5% - 20%) in latency (TCP_RR).
> > - Better or only slightly worse results on most of the TX tests, but
> > some regression at 4096 size.
> > - Except for 8 sessions of 4096-size RX, performance is better or the
> > same.
> > - CPU utilization increased, as expected.
> >
> > TCP_RR:
> > size/session/+thu%/+normalize%/+tpkts%/+rpkts%/+ioexits%/
> > 1/ 1/ +8%/ -32%/ +8%/ +8%/ +7%
> > 1/ 50/ +7%/ -19%/ +7%/ +7%/ +1%
> > 1/ 100/ +5%/ -21%/ +5%/ +5%/ 0%
> > 1/ 200/ +5%/ -21%/ +7%/ +7%/ +1%
> > 64/ 1/ +11%/ -29%/ +11%/ +11%/ +10%
> > 64/ 50/ +7%/ -19%/ +8%/ +8%/ +2%
> > 64/ 100/ +8%/ -18%/ +9%/ +9%/ +2%
> > 64/ 200/ +6%/ -19%/ +6%/ +6%/ 0%
> > 256/ 1/ +7%/ -33%/ +7%/ +7%/ +6%
> > 256/ 50/ +7%/ -18%/ +7%/ +7%/ 0%
> > 256/ 100/ +9%/ -18%/ +8%/ +8%/ +2%
> > 256/ 200/ +9%/ -18%/ +10%/ +10%/ +3%
> > 1024/ 1/ +20%/ -28%/ +20%/ +20%/ +19%
> > 1024/ 50/ +8%/ -18%/ +9%/ +9%/ +2%
> > 1024/ 100/ +6%/ -19%/ +5%/ +5%/ 0%
> > 1024/ 200/ +8%/ -18%/ +9%/ +9%/ +2%
> > Guest TX:
> > size/session/+thu%/+normalize%/+tpkts%/+rpkts%/+ioexits%/
> > 64/ 1/ -5%/ -28%/ +11%/ +12%/ +10%
> > 64/ 4/ -2%/ -26%/ +13%/ +13%/ +13%
> > 64/ 8/ -6%/ -29%/ +9%/ +10%/ +10%
> > 512/ 1/ +15%/ -7%/ +13%/ +11%/ +3%
> > 512/ 4/ +17%/ -6%/ +18%/ +13%/ +11%
> > 512/ 8/ +14%/ -7%/ +13%/ +7%/ +7%
> > 1024/ 1/ +27%/ -2%/ +26%/ +29%/ +12%
> > 1024/ 4/ +8%/ -9%/ +6%/ +1%/ +6%
> > 1024/ 8/ +41%/ +12%/ +34%/ +20%/ -3%
> > 4096/ 1/ -22%/ -21%/ -36%/ +81%/+1360%
> > 4096/ 4/ -57%/ -58%/ +286%/ +15%/+2074%
> > 4096/ 8/ +67%/ +70%/ -45%/ -8%/ +63%
> > 16384/ 1/ -2%/ -5%/ +5%/ -3%/ +80%
> > 16384/ 4/ 0%/ 0%/ 0%/ +4%/ +138%
> > 16384/ 8/ 0%/ 0%/ 0%/ +1%/ +41%
> > 65535/ 1/ -3%/ -6%/ +2%/ +11%/ +113%
> > 65535/ 4/ -2%/ -1%/ -2%/ -3%/ +484%
> > 65535/ 8/ 0%/ +1%/ 0%/ +2%/ +40%
> > Guest RX:
> > size/session/+thu%/+normalize%/+tpkts%/+rpkts%/+ioexits%/
> > 64/ 1/ +31%/ -3%/ +8%/ +8%/ +8%
> > 64/ 4/ +11%/ -17%/ +13%/ +14%/ +15%
> > 64/ 8/ +4%/ -23%/ +11%/ +11%/ +12%
> > 512/ 1/ +24%/ 0%/ +18%/ +14%/ -8%
> > 512/ 4/ +4%/ -15%/ +6%/ +5%/ +6%
> > 512/ 8/ +26%/ 0%/ +21%/ +10%/ +3%
> > 1024/ 1/ +88%/ +47%/ +69%/ +44%/ -30%
> > 1024/ 4/ +18%/ -5%/ +19%/ +16%/ +2%
> > 1024/ 8/ +15%/ -4%/ +13%/ +8%/ +1%
> > 4096/ 1/ -3%/ -5%/ +2%/ -2%/ +41%
> > 4096/ 4/ +2%/ +3%/ -20%/ -14%/ -24%
> > 4096/ 8/ -43%/ -45%/ +69%/ -24%/ +94%
> > 16384/ 1/ -3%/ -11%/ +23%/ +7%/ +42%
> > 16384/ 4/ -3%/ -3%/ -4%/ +5%/ +115%
> > 16384/ 8/ -1%/ 0%/ -1%/ -3%/ +32%
> > 65535/ 1/ +1%/ 0%/ +2%/ 0%/ +66%
> > 65535/ 4/ -1%/ -1%/ 0%/ +4%/ +492%
> > 65535/ 8/ 0%/ -1%/ -1%/ +4%/ +38%
> >
> > Changes from V3:
> > - drop single_task_running()
> > - use cpu_relax_lowlatency() instead of cpu_relax()
> >
> > Changes from V2:
> > - rename vhost_vq_more_avail() to vhost_vq_avail_empty(), and return
> > false if __get_user() fails.
> > - do not bother with preemption/timers on the good path.
> > - use vhost_vring_state as the ioctl parameter instead of reinventing a
> > new one.
> > - add the unit of the timeout (us) to the comments of the newly added
> > ioctls
> >
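As a concrete illustration of the vhost_vring_state reuse mentioned above,
programming the timeout from userspace should look roughly like this (a
sketch based on the uapi additions in patch 3; .num carries the timeout in
us, and I assume 0 keeps busy polling disabled):

    #include <sys/ioctl.h>
    #include <linux/vhost.h>

    static int set_busyloop_timeout(int vhost_fd, unsigned int index,
                                    unsigned int timeout_us)
    {
            struct vhost_vring_state s = {
                    .index = index,      /* which virtqueue */
                    .num   = timeout_us, /* busy-poll budget in us */
            };

            return ioctl(vhost_fd, VHOST_SET_VRING_BUSYLOOP_TIMEOUT, &s);
    }
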
> > Changes from V1:
> > - remove the buggy vq_error() in vhost_vq_more_avail().
> > - leave vhost_enable_notify() untouched.
> >
> > Changes from RFC V3:
> > - small tweak to the code to avoid duplicated conditions in the
> > critical path when busy looping is not enabled.
> > - add the test result of multiple VMs
> >
> > Changes from RFC V2:
> > - poll also at the end of rx handling
> > - factor out the polling logic and optimize the code a little bit
> > - add two ioctls to get and set the busy poll timeout
> > - test on ixgbe (which gives more stable and reproducible numbers)
> > instead of mlx4.
> >
> > Changes from RFC V1:
> > - add a comment for vhost_has_work() to explain why it could be
> > lockless
> > - add param description for busyloop_timeout
> > - split out the busy polling logic into a new helper
> > - check and exit the loop when there's a pending signal
> > - disable preemption during busy looping to make sure local_clock() is
> > used correctly.
> >
> > Jason Wang (3):
> > vhost: introduce vhost_has_work()
> > vhost: introduce vhost_vq_avail_empty()
> > vhost_net: basic polling support
> >
> > drivers/vhost/net.c | 78 +++++++++++++++++++++++++++++++++++++++++++++---
> > drivers/vhost/vhost.c | 35 +++++++++++++++++++++
> > drivers/vhost/vhost.h | 3 ++
> > include/uapi/linux/vhost.h | 6 ++++
> > 4 files changed, 117 insertions(+), 5 deletions(-)
> >
>