Re: [PATCH net-next 0/3] vhost: accelerate metadata access through vmap()

From: Jason Wang
Date: Thu Dec 13 2018 - 23:30:10 EST

On 2018/12/14 4:12 AM, Michael S. Tsirkin wrote:
> On Thu, Dec 13, 2018 at 06:10:19PM +0800, Jason Wang wrote:

>> This series tries to access virtqueue metadata through kernel virtual
>> address instead of copy_user() friends since they had too much
>> overheads like checks, spec barriers or even hardware feature

>> Test shows about 24% improvement on TX PPS. It should benefit other
>> cases as well.

>> Please review

> I think the idea of speeding up userspace access is a good one.
> However I think that moving all checks to start is way too aggressive.

So did packet sockets and AF_XDP. Anyway, sharing the address space and accessing it directly is the fastest way. Performance is the major consideration when people choose a backend. Compared to a userspace implementation, vhost does not have security advantages at any level. If vhost is still slow, people will start to develop backends based on e.g. AF_XDP.

> Instead, let's batch things up but let's not keep them
> around forever.
> Here are some ideas:

> 1. Disable preemption, process a small number of small packets
> directly in an atomic context. This should cut latency
> down significantly, the tricky part is to only do it
> on a light load and disable this
> for the streaming case otherwise it's unfair.
> This might fail, if it does just bounce things out to
> a thread.

I'm not sure what context you mean here. Is this for the TX path of TUN? A fundamental difference is that my series targets extremely heavy load, not light load; 100% CPU for vhost is expected.
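For readers following the thread, the policy in idea 1 (handle a few small packets inline, otherwise bounce to a thread) can be modelled in plain userspace C. This is only a sketch of the decision logic, not vhost code: `try_inline`, `INLINE_MAX_PKTS` and `INLINE_MAX_LEN` are hypothetical names and thresholds, and neither preemption control nor the worker thread is modelled.

```c
#include <stddef.h>

/* Hypothetical thresholds; the thread does not give any numbers. */
#define INLINE_MAX_PKTS 4
#define INLINE_MAX_LEN  128

enum verdict { HANDLED_INLINE, BOUNCED_TO_THREAD };

/*
 * Decide how much of a burst may be handled inline.
 * pkt_len[i] is the length of packet i and n is the burst size.
 * On return, *done is the number of leading packets handled inline;
 * any remaining packets are left for the worker thread.
 */
static enum verdict try_inline(const size_t *pkt_len, size_t n, size_t *done)
{
    size_t i;

    for (i = 0; i < n; i++) {
        /* Stop at the packet budget or at the first large packet. */
        if (i == INLINE_MAX_PKTS || pkt_len[i] > INLINE_MAX_LEN)
            break;
    }
    *done = i;
    return i == n ? HANDLED_INLINE : BOUNCED_TO_THREAD;
}
```

Under sustained heavy load nearly every burst exceeds the budget, so this path would bounce almost all the time, which is the regime my series targets.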

> 2. Switch to unsafe_put_user/unsafe_get_user,
> and batch up multiple accesses.

As I said, this won't help unless we can batch accesses to at least two of the three places (avail, descriptor and used). Batching accesses to a single place such as used won't help. I'm not even sure this can be done, considering the case of packed virtqueue, where we have a single descriptor ring. Batching through the unsafe helpers may not help in this case, since it is equivalent to the safe ones. It also requires non-trivial refactoring of vhost, and such refactoring may itself have a noticeable impact (e.g. it may lead to regressions).
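To make the batching argument concrete, here is a userspace model that counts how many "access windows" are opened for a run of 16-bit reads. It is only an analogy: `struct region`, `window_open`, `read_each` and `read_batched` are hypothetical names, and a window stands in for whatever per-access cost (range check, barrier, SMAP toggle) the real helpers pay.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model; none of these names are vhost or kernel APIs. */
struct region {
    const uint8_t *base;   /* start of the modelled ring memory */
    size_t len;            /* size of the region in bytes */
    unsigned long windows; /* number of access windows opened so far */
};

/* Range-check [off, off + n) and account for one access window. */
static int window_open(struct region *r, size_t off, size_t n)
{
    assert(r != NULL);
    if (off > r->len || n > r->len - off)
        return -1;
    r->windows++;
    return 0;
}

/* One window per element, like calling a checked accessor each time. */
static int read_each(struct region *r, size_t off, uint16_t *out, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (window_open(r, off, sizeof(uint16_t)))
            return -1;
        memcpy(&out[i], r->base + off, sizeof(uint16_t));
        off += sizeof(uint16_t);
    }
    return 0;
}

/* One window covering the whole contiguous run. */
static int read_batched(struct region *r, size_t off, uint16_t *out, size_t count)
{
    if (count > SIZE_MAX / sizeof(uint16_t))
        return -1;
    if (window_open(r, off, count * sizeof(uint16_t)))
        return -1;
    memcpy(out, r->base + off, count * sizeof(uint16_t));
    return 0;
}
```

In this model a contiguous run inside one region costs a single window either way once it is done as one bulk copy, so the only additional saving would come from sharing a window across different regions, which the model (like the current vhost code structure) does not do.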

> 3. Allow adding a fixup point manually,
> such that multiple independent get_user accesses
> can get a single fixup (will allow better compiler

So for metadata access, I don't see how what you suggest here can help in the case of a heavy workload.

For data access, this may help, but I've experimented with batching the data copy to reduce SMAP/spec barriers in vhost-net and I didn't see a performance improvement.


>> Jason Wang (3):
>>   vhost: generalize adding used elem
>>   vhost: fine grain userspace memory accessors
>>   vhost: access vq metadata through kernel virtual address

>>  drivers/vhost/vhost.c | 281 ++++++++++++++++++++++++++++++++++++++----
>>  drivers/vhost/vhost.h |  11 ++
>>  2 files changed, 266 insertions(+), 26 deletions(-)