On Fri, Dec 14, 2018 at 11:42:18AM +0800, Jason Wang wrote:
>
> On 2018/12/13 11:27 PM, Michael S. Tsirkin wrote:
> > On Thu, Dec 13, 2018 at 06:10:19PM +0800, Jason Wang wrote:
> > > Hi:
> > >
> > > This series tries to access virtqueue metadata through kernel virtual
> > > address instead of copy_user() friends since they had too much
> > > overhead like checks, spec barriers or even hardware feature
> > > toggling.
> > Userspace accesses through remapping tricks and next time there's a need
> > for a new barrier we are left to figure it out by ourselves.
>
>
> I don't get it here, do you mean spec barriers?

I mean the next barrier people decide to put into userspace
memory accesses.

> It's completely unnecessary for
> vhost which is a kernel thread.

It's defence in depth. Take a look at the commit that added them.
And yes quite possibly in most cases we actually have a spec
barrier in the validation phase. If we do let's use the
unsafe variants so they can be found.

> And even if you're right, vhost is not the
> only place, there's lots of vmap() based accessing in kernel.

For sure. But if one can get by without get user pages, one
really should. Witness recently uncovered mess with file
backed storage.

> Think in
> another direction, this means we won't suffer from unnecessary barriers for
> kthread like vhost in the future, we will manually pick the one we really
> need

I personally think we should err on the side of caution not on the side of
performance.

> (but it should have little possibility).

History seems to teach otherwise.

> Please notice we only access metadata through remapping not the data itself.
> This idea has been used for high speed userspace backend for years, e.g.
> packet socket or recent AF_XDP.

I think their justification for the higher risk is that they are mostly
designed for privileged userspace.

> The only difference is the page was remapped
> from kernel to userspace.

At least that avoids the g.u.p mess.

>
> > I don't
> > like the idea I have to say. As a first step, why don't we switch to
> > unsafe_put_user/unsafe_get_user etc?
>
>
> Several reasons:
>
> - They only have x86 variant, it won't make any difference for the rest of
> the architectures.

Is there an issue on other architectures? If yes they can be extended
there.

> - unsafe_put_user/unsafe_get_user is not sufficient for accessing structures
> (e.g. accessing descriptor) or arrays (batching).

So you want unsafe_copy_xxx_user? I can do this. Hang on will post.

> - Unless we can batch at least the accessing of two places in three of
> avail, used and descriptor in one run. There will be no difference. E.g. we
> can batch updating used ring, but it won't make any difference in this case.
>

So let's batch them all?

> > That would be more of an apples to apples comparison, would it not?
>
>
> Apples to apples comparison only helps if we are the No.1. But the fact is we
> are not. If we want to compete with e.g. dpdk or AF_XDP, vmap() is the
> fastest method AFAIK.
>
>
> Thanks

We need to speed up the packet access itself too though.
You can't vmap all of guest memory.

> > > Test shows about 24% improvement on TX PPS. It should benefit other
> > > cases as well.
> > >
> > > Please review
> > >
> > > Jason Wang (3):
> > >   vhost: generalize adding used elem
> > >   vhost: fine grain userspace memory accessors
> > >   vhost: access vq metadata through kernel virtual address
> > >
> > >  drivers/vhost/vhost.c | 281 ++++++++++++++++++++++++++++++++++++++----
> > >  drivers/vhost/vhost.h |  11 ++
> > >  2 files changed, 266 insertions(+), 26 deletions(-)
> > >
> > > --
> > > 2.17.1