Re: [PATCH net-next 0/3] vhost: accelerate metadata access through vmap()

From: Jason Wang
Date: Mon Dec 24 2018 - 03:44:27 EST

On 2018/12/17 3:57, Michael S. Tsirkin wrote:
On Sat, Dec 15, 2018 at 11:43:08AM -0800, David Miller wrote:
From: Jason Wang <jasowang@xxxxxxxxxx>
Date: Fri, 14 Dec 2018 12:29:54 +0800

On 2018/12/14 4:12, Michael S. Tsirkin wrote:
On Thu, Dec 13, 2018 at 06:10:19PM +0800, Jason Wang wrote:

This series tries to access virtqueue metadata through kernel virtual
addresses instead of the copy_user() friends, since they have too much
overhead, such as checks, spec barriers or even hardware feature

Test shows about 24% improvement on TX PPS. It should benefit other
cases as well.

Please review
I think the idea of speeding up userspace access is a good one.
However I think that moving all checks to start is way too aggressive.
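
To make the trade-off under discussion concrete, here is a minimal userspace sketch, not code from the series. It contrasts a helper that validates every access (in the spirit of the copy_user() friends) with one that validates the whole metadata area once and then reads through a direct pointer. All names (struct region, range_ok, read_u16_checked, map_u16_array) are hypothetical and exist only for illustration; the real kernel paths also deal with page faults, speculation barriers and hardware access toggles, none of which is modelled here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a region of guest-visible memory. */
struct region {
    uint8_t *base;
    size_t   len;
};

/* Counts range checks performed; for illustration only. */
static unsigned long nr_checks;

/* Returns 1 if [off, off + n) lies inside the region, 0 otherwise. */
static int range_ok(const struct region *r, size_t off, size_t n)
{
    nr_checks++;
    return off <= r->len && n <= r->len - off;
}

/* Style A: validate and copy on every single access. */
static int read_u16_checked(const struct region *r, size_t off, uint16_t *out)
{
    if (!range_ok(r, off, sizeof(*out)))
        return -1;
    memcpy(out, r->base + off, sizeof(*out));
    return 0;
}

/*
 * Style B: validate the whole array once, then hand back a direct
 * pointer. Later reads through the pointer perform no checks at all.
 */
static const uint16_t *map_u16_array(const struct region *r, size_t off,
                                     size_t count)
{
    if (count > SIZE_MAX / sizeof(uint16_t))
        return NULL;
    if (!range_ok(r, off, count * sizeof(uint16_t)))
        return NULL;
    /* Refuse misaligned mappings so the direct reads are well defined. */
    if ((uintptr_t)(r->base + off) % sizeof(uint16_t) != 0)
        return NULL;
    return (const uint16_t *)(r->base + off);
}
```

With style B the cost of validation is paid once per mapping rather than once per access, which is the source of the speedup; the concern raised above is that anything that changes after that single check is no longer caught.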

So did packet and AF_XDP. Anyway, sharing the address space and
accessing it directly is the fastest way. Performance is the major
consideration for people choosing a backend. Compared to a userspace
implementation, vhost does not have security advantages at any
level. If vhost is still slow, people will start to develop backends
based on e.g. AF_XDP.
Exactly, this is precisely how this kind of problem should be solved.

Michael, I strongly support the approach Jason is taking here, and I
would like to ask you to seriously reconsider your objections.

Thank you.
Okay. Won't be the first time I'm wrong.

Let's say we ignore security aspects, but we need to make sure the
following all keep working (broken with this revision):
- file backed memory (I didn't see where we mark memory dirty -
if we don't we get guest memory corruption on close, if we do
then host crash as seems to apply here?)

We only pin metadata pages, so I don't think they can be used for DMA. So it is probably not an issue. The real issue is the zerocopy code; maybe it's time to disable it by default?


We will miss 2 or 4 pages for THP, I wonder whether or not it's measurable.

- auto-NUMA

I'm not sure auto-NUMA will help in the IPC case. In the worst case it can hurt performance, if vhost and userspace are running on two different nodes. Anyway, I can measure.

Because vhost isn't like AF_XDP where you can just tell people "use
hugetlbfs" and "data is removed on close" - people are using it in lots
of configurations with guest memory shared between rings and unrelated

This series doesn't share data, only metadata is shared.

Jason, thoughts on these?

Based on the above, I can measure with THP to see how much impact it has.

For the unsafe variants, they only work when we can batch the accesses, and that needs non-trivial rework of the vhost code, with an unexpected amount of work for archs other than x86. I'm not sure it's worth trying.
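
A minimal userspace sketch of why batching matters for such variants, assuming "unsafe variants" refers to the kernel's user_access_begin() / unsafe_get_user() / user_access_end() family. The functions below (access_begin, access_end, sum_unbatched, sum_batched) are hypothetical stand-ins that only count how often an access window is opened and closed; they are not the kernel API and do not model its fault handling.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter standing in for the cost of toggling user access. */
static unsigned long nr_toggles;

static void access_begin(void) { nr_toggles++; }  /* open the access window  */
static void access_end(void)   { nr_toggles++; }  /* close the access window */

/* Unbatched: every read opens and closes its own window. */
static uint32_t sum_unbatched(const uint16_t *v, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        access_begin();
        sum += v[i];
        access_end();
    }
    return sum;
}

/* Batched: a single window covers all reads. */
static uint32_t sum_batched(const uint16_t *v, size_t n)
{
    uint32_t sum = 0;

    access_begin();
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    access_end();
    return sum;
}
```

For n reads the unbatched form toggles 2n times and the batched form twice, so the saving only appears where several accesses can be grouped under one window, which is why the surrounding code would need restructuring.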