Re: [PATCH V2 7/9] vhost: do not use RCU to synchronize MMU notifier with worker

From: Jason Wang
Date: Mon Aug 05 2019 - 00:20:55 EST



On 2019/8/2 äå8:46, Jason Gunthorpe wrote:
On Fri, Aug 02, 2019 at 05:40:07PM +0800, Jason Wang wrote:
This must be a proper barrier, like a spinlock, mutex, or
synchronize_rcu.

I start with synchronize_rcu() but both you and Michael raise some
concern.
I've also idly wondered if calling synchronize_rcu() under the various
mm locks is a deadlock situation.


Maybe, that's why I suggest to use vhost_work_flush() which is much lightweight can can achieve the same function. It can guarantee all previous work has been processed after vhost_work_flush() return.



Then I try spinlock and mutex:

1) spinlock: add lots of overhead on datapath, this leads 0 performance
improvement.
I think the topic here is correctness not performance improvement


But the whole series is to speed up vhost.



2) SRCU: full memory barrier requires on srcu_read_lock(), which still leads
little performance improvement
3) mutex: a possible issue is need to wait for the page to be swapped in (is
this unacceptable ?), another issue is that we need hold vq lock during
range overlap check.
I have a feeling that mmu notififers cannot safely become dependent on
progress of swap without causing deadlock. You probably should avoid
this.


Yes, so that's why I try to synchronize the critical region by myself.


And, again, you can't re-invent a spinlock with open coding and get
something better.
So the question is if waiting for swap is considered to be unsuitable for
MMU notifiers. If not, it would simplify codes. If not, we still need to
figure out a possible solution.

Btw, I come up another idea, that is to disable preemption when vhost thread
need to access the memory. Then register preempt notifier and if vhost
thread is preempted, we're sure no one will access the memory and can do the
cleanup.
I think you should use the spinlock so at least the code is obviously
functionally correct and worry about designing some properly justified
performance change after.

Jason


Spinlock is correct but make the whole series meaningless consider it won't bring any performance improvement.

Thanks