On Wed, Jul 31, 2019 at 09:28:20PM +0800, Jason Wang wrote:
On 2019/7/31 äå8:39, Jason Gunthorpe wrote:The seqlock is usually used to prevent a 2nd thread from accessing the
On Wed, Jul 31, 2019 at 04:46:53AM -0400, Jason Wang wrote:
We used to use RCU to synchronize MMU notifier with worker. This leadsYou just described a seqlock.
calling synchronize_rcu() in invalidate_range_start(). But on a busy
system, there would be many factors that may slow down the
synchronize_rcu() which makes it unsuitable to be called in MMU
notifier.
A solution is SRCU but its overhead is obvious with the expensive full
memory barrier. Another choice is to use seqlock, but it doesn't
provide a synchronization method between readers and writers. The last
choice is to use vq mutex, but it need to deal with the worst case
that MMU notifier must be blocked and wait for the finish of swap in.
So this patch switches use a counter to track whether or not the map
was used. The counter was increased when vq try to start or finish
uses the map. This means, when it was even, we're sure there's no
readers and MMU notifier is synchronized. When it was odd, it means
there's a reader we need to wait it to be even again then we are
synchronized.
Kind of, see my explanation below.
We've been talking about providing this as some core service from mmu
notifiers because nearly every use of this API needs it.
That would be very helpful.
IMHO this gets the whole thing backwards, the common pattern is to
protect the 'shadow pte' data with a seqlock (usually open coded),
such that the mmu notififer side has the write side of that lock and
the read side is consumed by the thread accessing or updating the SPTE.
Yes, I've considered something like that. But the problem is, mmu notifier
(writer) need to wait for the vhost worker to finish the read before it can
do things like setting dirty pages and unmapping page. It looks to me
seqlock doesn't provide things like this.
VA while it is being changed by the mm. ie you use something seqlocky
instead of the ugly mmu_notifier_unregister/register cycle.
You are supposed to use something simple like a spinlock or mutex
inside the invalidate_range_start to serialized tear down of the SPTEs
with their accessors.
write_seqcount_begin()That is because it is a write side lock, not a read lock. IIRC
map = vq->map[X]
write or read through map->addr directly
write_seqcount_end()
There's no rmb() in write_seqcount_begin(), so map could be read before
write_seqcount_begin(), but it looks to me now that this doesn't harm at
all, maybe we can try this way.
seqlocks have weaker barriers because the write side needs to be
serialized in some other way.
The requirement I see is you need invalidate_range_start to block
until another thread exits its critical section (ie stops accessing
the SPTEs).
That is a spinlock/mutex.
You just can't invent a faster spinlock by open coding something with
barriers, it doesn't work.
Jason