On Wed, Jul 24, 2019 at 06:08:05PM +0800, Jason Wang wrote:
On 2019/7/24 4:05 PM, Michael S. Tsirkin wrote:
On Wed, Jul 24, 2019 at 10:17:14AM +0800, Jason Wang wrote:
On 2019/7/23 11:02 PM, Michael S. Tsirkin wrote:
I'm sorry, I just do not get the argument.
On Tue, Jul 23, 2019 at 09:34:29PM +0800, Jason Wang wrote:
On 2019/7/23 6:27 PM, Michael S. Tsirkin wrote:
So what orders __get_user_pages_fast() wrt the invalidate_count read?
Yes, since there could be multiple concurrent invalidation requests, we
need to count them to make sure we don't pin the wrong pages.
I also wonder about ordering. kvm has this:
/*
* Used to check for invalidations in progress, of the pfn that is
* returned by pfn_to_pfn_prot below.
*/
mmu_seq = kvm->mmu_notifier_seq;
/*
* Ensure the read of mmu_notifier_seq isn't reordered with PTE reads in
* gfn_to_pfn_prot() (which calls get_user_pages()), so that we don't
* risk the page we get a reference to getting unmapped before we have a
* chance to grab the mmu_lock without mmu_notifier_retry() noticing.
*
* This smp_rmb() pairs with the effective smp_wmb() of the combination
* of the pte_unmap_unlock() after the PTE is zapped, and the
* spin_lock() in kvm_mmu_notifier_invalidate_<page|range_end>() before
* mmu_notifier_seq is incremented.
*/
smp_rmb();
Does this apply to us? Can't we use a seqlock instead so we do
not need to worry?
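Something along these lines is what I have in mind. Just a sketch, not
actual code: map_seq below is a made-up per-vq generation counter that
the invalidate callbacks would bump under mmu_lock.

        /* sketch only: kvm-style check/retry around the pinning */
        seq = READ_ONCE(vq->map_seq);   /* hypothetical counter */
        smp_rmb();                      /* keep the PTE reads after the seq read */

        npinned = __get_user_pages_fast(uaddr->uaddr, npages,
                                        uaddr->write, pages);

        spin_lock(&vq->mmu_lock);
        if (vq->invalidate_count || vq->map_seq != seq) {
                /* an invalidation ran or is running: unpin and bail out */
                if (npinned > 0)
                        release_pages(pages, npinned);
                err = -EAGAIN;
                goto err;
        }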
I'm not familiar with kvm MMU internals, but we do everything under
mmu_lock.
Thanks
I don't think this helps at all.
There's no lock between checking the invalidate counter and
get user pages fast within vhost_map_prefetch. So it's possible
that get user pages fast reads PTEs speculatively before
invalidate_count is read.
In vhost_map_prefetch() we do:

        spin_lock(&vq->mmu_lock);
        ...
        err = -EFAULT;
        if (vq->invalidate_count)
                goto err;
        ...
        npinned = __get_user_pages_fast(uaddr->uaddr, npages,
                                        uaddr->write, pages);
        ...
        spin_unlock(&vq->mmu_lock);

Is this not sufficient?
Thanks
So in the invalidate_end() callback we have:

        spin_lock(&vq->mmu_lock);
        --vq->invalidate_count;
        spin_unlock(&vq->mmu_lock);
So even if the PTE is read speculatively before invalidate_count is
read (which can only matter when invalidate_count is read as zero), the
spinlock guarantees that we won't read any stale PTEs.
Thanks
If you want to order two reads you need an smp_rmb()
or stronger between them, executed on the same CPU.
Executing any kind of barrier on another CPU
will have no ordering effect on the first one.
So if CPU1 runs the prefetch, and CPU2 runs invalidate
callback, read of invalidate counter on CPU1 can bypass
read of PTE on CPU1 unless there's a barrier
in between, and nothing CPU2 does can affect that outcome.
What did I miss?
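To spell out the interleaving I'm worried about (leaving mmu_lock out
of the picture for a moment, the point is only the two reads on CPU1):

        CPU1 (vhost_map_prefetch)       CPU2 (invalidate callback)
        -------------------------       --------------------------
        PTE read issued early by the
        CPU (speculation inside
        __get_user_pages_fast)
                                        zap and flush that PTE
                                        --vq->invalidate_count;
        reads vq->invalidate_count == 0
        pins the page behind the stale
        PTE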
It does no harm if the PTE is read before invalidate_count, because:
1) This speculation is serialized with invalidate_range_end() because
of the spinlock.
2) This speculation can only take effect when we read invalidate_count
as zero.
3) This means the speculation is done after the last
invalidate_range_end(), and because of the spinlock, when we enter the
critical section in prefetch we cannot see any stale PTE that was
unmapped before (see the sketch below).
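A rough picture of the case where prefetch reads invalidate_count as
zero after an invalidation has finished (the pairing is between the
spin_unlock() in invalidate_range_end() and the spin_lock() in
prefetch; CPU names are just for illustration):

        CPU2 (invalidation)             CPU1 (vhost_map_prefetch)
        -------------------             -------------------------
        zap and flush the PTEs
        spin_lock(&vq->mmu_lock);
        --vq->invalidate_count;
        spin_unlock(&vq->mmu_lock);     /* release */
                                        spin_lock(&vq->mmu_lock);  /* acquire */
                                        if (vq->invalidate_count)  /* reads 0 */
                                                goto err;
                                        __get_user_pages_fast(...);
                                        /* the PTE reads cannot be
                                           hoisted above the acquire,
                                           so they only see what CPU2
                                           published before its
                                           release: no stale PTEs */
                                        spin_unlock(&vq->mmu_lock);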
Am I wrong?
Thanks

OK, I think you are right. Sorry it took me a while to figure it out.