Re: [PATCH] KVM: Remove redundant smp_mb() in the kvm_mmu_commit_zap_page()

From: Lan Tianyu
Date: Tue Mar 08 2016 - 03:52:55 EST

On 2016å03æ04æ 16:49, Paolo Bonzini wrote:
> On 04/03/2016 08:12, Lan Tianyu wrote:
>>>> /*
>>>> - * wmb: make sure everyone sees our modifications to the page tables
>>>> - * rmb: make sure we see changes to vcpu->mode
>>> You want to leave the comment explaining the memory barriers and tell that
>>> kvm_flush_remote_tlbs() contains the smp_mb().
>> That sounds more reasonable. Will update. Thanks.
> In fact, the reason for kvm_flush_remote_tlbs()'s barrier is exactly
> what was in this comment.

Hi Paolo:
Thanks for your comments.

Summary about smp_mb()s we met in this thread. If misunderstood, please
correct me. Thanks.

The smp_mb() in the kvm_flush_remote_tlbs() was introduced by the commit
a4ee1ca4 and it seems to keep the order of reading and cmpxchg

commit a4ee1ca4a36e7857d90ae8c2b85f1bde9a042c10
Author: Xiao Guangrong <xiaoguangrong@xxxxxxxxxxxxxx>
Date: Tue Nov 23 11:13:00 2010 +0800

KVM: MMU: delay flush all tlbs on sync_page path

Quote from Avi:
| I don't think we need to flush immediately; set a "tlb dirty" bit
| that is cleareded when we flush the tlb.
| can consult the bit and force a flush if set.

Signed-off-by: Xiao Guangrong <xiaoguangrong@xxxxxxxxxxxxxx>
Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx>

The smp_mb() in the kvm_mmu_commit_zap_page() was introduced by commit
c142786c6 which was merged later than commit a4ee1ca4. It pairs with
smp_mb() in the walk_shadow_page_lockless_begin/end() to keep order
between modifications of the page tables and reading mode.

commit c142786c6291189b5c85f53d91743e1eefbd8fe0
Author: Avi Kivity <avi@xxxxxxxxxx>
Date: Mon May 14 15:44:06 2012 +0300

KVM: MMU: Don't use RCU for lockless shadow walking

Using RCU for lockless shadow walking can increase the amount of memory
in use by the system, since RCU grace periods are unpredictable. We
have an unconditional write to a shared variable (reader_counter), which
isn't good for scaling.

Replace that with a scheme similar to x86's get_user_pages_fast():
interrupts during lockless shadow walk to force the freer
(kvm_mmu_commit_zap_page()) to wait for the TLB flush IPI to find the
processor with interrupts enabled.

We also add a new vcpu->mode, READING_SHADOW_PAGE_TABLES, to prevent
kvm_flush_remote_tlbs() from avoiding the IPI.

Signed-off-by: Avi Kivity <avi@xxxxxxxxxx>
Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx>

The smp_mb() in the kvm_make_all_cpus_request() was introduced by commit
6b7e2d09. It keeps order between setting request bit and reading mode.

commit 6b7e2d0991489559a1df4500d77f7b76c4607ed0
Author: Xiao Guangrong <xiaoguangrong@xxxxxxxxxxxxxx>
Date: Wed Jan 12 15:40:31 2011 +0800

KVM: Add "exiting guest mode" state

Currently we keep track of only two states: guest mode and host
mode. This patch adds an "exiting guest mode" state that tells
us that an IPI will happen soon, so unless we need to wait for the
IPI, we can avoid it completely.

1: No need atomically to read/write ->mode in vcpu's thread

2: reorganize struct kvm_vcpu to make ->mode and ->requests
in the same cache line explicitly

Signed-off-by: Xiao Guangrong <xiaoguangrong@xxxxxxxxxxxxxx>
Signed-off-by: Avi Kivity <avi@xxxxxxxxxx>

> So you can:
> 1) add a comment to kvm_flush_remote_tlbs like:
> /*
> * We want to publish modifications to the page tables before reading
> * mode. Pairs with a memory barrier in arch-specific code.
> * - x86: smp_mb__after_srcu_read_unlock in vcpu_enter_guest.
> * - powerpc: smp_mb in kvmppc_prepare_to_enter.
> */
> 2) add a comment to vcpu_enter_guest and kvmppc_prepare_to_enter, saying
> that the memory barrier also orders the write to mode from any reads
> to the page tables done while the VCPU is running. In other words, on
> entry a single memory barrier achieves two purposes (write ->mode before
> reading requests, write ->mode before reading page tables).

These sounds good.

> The same should be true in kvm_flush_remote_tlbs(). So you may investigate
> removing the barrier from kvm_flush_remote_tlbs, because
> kvm_make_all_cpus_request already includes a memory barrier. Like
> Thomas suggested, leave a comment in kvm_flush_remote_tlbs(),
> saying which memory barrier you are relying on and for what.

If we remove the smp_mb() in the kvm_flush_remote_tlbs(), we need to
leave comments both in the kvm_flush_remote_tlbs() and
kvm_mmu_commit_zap_page(), right?

> And finally, the memory barrier in kvm_make_all_cpus_request can become
> smp_mb__after_atomic, which is free on x86.

I found you have done this in your tree :)

commit 5b06b60b72b73cbe3805d2a64d223e0264bd0479
Author: Paolo Bonzini <pbonzini@xxxxxxxxxx>
Date: Wed Feb 24 12:44:17 2016 +0100

KVM: mark memory barrier with smp_mb__after_atomic

Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>

> Of course, all this should be done in at least three separate patches.
> Thanks!
> Paolo

Best regards
Tianyu Lan