Re: [PATCH v7 04/11] KVM: MMU: zap pages in batch

From: Xiao Guangrong
Date: Wed May 29 2013 - 12:04:11 EST


On 05/29/2013 10:02 PM, Xiao Guangrong wrote:
> On 05/29/2013 09:32 PM, Marcelo Tosatti wrote:
>> On Wed, May 29, 2013 at 09:09:09PM +0800, Xiao Guangrong wrote:
>>> This is what I replied to Gleb in his mail, where he raised the question of
>>> why "collapse tlb flush is needed":
>>>
>>> ======
>>> It seems no.
>>> Since we have reloaded mmu before zapping the obsolete pages, the mmu-lock
>>> is easily contended. I added some simple tracking:
>>>
>>> + int num = 0;
>>> restart:
>>> list_for_each_entry_safe_reverse(sp, node,
>>> &kvm->arch.active_mmu_pages, link) {
>>> @@ -4265,6 +4265,7 @@ restart:
>>> if (batch >= BATCH_ZAP_PAGES &&
>>> cond_resched_lock(&kvm->mmu_lock)) {
>>> batch = 0;
>>> + num++;
>>> goto restart;
>>> }
>>>
>>> @@ -4277,6 +4278,7 @@ restart:
>>> * may use the pages.
>>> */
>>> kvm_mmu_commit_zap_page(kvm, &invalid_list);
>>> + printk("lock-break: %d.\n", num);
>>> }
>>>
>>> I read the pci rom while building a kernel in the guest, which
>>> has 1G memory and 4 vcpus with ept enabled; this is a normal
>>> workload and a normal configuration.
>>>
>>> # dmesg
>>> [ 2338.759099] lock-break: 8.
>>> [ 2339.732442] lock-break: 5.
>>> [ 2340.904446] lock-break: 3.
>>> [ 2342.513514] lock-break: 3.
>>> [ 2343.452229] lock-break: 3.
>>> [ 2344.981599] lock-break: 4.
>>>
>>> Basically, we need to break many times.
>>
>> Should measure kvm_mmu_zap_all latency.
>>
>>> ======
>>>
>>> You can see we have to break at least 3 times to zap all pages even though
>>> we zap 10 pages per batch. Obviously it needs to break more times without
>>> batch-zapping.
>>
>> Again, breaking should be no problem, what matters is latency. Please
>> measure kvm_mmu_zap_all latency after all optimizations to justify
>> this minimum batching.
>
> Okay, okay. I will benchmark the latency.

Okay, I have run the test. The test environment is the same as before:
I read the pci rom while building a kernel in a guest that has 1G memory
and 4 vcpus with ept enabled, which is a normal workload and a normal
configuration.

Batch-zapped:
Guest:
# cat /sys/bus/pci/devices/0000\:00\:03.0/rom
# free -m
             total       used       free     shared    buffers     cached
Mem:           975        793        181          0          6        438
-/+ buffers/cache:        347        627
Swap:         2015         43       1972

Host shows:
[ 2229.918558] lock-break: 5.
[ 2229.918564] kvm_mmu_invalidate_zap_all_pages: 174706e.


No-batch:
Guest:
# cat /sys/bus/pci/devices/0000\:00\:03.0/rom
# free -m
             total       used       free     shared    buffers     cached
Mem:           975        843        131          0         17        476
-/+ buffers/cache:        348        626
Swap: 2015 2

Host shows:
[ 2931.675285] lock-break: 13.
[ 2931.675291] kvm_mmu_invalidate_zap_all_pages: 69c1676.

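That means, with nearly the same amount of memory in use on the guest
(latencies are local_clock() deltas printed with %llx, i.e. hex nanoseconds):
- batch-zapped needs to break 5 times, the latency is 0x174706e ns (about 24.4 ms).
- no-batch needs to break 13 times, the latency is 0x69c1676 ns (about 110.9 ms).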
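
For anyone who wants to redo the comparison, here is a small user-space
sketch that parses the two hex values and compares them. The function names
(hex_ns, ns_to_ms, print_latency_summary) are made up for this mail and are
not part of the patch.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a value as printed by "%llx" and return it as nanoseconds. */
uint64_t hex_ns(const char *hex)
{
	return strtoull(hex, NULL, 16);
}

/* Convert nanoseconds to milliseconds. */
double ns_to_ms(uint64_t ns)
{
	return (double)ns / 1e6;
}

/* Print both measurements from the host log and their ratio. */
void print_latency_summary(void)
{
	uint64_t batch = hex_ns("174706e");	/* batch-zapped run */
	uint64_t nobatch = hex_ns("69c1676");	/* no-batch run */

	printf("batch-zapped: %.1f ms\n", ns_to_ms(batch));
	printf("no-batch:     %.1f ms\n", ns_to_ms(nobatch));
	printf("ratio:        %.1fx\n", (double)nobatch / (double)batch);
}
```

Calling print_latency_summary() from a main() shows that the no-batch run
took roughly 4.5 times as long as the batch-zapped run in this single
measurement.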

The code change to track the latency:

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 055d675..a66f21b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4233,13 +4233,13 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
 	spin_unlock(&kvm->mmu_lock);
 }

-#define BATCH_ZAP_PAGES	10
+#define BATCH_ZAP_PAGES	0
 static void kvm_zap_obsolete_pages(struct kvm *kvm)
 {
 	struct kvm_mmu_page *sp, *node;
 	LIST_HEAD(invalid_list);
 	int batch = 0;
-
+	int num = 0;
 restart:
 	list_for_each_entry_safe_reverse(sp, node,
 	      &kvm->arch.active_mmu_pages, link) {
@@ -4265,6 +4265,7 @@ restart:
 		if (batch >= BATCH_ZAP_PAGES &&
 		      cond_resched_lock(&kvm->mmu_lock)) {
 			batch = 0;
+			num++;
 			goto restart;
 		}

@@ -4277,6 +4278,7 @@ restart:
 	 * may use the pages.
 	 */
 	kvm_mmu_commit_zap_page(kvm, &invalid_list);
+	printk("lock-break: %d.\n", num);
 }

 /*
@@ -4290,7 +4292,12 @@ restart:
  */
 void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm)
 {
+	u64 start;
+
 	spin_lock(&kvm->mmu_lock);
+
+	start = local_clock();
+
 	trace_kvm_mmu_invalidate_zap_all_pages(kvm);
 	kvm->arch.mmu_valid_gen++;

@@ -4306,6 +4313,9 @@ void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm)
 	kvm_reload_remote_mmus(kvm);

 	kvm_zap_obsolete_pages(kvm);
+
+	printk("%s: %llx.\n", __FUNCTION__, local_clock() - start);
+
 	spin_unlock(&kvm->mmu_lock);
 }
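
To illustrate why a minimum batch size reduces the number of lock breaks,
here is a toy user-space model of the loop in kvm_zap_obsolete_pages(). It
is only a sketch under made-up assumptions: toy_lock, toy_need_break() and
waiter_period are hypothetical (a stand-in for cond_resched_lock() that
reports contention once the lock has been held for waiter_period zapped
pages), and it does not model real scheduling or real contention.

```c
#include <stdbool.h>

/* Hypothetical lock state: pages zapped since the lock was last dropped. */
struct toy_lock {
	int zapped_since_break;
	int waiter_period;	/* assumed: a waiter shows up after this many pages */
};

/* Hypothetical stand-in for cond_resched_lock() reporting contention. */
bool toy_need_break(const struct toy_lock *l)
{
	return l->zapped_since_break >= l->waiter_period;
}

/*
 * Zap `pages` pages one by one. The lock is dropped only when at least
 * `batch_min` pages were zapped since the last break and a waiter exists.
 * Returns how many times the lock was dropped.
 */
int count_lock_breaks(int pages, int batch_min, int waiter_period)
{
	struct toy_lock lock = { 0, waiter_period < 1 ? 1 : waiter_period };
	int batch = 0, breaks = 0;

	while (pages > 0) {
		if (batch >= batch_min && toy_need_break(&lock)) {
			batch = 0;
			lock.zapped_since_break = 0;
			breaks++;
			continue;	/* models "goto restart" */
		}
		pages--;	/* models zapping one page */
		batch++;
		lock.zapped_since_break++;
	}
	return breaks;
}
```

With 30 pages and a waiter showing up every 3 pages,
count_lock_breaks(30, 0, 3) gives 9 breaks while
count_lock_breaks(30, 10, 3) gives 2. The absolute numbers mean nothing;
the point is only that the batch minimum limits how often the lock can be
dropped.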


