Re: [PATCH 11/11] KVM: MMU: improve write flooding detected

From: Xiao Guangrong
Date: Tue Aug 23 2011 - 06:53:52 EST


Hi Marcelo,

On 08/23/2011 04:00 PM, Marcelo Tosatti wrote:
> On Tue, Aug 16, 2011 at 02:46:47PM +0800, Xiao Guangrong wrote:
>> Detecting write-flooding does not work well, when we handle page written, if
>> the last speculative spte is not accessed, we treat the page is
>> write-flooding, however, we can speculative spte on many path, such as pte
>> prefetch, page synced, that means the last speculative spte may be not point
>> to the written page and the written page can be accessed via other sptes, so
>> depends on the Accessed bit of the last speculative spte is not enough
>
> Yes, a stale last_speculative_spte is possible, but is this fact a
> noticeable problem in practice?
>
> Was this detected by code inspection?
>

I detected this because: i noticed some shadow page is zapped by
write-flooding but it is accessed soon, it causes the shadow page zapped
and alloced again and again(very frequently).

Another reason is that: in current code, write-flooding is little complex
and it stuffs code in many places, actually, write-flooding is only needed for
shadow page/nested guest, so i want to simplify it and wrap its code up.

>> - }
>> + if (spte && !(*spte & shadow_accessed_mask))
>> + sp->write_flooding_count++;
>> + else
>> + sp->write_flooding_count = 0;
>
> This relies on the sptes being created by speculative means
> or by pressure on the host clearing the accessed bit for the
> shadow page to be zapped.
>
> There is no guarantee that either of these is true for a given
> spte.
>
> And if the sptes do not have accessed bit set, any nonconsecutive 3 pte
> updates will zap the page.
>

Please note we clear 'sp->write_flooding_count' when it is accessed from
shadow page cache (in kvm_mmu_get_page), it means if any spte of sp generates
#PF, the fooding count can be reset.

And, i think there are not problems since: if the spte without accssed bit is
written frequently, it means the guest page table is accessed infrequently or
during the writing, the guest page table is not accessed, in this time, zapping
this shadow page is not bad.

Comparing the old way, the advantage of it is good for zapping upper shadow page,
for example, in the old way:
if a gfn is used as PDE for a task, later, the gfn is freed and used as PTE for
the new task, so we have two shadow pages in the host, one sp1.level = 2 and the
other sp2.level = 1. So, when we detect write-flooding, the vcpu->last_pte_updated
always point to sp2.pte. As sp2 is used for the new task, we always detected both
shadow pages are bing used, but actually, sp1 is not used by guest anymore.

> Back to the first question, what is the motivation for this heuristic
> change? Do you have any numbers?
>

Yes, i have done the quick test:

before this patch:
2m56.561
2m50.651
2m51.220
2m52.199
2m48.066

After this patch:
2m51.194
2m55.980
2m50.755
2m47.396
2m46.807

It shows the new way is little better than the old way.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/