Re: [PATCH v4 4/6] KVM: MMU: fast invalid all shadow pages
From: Xiao Guangrong
Date: Mon May 06 2013 - 23:40:15 EST
On 05/07/2013 03:50 AM, Marcelo Tosatti wrote:
> On Mon, May 06, 2013 at 11:39:11AM +0800, Xiao Guangrong wrote:
>> On 05/04/2013 08:52 AM, Marcelo Tosatti wrote:
>>> On Sat, May 04, 2013 at 12:51:06AM +0800, Xiao Guangrong wrote:
>>>> On 05/03/2013 11:53 PM, Marcelo Tosatti wrote:
>>>>> On Fri, May 03, 2013 at 01:52:07PM +0800, Xiao Guangrong wrote:
>>>>>> On 05/03/2013 09:05 AM, Marcelo Tosatti wrote:
>>>>>>
>>>>>>>> +
>>>>>>>> +/*
>>>>>>>> + * Fast invalid all shadow pages belong to @slot.
>>>>>>>> + *
>>>>>>>> + * @slot != NULL means the invalidation is caused the memslot specified
>>>>>>>> + * by @slot is being deleted, in this case, we should ensure that rmap
>>>>>>>> + * and lpage-info of the @slot can not be used after calling the function.
>>>>>>>> + *
>>>>>>>> + * @slot == NULL means the invalidation due to other reasons, we need
>>>>>>>> + * not care rmap and lpage-info since they are still valid after calling
>>>>>>>> + * the function.
>>>>>>>> + */
>>>>>>>> +void kvm_mmu_invalid_memslot_pages(struct kvm *kvm,
>>>>>>>> + struct kvm_memory_slot *slot)
>>>>>>>> +{
>>>>>>>> + spin_lock(&kvm->mmu_lock);
>>>>>>>> + kvm->arch.mmu_valid_gen++;
>>>>>>>> +
>>>>>>>> + /*
>>>>>>>> + * All shadow paes are invalid, reset the large page info,
>>>>>>>> + * then we can safely desotry the memslot, it is also good
>>>>>>>> + * for large page used.
>>>>>>>> + */
>>>>>>>> + kvm_clear_all_lpage_info(kvm);
>>>>>>>
>>>>>>> Xiao,
>>>>>>>
>>>>>>> I understood it was agreed that simple mmu_lock lockbreak while
>>>>>>> avoiding zapping of newly instantiated pages upon a
>>>>>>>
>>>>>>> if(spin_needbreak)
>>>>>>> cond_resched_lock()
>>>>>>>
>>>>>>> cycle was enough as a first step? And then later introduce root zapping
>>>>>>> along with measurements.
>>>>>>>
>>>>>>> https://lkml.org/lkml/2013/4/22/544
>>>>>>
>>>>>> Yes, it is.
>>>>>>
>>>>>> See the changelog in 0/0:
>>>>>>
>>>>>> " we use lock-break technique to zap all sptes linked on the
>>>>>> invalid rmap, it is not very effective but good for the first step."
>>>>>>
>>>>>> Thanks!
>>>>>
>>>>> Sure, but what is up with zeroing kvm_clear_all_lpage_info(kvm) and
>>>>> zapping the root? Only lock-break technique along with generation number
>>>>> was what was agreed.
>>>>
>>>> Marcelo,
>>>>
>>>> Please Wait... I am completely confused. :(
>>>>
>>>> Let's clarify "zeroing kvm_clear_all_lpage_info(kvm) and zapping the root" first.
>>>> Are these changes you wanted?
>>>>
>>>> void kvm_mmu_invalid_memslot_pages(struct kvm *kvm,
>>>> struct kvm_memory_slot *slot)
>>>> {
>>>> spin_lock(&kvm->mmu_lock);
>>>> kvm->arch.mmu_valid_gen++;
>>>>
>>>> /* Zero all root pages.*/
>>>> restart:
>>>> list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) {
>>>> if (!sp->root_count)
>>>> continue;
>>>>
>>>> if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
>>>> goto restart;
>>>> }
>>>>
>>>> /*
>>>> * All shadow paes are invalid, reset the large page info,
>>>> * then we can safely desotry the memslot, it is also good
>>>> * for large page used.
>>>> */
>>>> kvm_clear_all_lpage_info(kvm);
>>>>
>>>> kvm_mmu_commit_zap_page(kvm, &invalid_list);
>>>> spin_unlock(&kvm->mmu_lock);
>>>> }
>>>>
>>>> static void rmap_remove(struct kvm *kvm, u64 *spte)
>>>> {
>>>> struct kvm_mmu_page *sp;
>>>> gfn_t gfn;
>>>> unsigned long *rmapp;
>>>>
>>>> sp = page_header(__pa(spte));
>>>> +
>>>> + /* Let invalid sp do not access its rmap. */
>>>> + if (!sp_is_valid(sp))
>>>> + return;
>>>> +
>>>> gfn = kvm_mmu_page_get_gfn(sp, spte - sp->spt);
>>>> rmapp = gfn_to_rmap(kvm, gfn, sp->role.level);
>>>> pte_list_remove(spte, rmapp);
>>>> }
>>>>
>>>> If yes, there is the reason why we can not do this that i mentioned before:
>>>>
>>>> after call kvm_mmu_invalid_memslot_pages(), the memslot->rmap will be destroyed.
>>>> Later, if host reclaim page, the mmu-notify handlers, ->invalidate_page and
>>>> ->invalidate_range_start, can not find any spte using the host page, then
>>>> Accessed/Dirty for host page is missing tracked.
>>>> (missing call kvm_set_pfn_accessed and kvm_set_pfn_dirty properly.)
>>>>
>>>> What's your idea?
>>>
>>>
>>> Step 1) Fix kvm_mmu_zap_all's behaviour: introduce lockbreak via
>>> spin_needbreak. Use generation numbers so that in case kvm_mmu_zap_all
>>> releases mmu_lock and reacquires it again, only shadow pages
>>> from the generation with which kvm_mmu_zap_all started are zapped (this
>>> guarantees forward progress and eventual termination).
>>>
>>> kvm_mmu_zap_generation()
>>> spin_lock(mmu_lock)
>>> int generation = kvm->arch.mmu_generation;
>>>
>>> for_each_shadow_page(sp) {
>>> if (sp->generation == kvm->arch.mmu_generation)
>>> zap_page(sp)
>>> if (spin_needbreak(mmu_lock)) {
>>> kvm->arch.mmu_generation++;
>>> cond_resched_lock(mmu_lock);
>>> }
>>> }
>>>
>>> kvm_mmu_zap_all()
>>> spin_lock(mmu_lock)
>>> for_each_shadow_page(sp) {
>>> if (spin_needbreak(mmu_lock)) {
>>> cond_resched_lock(mmu_lock);
>>> }
>>> }
>>>
>>> Use kvm_mmu_zap_generation for kvm_arch_flush_shadow_memslot.
>>> Use kvm_mmu_zap_all for kvm_mmu_notifier_release,kvm_destroy_vm.
>>>
>>> This addresses the main problem: excessively long hold times
>>> of kvm_mmu_zap_all with very large guests.
>>>
>>> Do you see any problem with this logic? This was what i was thinking
>>> we agreed.
>>
>> No. I understand it and it can work.
>>
>> Actually, it is similar with Gleb's idea that "zapping stale shadow pages
>> (and uses lock break technique)", after some discussion, we thought "only zap
>> shadow pages that are reachable from the slot's rmap" is better, that is this
>> patchset does.
>> (https://lkml.org/lkml/2013/4/23/73)
>>
>>>
>>> Step 2) Show that the optimization to zap only the roots is worthwhile
>>> via benchmarking, and implement it.
>>
>> This is what i am confused. I can not understand how "zap only the roots"
>> works. You mean these change?
>>
>> kvm_mmu_zap_generation()
>> spin_lock(mmu_lock)
>> int generation = kvm->arch.mmu_generation;
>>
>> for_each_shadow_page(sp) {
>> /* Change here. */
>> => if ((sp->generation == kvm->arch.mmu_generation) &&
>> => sp->root_count)
>> zap_page(sp)
>>
>> if (spin_needbreak(mmu_lock)) {
>> kvm->arch.mmu_generation++;
>> cond_resched_lock(mmu_lock);
>> }
>> }
>>
>> If we do this, there will have shadow pages that are linked to invalid memslot's
>> rmap. How do we handle these pages and the mmu-notify issue?
>>
>> Thanks!
>
> By "zap only roots" i mean zapping roots plus generation number on
> shadow pages. But this as a second step, after it has been demonstrated
> its worthwhile.
Marcelo,
Sorry for my stupidity, still do not understand. Could you please show me the
pseudocode and answer my questions above?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/