Re: [PATCH 04/12] KVM: MMU: log dirty page after marking spte writable

From: Xiao Guangrong
Date: Wed Nov 20 2013 - 23:27:16 EST



On Nov 21, 2013, at 3:47 AM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:

> On Wed, Nov 20, 2013 at 10:20:09PM +0800, Xiao Guangrong wrote:
>>> But what guarantee does userspace require, from GET_DIRTY_LOG, while vcpus are
>>> executing?
>>
>> Aha. Single calling GET_DIRTY_LOG is useless since new dirty page can be generated
>> when GET_DIRTY_LOG is being returned. If user wants to get exact dirty pages the vcpus
>> should be stopped.
>>
>>>
>>> With fast page fault:
>>>
>>> if (cmpxchg64(sptep, spte, spte | PT_WRITABLE_MASK) == spte)
>>> /* The window in here... */
>>> mark_page_dirty(vcpu->kvm, gfn);
>>>
>>> And the $SUBJECT set_spte reordering, the rule becomes
>>>
>>> A call to GET_DIRTY_LOG guarantees to return correct information about
>>> dirty pages before invocation of the previous GET_DIRTY_LOG call.
>>>
>>> (see example 1: the next GET_DIRTY_LOG will return the dirty information
>>> there).
>>>
>>
>> It seems no.
>>
>> The first GET_DIRTY_LOG can happen before fast-page-faultï
>> the second GET_DIRTY_LOG happens in the window between cmpxchg()
>> and mark_page_dirty(), for the second one, the information is still âincorrectâ.
>
> Its correct for the previous GET_DIRTY_LOG call.

Oh, yes.

>
>>> The rule for sptes that is, because kvm_write_guest does not match the
>>> documentation at all.
>>
>> You mean the case of âkvm_write_guestâ is valid (I do not know why it is)?
>> Or anything else?
>>
>>>
>>> So before example 1 and this patch, the rule (well for sptes at least) was
>>>
>>> "Given a memory slot, return a bitmap containing any pages dirtied
>>> since the last call to this ioctl. Bit 0 is the first page in the
>>> memory slot. Ensure the entire structure is cleared to avoid padding
>>> issues."
>>>
>>> Can you explain why it is OK to relax this rule?
>>
>> Itâs because:
>> 1) it doesnât break current use cases, i.e. Live migration and FB-flushing.
>> 2) the current code, like kvm_write_guest has already broken the documentation
>> (the guest page has been written but missed in the dirty bitmap).
>> 3) itâs needless to implement a exact get-dirty-pages since the dirty pages can
>> no be exactly got except stopping vcpus.
>>
>> So i think we'd document this case instead. No?
>
> Lets figure out the requirements, then. I don't understand why
> FB-flushing is safe (think kvm-autotest: one pixel off the entire
> test fails).

I did not read FB-flushing code, i guess the reason why it can work is:
FB-flushing do periodicity get-dirty-page and flush it. After guest writes
the page, the page will be flushed in the next GET_DIRTY_LOG.

>
> Before fast page fault: Pages are either write protected or the
> corresponding dirty bitmap bit is set. Any write faults to dirty logged
> sptes while GET_DIRTY log is executing in the protected section are
> allowed to instantiate writeable spte after GET_DIRTY log is finished.
>
> After fast page fault: Pages can be writeable and the dirty bitmap not
> set. Therefore data can be dirty before GET_DIRTY executes and still
> fail to be returned in the bitmap.

Itâs right. The current GET_DIRTY fail to get the dirty page but the
next GET_DIRTY can get it properly since the current GET_DIRTY
need to flush all TLBs that waits for fast-page-fault finish.

I do not think itâs a big problem since single GET_DIRTY is useless as
i explained in the previous mail.

>
> Since this patchset does not introduce change in behaviour (fast pf
> did), no reason to avoid merging this.

Yes, thank you, Marcelo! :)

>
> BTW, since GET_DIRTY log does not require to report concurrent (to
> GET_DIRTY_LOG) write faults, it is not necessary to rewalk the spte
> list, is it?

You mean the ârewalkâ we introduced in pte_list_walk_lockless() in this patchset?
I think this rewalk is needed because itâs caused by meet a unexpected nulls that
means weâre walking on the unexpected rmap. If we do not do this, some writable
sptes will be missed.




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/