Re: [PATCH 5/5] KVM: MMU: flush tlb out of mmu lock when write-protect the sptes

From: Xiao Guangrong
Date: Tue Apr 15 2014 - 07:11:00 EST

Next message: Mark Brown: "Re: [PATCH] ASoC: Intel: Fix incorrect sizeof() in sst_hsw_stream_get_volume()"
Previous message: Alexandre Belloni: "[PATCH 2/2] ARM: at91/dt: add peripherals to the at91sam9rlek board"
In reply to: Marcelo Tosatti: "Re: [PATCH 5/5] KVM: MMU: flush tlb out of mmu lock when write-protect the sptes"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi Marcelo,

Thanks your time to review it.

On 04/09/2014 10:51 PM, Marcelo Tosatti wrote:

>> +/*
>> + * Please note PT_WRITABLE_MASK is not stable since
>> + * 1) fast_page_fault() sets spte from readonly to writable out of mmu-lock or
>> + * 2) kvm_mmu_slot_remove_write_access() and kvm_vm_ioctl_get_dirty_log()
>> + * can write protect sptes but flush tlb out mmu-lock that means we may use
>> + * the corrupt tlb entries which depend on this bit.
>> + *
>> + * Both cases do not modify the status of spte_is_locklessly_modifiable() so
>> + * if you want to check whether the spte is writable on MMU you can check
>> + * SPTE_MMU_WRITEABLE instead. If you want to update spte without losing
>> + * A/D bits and tlb flush, you can check spte_is_locklessly_modifiable()
>> + * instead. See the comments in spte_has_volatile_bits() and
>> + * mmu_spte_update().
>> + */
>> static inline int is_writable_pte(unsigned long pte)
>> {
>
> Xiao,
>
> Can't get the SPTE_MMU_WRITEABLE part.
>
> So assume you are writing code to perform some action after guest memory
> has been write protected. You would
>
> spin_lock(mmu_lock);
>
> if (writeable spte bit is set)
> remove writeable spte bit from spte
> remote TLB flush (*)
> action
>
> spin_unlock(mmu_lock);
>
> (*) is necessary because reading the writeable spte bit as zero
> does not guarantee remote TLBs have been flushed.
>
> Now what SPTE_MMU_WRITEABLE has to do with it ?

It is contained in "remove writeable spte bit from spte" which
is done by mmu_spte_update():

if (spte_is_locklessly_modifiable(old_spte) &&
!is_writable_pte(new_spte))
ret = true;
>
> Perhaps a recipe like that (or just the rules) would be useful.

Okay, i am considering to improve this comments, how about like
this:

Currently, we have two sorts of write-protection, a) the first
one write-protects guest page to sync the guest modification,
b) another one is used to sync dirty bitmap when we do
KVM_GET_DIRTY_LOG. The differences between these two sorts are:
1) the first case clears SPTE_MMU_WRITEABLE bit.
2) the first case requires flushing tlb immediately avoiding
corrupting shadow page table between all vcpus so it should
be in the protection of mmu-lock. And the another case does
not need to flush tlb until returning the dirty bitmap to
userspace since it only write-protects the page logged in
the bitmap, that means the page in the dirty bitmap is not
missed, so it can flush tlb out of mmu-lock.

So, there is the problem: the first case can meet the corrupted
tlb caused by another case which write-protects pages but without
flush tlb immediately. In order to making the first case be aware
this problem we let it flush tlb if we try to write-protect
a spte whose SPTE_MMU_WRITEABLE bit is set, it works since another
case never touches SPTE_MMU_WRITEABLE bit.

Anyway, whenever a spte is updated (only permission and status bits
are changed) we need to check whether the spte with SPTE_MMU_WRITEABLE
becomes readonly, if that happens, we need to flush tlb. Fortunately,
mmu_spte_update() has already handled it perfectly.

The rules to use SPTE_MMU_WRITEABLE and PT_WRITABLE_MASK:
- if we want to see if it has writable tlb entry or if the spte can
be writable on the mmu mapping, check SPTE_MMU_WRITEABLE, this is
the most case, otherwise
- if we fix page fault on the spte or do write-protection by dirty logging,
check PT_WRITABLE_MASK.

TODO: introduce APIs to split these two cases.

>
> The remaining patches look good.

Thank you.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Mark Brown: "Re: [PATCH] ASoC: Intel: Fix incorrect sizeof() in sst_hsw_stream_get_volume()"
Previous message: Alexandre Belloni: "[PATCH 2/2] ARM: at91/dt: add peripherals to the at91sam9rlek board"
In reply to: Marcelo Tosatti: "Re: [PATCH 5/5] KVM: MMU: flush tlb out of mmu lock when write-protect the sptes"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]