Re: [PATCH 5/5] KVM: MMU: flush tlb out of mmu lock when write-protect the sptes
From: Xiao Guangrong
Date: Tue Apr 15 2014 - 07:11:00 EST
Hi Marcelo,
Thanks your time to review it.
On 04/09/2014 10:51 PM, Marcelo Tosatti wrote:
>> +/*
>> + * Please note PT_WRITABLE_MASK is not stable since
>> + * 1) fast_page_fault() sets spte from readonly to writable out of mmu-lock or
>> + * 2) kvm_mmu_slot_remove_write_access() and kvm_vm_ioctl_get_dirty_log()
>> + * can write protect sptes but flush tlb out mmu-lock that means we may use
>> + * the corrupt tlb entries which depend on this bit.
>> + *
>> + * Both cases do not modify the status of spte_is_locklessly_modifiable() so
>> + * if you want to check whether the spte is writable on MMU you can check
>> + * SPTE_MMU_WRITEABLE instead. If you want to update spte without losing
>> + * A/D bits and tlb flush, you can check spte_is_locklessly_modifiable()
>> + * instead. See the comments in spte_has_volatile_bits() and
>> + * mmu_spte_update().
>> + */
>> static inline int is_writable_pte(unsigned long pte)
>> {
>
> Xiao,
>
> Can't get the SPTE_MMU_WRITEABLE part.
>
> So assume you are writing code to perform some action after guest memory
> has been write protected. You would
>
> spin_lock(mmu_lock);
>
> if (writeable spte bit is set)
> remove writeable spte bit from spte
> remote TLB flush (*)
> action
>
> spin_unlock(mmu_lock);
>
> (*) is necessary because reading the writeable spte bit as zero
> does not guarantee remote TLBs have been flushed.
>
> Now what SPTE_MMU_WRITEABLE has to do with it ?
It is contained in "remove writeable spte bit from spte" which
is done by mmu_spte_update():
if (spte_is_locklessly_modifiable(old_spte) &&
!is_writable_pte(new_spte))
ret = true;
>
> Perhaps a recipe like that (or just the rules) would be useful.
Okay, i am considering to improve this comments, how about like
this:
Currently, we have two sorts of write-protection, a) the first
one write-protects guest page to sync the guest modification,
b) another one is used to sync dirty bitmap when we do
KVM_GET_DIRTY_LOG. The differences between these two sorts are:
1) the first case clears SPTE_MMU_WRITEABLE bit.
2) the first case requires flushing tlb immediately avoiding
corrupting shadow page table between all vcpus so it should
be in the protection of mmu-lock. And the another case does
not need to flush tlb until returning the dirty bitmap to
userspace since it only write-protects the page logged in
the bitmap, that means the page in the dirty bitmap is not
missed, so it can flush tlb out of mmu-lock.
So, there is the problem: the first case can meet the corrupted
tlb caused by another case which write-protects pages but without
flush tlb immediately. In order to making the first case be aware
this problem we let it flush tlb if we try to write-protect
a spte whose SPTE_MMU_WRITEABLE bit is set, it works since another
case never touches SPTE_MMU_WRITEABLE bit.
Anyway, whenever a spte is updated (only permission and status bits
are changed) we need to check whether the spte with SPTE_MMU_WRITEABLE
becomes readonly, if that happens, we need to flush tlb. Fortunately,
mmu_spte_update() has already handled it perfectly.
The rules to use SPTE_MMU_WRITEABLE and PT_WRITABLE_MASK:
- if we want to see if it has writable tlb entry or if the spte can
be writable on the mmu mapping, check SPTE_MMU_WRITEABLE, this is
the most case, otherwise
- if we fix page fault on the spte or do write-protection by dirty logging,
check PT_WRITABLE_MASK.
TODO: introduce APIs to split these two cases.
>
> The remaining patches look good.
Thank you.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/