Re: [PATCH RFC 04/15] KVM: Implement ring-based dirty memory tracking
From: Paolo Bonzini
Date: Tue Dec 10 2019 - 05:07:39 EST
On 09/12/19 22:54, Peter Xu wrote:
> Just until recently I noticed that actually kvm_get_running_vcpu() has
> a real benefit in that it gives a very solid result on whether we're
> with the vcpu context, even more accurate than when we pass vcpu
> pointers around (because sometimes we just passed the kvm pointer
> along the stack even if we're with a vcpu context, just like what we
> did with mark_page_dirty_in_slot).
Right, that's the point.
> I'm thinking whether I can start
> to use this information in the next post on solving an issue I
> encountered with the waitqueue.
>
> Current waitqueue is still problematic in that it could wait even with
> the mmu lock held when with vcpu context.
I think the idea of the soft limit is that the waiting just cannot
happen. That is, the number of dirtied pages _outside_ the guest (guest
accesses are taken care of by PML, and are subtracted from the soft
limit) cannot exceed hard_limit - (soft_limit + pml_size).
> The issue is KVM_RESET_DIRTY_RINGS needs the mmu lock to manipulate
> the write bits, while it's the only interface to also wake up the
> dirty ring sleepers. They could dead lock like this:
>
> main thread vcpu thread
> =========== ===========
> kvm page fault
> mark_page_dirty_in_slot
> mmu lock taken
> mark dirty, ring full
> queue on waitqueue
> (with mmu lock)
> KVM_RESET_DIRTY_RINGS
> take mmu lock <------------ deadlock here
> reset ring gfns
> wakeup dirty ring sleepers
>
> And if we see if the mark_page_dirty_in_slot() is not with a vcpu
> context (e.g. kvm_mmu_page_fault) but with an ioctl context (those
> cases we'll use per-vm dirty ring) then it's probably fine.
>
> My planned solution:
>
> - When kvm_get_running_vcpu() != NULL, we postpone the waitqueue waits
> until we finished handling this page fault, probably in somewhere
> around vcpu_enter_guest, so that we can do wait_event() after the
> mmu lock released
I think this can cause a race:
vCPU 1 vCPU 2 host
---------------------------------------------------------------
mark page dirty
write to page
treat page as not dirty
add page to ring
where vCPU 2 skips the clean-page slow path entirely.
Paolo