Re: [PATCH v4 01/19] KVM: x86: Allocate new rmap and large page tracking when moving memslot

From: Sean Christopherson
Date: Tue Dec 17 2019 - 17:21:02 EST


On Tue, Dec 17, 2019 at 04:56:40PM -0500, Peter Xu wrote:
> On Tue, Dec 17, 2019 at 12:40:23PM -0800, Sean Christopherson wrote:
> > Reallocate a rmap array and recalcuate large page compatibility when
> > moving an existing memslot to correctly handle the alignment properties
> > of the new memslot. The number of rmap entries required at each level
> > is dependent on the alignment of the memslot's base gfn with respect to
> > that level, e.g. moving a large-page aligned memslot so that it becomes
> > unaligned will increase the number of rmap entries needed at the now
> > unaligned level.

...

> I think the error-prone part is:
>
> new = old = *slot;

Lol, IMO the error-prone part is the entire memslot mess :-)

> Where IMHO it would be better if we only copy pointers explicitly when
> under control, rather than blindly copying all the pointers in the
> structure which even contains sub-structures.

Long term, yes, that would be ideal. For the immediate bug fix, reworking
common KVM and other arch code would be unnecessarily dangerous and would
make it more difficult to backport the fix to stable branches.

I actually briefly considered moving the slot->arch handling into arch
code as part of the bug fix, but the memslot code has many subtle
dependencies, e.g. PPC and x86 rely on common KVM code to copy slot->arch
when flags are being changed.

I'll happily clean up the slot->arch code once this series is merged.
There is refactoring in this series that will make it a lot easier to do
additional clean up.

> For example, I see PPC has this:
>
> struct kvm_arch_memory_slot {
> #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> unsigned long *rmap;
> #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
> };
>
> I started to look into HV code of it a bit, then I see...
>
> - kvm_arch_create_memslot(kvmppc_core_create_memslot_hv) init slot->arch.rmap,
> - kvm_arch_flush_shadow_memslot(kvmppc_core_flush_memslot_hv) didn't free it,
> - kvm_arch_prepare_memory_region(kvmppc_core_prepare_memory_region_hv) is nop.
>
> So Does it have similar issue?

No, KVM doesn't allow a memslot's size to be changed, and PPC's rmap
allocation is directly tied to the size of the memslot. The x86 bug exists
because the size of its metadata arrays varies based on the alignment of
the base gfn.