Re: [PATCH 1/3] KVM: x86/mmu: Dynamically allocate shadow MMU's hashed page list
From: Vipin Sharma
Date: Mon Mar 17 2025 - 12:39:04 EST
On 2025-03-14 19:40:08, Sean Christopherson wrote:
> Dynamically allocate the (massive) array of hashed lists used to track
> shadow pages, as the array itself is 32KiB, i.e. is an order-3 allocation
> all on its own, and is *exactly* an order-3 allocation. Dynamically
> allocating the array will allow allocating "struct kvm" using regular
> kmalloc(), and will also allow deferring allocation of the array until
> it's actually needed, i.e. until the first shadow root is allocated.
>
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> ---
> arch/x86/include/asm/kvm_host.h | 4 ++--
> arch/x86/kvm/mmu/mmu.c | 21 ++++++++++++++++++++-
> arch/x86/kvm/x86.c | 5 ++++-
> 3 files changed, 26 insertions(+), 4 deletions(-)
>
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -6673,13 +6685,19 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
> kvm_tdp_mmu_zap_invalidated_roots(kvm, true);
> }
>
> -void kvm_mmu_init_vm(struct kvm *kvm)
> +int kvm_mmu_init_vm(struct kvm *kvm)
> {
> + int r;
> +
> kvm->arch.shadow_mmio_value = shadow_mmio_value;
> INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
> INIT_LIST_HEAD(&kvm->arch.possible_nx_huge_pages);
> spin_lock_init(&kvm->arch.mmu_unsync_pages_lock);
>
> + r = kvm_mmu_alloc_page_hash(kvm);
> + if (r)
> + return r;
> +
In the patch 3, shouldn't this be moved to else part of the below
'if (tdp_mmu_enabled)' line? Otherwise, this hash array will always get
allocated.
> if (tdp_mmu_enabled)
> kvm_mmu_init_tdp_mmu(kvm);
>
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -12704,7 +12704,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
> if (ret)
> goto out;
>
> - kvm_mmu_init_vm(kvm);
> + ret = kvm_mmu_init_vm(kvm);
> + if (ret)
> + goto out_cleanup_page_track;
>
> ret = kvm_x86_call(vm_init)(kvm);
> if (ret)
> @@ -12757,6 +12759,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>
> out_uninit_mmu:
> kvm_mmu_uninit_vm(kvm);
> +out_cleanup_page_track:
I think there is a memory leak in this series.
1. kvm_mmu_uninit_vm() is not freeing kvm->arch.mmu_page_hash. So, in
error case out_uninit_mmu will not recover memory allocated in
kvm_mmu_alloc_page_hash().
2. When VM terminates or is killed then the same thing will happen, no
one is reclaiming the memory.
> kvm_page_track_cleanup(kvm);
> out:
> return ret;