[Patch v4 00/18] NUMA aware page table allocation

From: Vipin Sharma
Date: Mon Mar 06 2023 - 17:41:43 EST


Hi,

This series builds on the feedback received on v3.

The biggest functional change is that NUMA aware page tables are now
enabled on a per-VM basis instead of through a module parameter that
applies to all VMs on a host. This was decided in an internal discussion
to avoid forcing every VM on a host to be NUMA aware. We still need to
collect more data on how much performance degradation a VM can see in
negative testing, where the VM's vCPUs always access memory on remote
NUMA nodes instead of staying local, compared to a VM which is not NUMA
aware.
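
As a rough illustration of the per-VM opt-in, the sketch below shows how
userspace would typically enable a VM-scoped capability with the existing
KVM_ENABLE_CAP ioctl on a VM file descriptor. The helper name
enable_vm_cap() is hypothetical, and leaving flags and args[] zeroed is an
assumption; the authoritative usage of KVM_CAP_NUMA_AWARE_PAGE_TABLE is
whatever the documentation patch in this series specifies.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Hypothetical helper: enable a per-VM capability on an existing VM fd.
 * Returns 0 on success, or -1 with errno set by ioctl().
 */
static int enable_vm_cap(int vm_fd, uint32_t cap)
{
	struct kvm_enable_cap enable;

	memset(&enable, 0, sizeof(enable));
	enable.cap = cap;	/* flags and args[] stay zero: an assumption */

	return ioctl(vm_fd, KVM_ENABLE_CAP, &enable);
}
```

With headers from a kernel that carries this series, a VMM would
presumably call enable_vm_cap(vm_fd, KVM_CAP_NUMA_AWARE_PAGE_TABLE) once
after KVM_CREATE_VM; on other kernels the constant is not defined.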

There are other changes which are mentioned in the change log below for
v4.

Thanks
Vipin

v4:
- Removed module parameter for enabling NUMA aware page table.
- Added new capability KVM_CAP_NUMA_AWARE_PAGE_TABLE to enable this
feature per VM.
- Added documentation for the new capability.
- Hold a mutex from just before the top-up until the fault/split has
been handled. Previous versions used spinlocks twice, once for the
top-up and once for fetching the page from the cache.
- Using the existing slots_lock for split_shadow_page_cache operations.
- KVM MMU shrinker now also shrinks mmu_shadowed_info_cache, in addition
to split_shadow_page_cache and mmu_shadow_page_cache.
- Reduced cache default size to 4.
- Split patches into smaller ones.
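
To make the cache-related items above more concrete, here is a
single-threaded userspace toy model of a small object cache with a global
count of unused objects and a shrinker that frees them. Every name
(toy_cache, toy_topup, toy_alloc, toy_shrink, toy_unused_pages) is
hypothetical and this is not the kernel code; the capacity of 4 only
mirrors the new default size, and locking (the mutex and slots_lock
mentioned above) is deliberately left out.

```c
#include <stdlib.h>

#define TOY_CACHE_CAPACITY 4	/* mirrors the reduced default size */
#define TOY_PAGE_SIZE 4096	/* illustrative object size */

struct toy_cache {
	int nobjs;				/* unused objects held */
	void *objects[TOY_CACHE_CAPACITY];
};

/* Global count of unused objects across all caches (toy equivalent). */
static long toy_unused_pages;

/* Fill the cache to capacity. Returns 0 on success, -1 if allocation fails. */
static int toy_topup(struct toy_cache *c)
{
	while (c->nobjs < TOY_CACHE_CAPACITY) {
		void *obj = calloc(1, TOY_PAGE_SIZE);	/* zeroed, like __GFP_ZERO */

		if (!obj)
			return -1;
		c->objects[c->nobjs++] = obj;
		toy_unused_pages++;
	}
	return 0;
}

/* Hand one object to the caller, who then owns it. NULL if the cache is empty. */
static void *toy_alloc(struct toy_cache *c)
{
	if (c->nobjs == 0)
		return NULL;
	toy_unused_pages--;
	return c->objects[--c->nobjs];
}

/* Shrinker: free every unused object in the cache and report how many. */
static long toy_shrink(struct toy_cache *c)
{
	long freed = 0;

	while (c->nobjs > 0) {
		free(c->objects[--c->nobjs]);
		toy_unused_pages--;
		freed++;
	}
	return freed;
}
```

The point of the global counter in this model is that a shrinker can
report how much is reclaimable without walking every cache; objects
already handed out by toy_alloc() are no longer counted as unused.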

v3: https://lore.kernel.org/lkml/20221222023457.1764-1-vipinsh@xxxxxxxxxx/
- Split patches into smaller ones.
- Repurposed KVM MMU shrinker to free cache pages instead of oldest page table
pages
- Reduced cache size from 40 to 5
- Removed the __weak function; the node value is now initialized in all
architectures.
- Some name changes.

v2: https://lore.kernel.org/lkml/20221201195718.1409782-1-vipinsh@xxxxxxxxxx/
- All page table pages will be allocated on the underlying physical
page's NUMA node.
- Introduced module parameter, numa_aware_pagetable, to disable this
feature.
- Using kvm_pfn_to_refcounted_page to get page from a pfn.

v1: https://lore.kernel.org/all/20220801151928.270380-1-vipinsh@xxxxxxxxxx/

Vipin Sharma (18):
  KVM: x86/mmu: Change KVM mmu shrinker to no-op
  KVM: x86/mmu: Remove zapped_obsolete_pages from struct kvm_arch{}
  KVM: x86/mmu: Track count of pages in KVM MMU page caches globally
  KVM: x86/mmu: Shrink shadow page caches via MMU shrinker
  KVM: x86/mmu: Add split_shadow_page_cache pages to global count of MMU
    cache pages
  KVM: x86/mmu: Shrink split_shadow_page_cache via MMU shrinker
  KVM: x86/mmu: Unconditionally count allocations from MMU page caches
  KVM: x86/mmu: Track unused mmu_shadowed_info_cache pages count via
    global counter
  KVM: x86/mmu: Shrink mmu_shadowed_info_cache via MMU shrinker
  KVM: x86/mmu: Add per VM NUMA aware page table capability
  KVM: x86/mmu: Add documentation of NUMA aware page table capability
  KVM: x86/mmu: Allocate NUMA aware page tables on TDP huge page splits
  KVM: mmu: Add common initialization logic for struct
    kvm_mmu_memory_cache{}
  KVM: mmu: Initialize kvm_mmu_memory_cache.gfp_zero to __GFP_ZERO by
    default
  KVM: mmu: Add NUMA node support in struct kvm_mmu_memory_cache{}
  KVM: x86/mmu: Allocate numa aware page tables during page fault
  KVM: x86/mmu: Allocate shadow mmu page table on huge page split on the
    same NUMA node
  KVM: x86/mmu: Reduce default mmu memory cache size

Documentation/virt/kvm/api.rst | 29 +++
arch/arm64/kvm/arm.c | 2 +-
arch/arm64/kvm/mmu.c | 2 +-
arch/mips/kvm/mips.c | 3 +
arch/riscv/kvm/mmu.c | 8 +-
arch/riscv/kvm/vcpu.c | 2 +-
arch/x86/include/asm/kvm_host.h | 17 +-
arch/x86/include/asm/kvm_types.h | 6 +-
arch/x86/kvm/mmu/mmu.c | 319 +++++++++++++++++++------------
arch/x86/kvm/mmu/mmu_internal.h | 38 ++++
arch/x86/kvm/mmu/paging_tmpl.h | 29 +--
arch/x86/kvm/mmu/tdp_mmu.c | 23 ++-
arch/x86/kvm/x86.c | 18 +-
include/linux/kvm_host.h | 2 +
include/linux/kvm_types.h | 21 ++
include/uapi/linux/kvm.h | 1 +
virt/kvm/kvm_main.c | 24 ++-
17 files changed, 386 insertions(+), 158 deletions(-)

--
2.40.0.rc0.216.gc4246ad0f0-goog