Re: [PATCH v2 3/8] kvm: explicitly set FOLL_HONOR_NUMA_FAULT in hva_to_pfn_slow()

From: David Hildenbrand
Date: Wed Aug 02 2023 - 11:32:20 EST


On 02.08.23 17:27, Mel Gorman wrote:
On Tue, Aug 01, 2023 at 02:48:39PM +0200, David Hildenbrand wrote:
KVM is *the* case we know that really wants to honor NUMA hinting faults.
As we want to stop setting FOLL_HONOR_NUMA_FAULT implicitly, set
FOLL_HONOR_NUMA_FAULT whenever we might obtain pages on behalf of a VCPU
to map them into a secondary MMU, and add a comment why.

Do that unconditionally in hva_to_pfn_slow() when calling
get_user_pages_unlocked().

kvmppc_book3s_instantiate_page(), hva_to_pfn_fast() and
gfn_to_page_many_atomic() are similarly used to map pages into a
secondary MMU. However, FOLL_WRITE and get_user_page_fast_only() always
implicitly honor NUMA hinting faults -- as documented for
FOLL_HONOR_NUMA_FAULT -- so we can limit this change to a single location
for now.
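
A rough user-space sketch of the rule described above, not the actual
kernel code. The flag values are made up for illustration and are not the
kernel's definitions, and both helper names (slow_path_gup_flags(),
honors_numa_hinting()) are hypothetical. It assumes the slow path adds
FOLL_WRITE for write faults and FOLL_NOWAIT for async lookups, and it only
models which lookups end up honoring NUMA hinting faults:

```c
#include <stdbool.h>

/* Illustrative bit values only; NOT the kernel's definitions. */
#define FOLL_WRITE            (1u << 0)
#define FOLL_NOWAIT           (1u << 1)
#define FOLL_HWPOISON         (1u << 2)
#define FOLL_HONOR_NUMA_FAULT (1u << 3)

/*
 * Hypothetical model of the flags the slow path would hand to
 * get_user_pages_unlocked(): FOLL_HONOR_NUMA_FAULT is set
 * unconditionally, the other flags depend on the kind of fault.
 */
static unsigned int slow_path_gup_flags(bool write_fault, bool async)
{
	unsigned int flags = FOLL_HWPOISON | FOLL_HONOR_NUMA_FAULT;

	if (write_fault)
		flags |= FOLL_WRITE;
	if (async)
		flags |= FOLL_NOWAIT;
	return flags;
}

/*
 * Hypothetical model of the rule in the text: a lookup honors NUMA
 * hinting faults if the flag is set explicitly, or implicitly via
 * FOLL_WRITE or the fast-only lookup path.
 */
static bool honors_numa_hinting(unsigned int flags, bool fast_only)
{
	return (flags & FOLL_HONOR_NUMA_FAULT) ||
	       (flags & FOLL_WRITE) ||
	       fast_only;
}
```

In this model a read-only slow-path lookup is the one case that would stop
honoring NUMA hinting faults without the explicit flag, which is why a
single location is enough for now.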

Don't set it in check_user_page_hwpoison(), where we really only want to
check if the mapped page is HW-poisoned.

We won't set it for other KVM users of get_user_pages()/pin_user_pages():
* arch/powerpc/kvm/book3s_64_mmu_hv.c: not used to map pages into a
  secondary MMU.
* arch/powerpc/kvm/e500_mmu.c: only used on shared TLB pages with userspace
* arch/s390/kvm/*: s390x only supports a single NUMA node either way
* arch/x86/kvm/svm/sev.c: not used to map pages into a secondary MMU.

This is a preparation for making FOLL_HONOR_NUMA_FAULT no longer
implicitly be set by get_user_pages() and friends.

Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>

Seems sane but I don't know KVM well enough to know if this is the only
relevant case so didn't ack.

Makes sense, some careful eyes from KVM people would be appreciated.

At least from kvm_main.c POV, I'm pretty confident that that's it.

--
Cheers,

David / dhildenb