Re: [PATCH] KVM: guest_memfd: fix NUMA interleave index double-counting

In reply to: Sean Christopherson: "Re: [PATCH] KVM: guest_memfd: fix NUMA interleave index double-counting"

From: Michael S. Tsirkin

Date: Tue Jun 09 2026 - 15:58:10 EST

On Tue, Jun 09, 2026 at 09:31:29AM -0700, Sean Christopherson wrote:
> On Wed, 03 Jun 2026 11:57:33 -0400, Michael S. Tsirkin wrote:
> > kvm_gmem_get_policy() sets *ilx to the full page offset
> > (vm_pgoff + vma offset). But get_vma_policy() adds the page
> > offset on top of *ilx, so the offset is counted twice. This
> > causes NUMA interleaving to skip nodes: for order-0 pages the
> > effective index jumps by 2 for each consecutive page.
> >
> > The get_policy vm_op should return only a per-file bias in *ilx
> > (like shmem_get_policy does with inode->i_ino), letting
> > get_vma_policy() add the page-offset component.
> >
> > [...]
>
> Applied to kvm-x86 gmem, with a heavily massaged changelog to explicitly spell
> out that ilx == interleave index, and to try and explain the role of the index
> (it wasn't at all obvious to me why using the inode number was "correct").
>
> Thanks!
>
> [1/1] KVM: guest_memfd: fix NUMA interleave index double-counting
> https://github.com/kvm-x86/linux/commit/48dbe4732198

Thanks!
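
For anyone following along, the double-counting described in the changelog
can be sketched with a toy model. This is not the kernel code: every toy_*
name is hypothetical, and it assumes the node is picked as (index % number
of nodes) with contiguous node IDs and order-0 pages.

```c
#include <assert.h>

/* Toy model only: pick a node as (ilx % nr_nodes). Not kernel code. */
static unsigned int toy_interleave_node(unsigned long ilx,
					unsigned int nr_nodes)
{
	assert(nr_nodes > 0);
	return (unsigned int)(ilx % nr_nodes);
}

/*
 * Models the caller adding the page offset on top of whatever the
 * get_policy hook stored in *ilx.
 */
static unsigned long toy_effective_ilx(unsigned long ilx_from_get_policy,
				       unsigned long pgoff)
{
	return ilx_from_get_policy + pgoff;
}

/* Before the fix: the hook returned the full page offset in *ilx. */
static unsigned int toy_node_before_fix(unsigned long pgoff,
					unsigned int nr_nodes)
{
	return toy_interleave_node(toy_effective_ilx(pgoff, pgoff), nr_nodes);
}

/* After the fix: the hook returns only a per-file bias in *ilx. */
static unsigned int toy_node_after_fix(unsigned long bias,
				       unsigned long pgoff,
				       unsigned int nr_nodes)
{
	return toy_interleave_node(toy_effective_ilx(bias, pgoff), nr_nodes);
}
```

With 4 nodes, the "before" variant only ever lands on nodes 0 and 2 for
consecutive pages, while the "after" variant walks all four nodes starting
from the per-file bias.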

Sean, what is your take on interleaving for guest_memfd?

To the best of my understanding:

Right now kvm calls __filemap_get_folio_mpol(), which does not pass the
index down to filemap_alloc_folio(). The allocation therefore uses
NO_INTERLEAVE_INDEX, so MPOL_INTERLEAVE falls back to the task's global
counter, and placement is effectively unpredictable. This looks like an
oversight (the index was available but never threaded down), but it's been
shipping since 6.19.
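
To make the difference concrete, here is a toy model of the two behaviors.
It is a sketch under assumptions, not the kernel implementation: struct
toy_task and both toy_alloc_* helpers are made up for illustration, and the
per-task state is reduced to a single round-robin counter.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-task interleave state. */
struct toy_task {
	unsigned int next_node;
};

/*
 * Models allocation without an index: the node depends only on how
 * many allocations this task has already made, not on the offset.
 */
static unsigned int toy_alloc_no_index(struct toy_task *t,
				       unsigned int nr_nodes)
{
	unsigned int nid;

	assert(nr_nodes > 0);
	nid = t->next_node % nr_nodes;
	t->next_node = (nid + 1) % nr_nodes;
	return nid;
}

/*
 * Models allocation with an index (e.g. the file offset in pages):
 * the node is a pure function of the index.
 */
static unsigned int toy_alloc_with_index(unsigned long index,
					 unsigned int nr_nodes)
{
	assert(nr_nodes > 0);
	return (unsigned int)(index % nr_nodes);
}
```

In this model, a task that faults offset 3 and then offset 0 places offset 0
on node 1, while a task that faults offset 0 first places it on node 0. With
an index, offset 0 always lands on node 0 regardless of fault order.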

Should we fix it to use the file offset instead? Or the GPA? And if so,
should that be the default, or does userspace need an explicit way to opt
out of the NO_INTERLEAVE_INDEX behavior?

Thanks,
MST