Re: [PATCH v6 4/5] KVM: guest_memfd: Enforce NUMA mempolicy using shared policy

From: Shivank Garg
Date: Wed Mar 05 2025 - 01:03:08 EST




On 3/4/2025 10:29 PM, Sean Christopherson wrote:
> On Tue, Mar 04, 2025, David Hildenbrand wrote:
>> On 04.03.25 16:30, Sean Christopherson wrote:
>>> On Tue, Mar 04, 2025, Ackerley Tng wrote:
>>>> Vlastimil Babka <vbabka@xxxxxxx> writes:
>>>>>> struct shared_policy should be stored on the inode rather than the file,
>>>>>> since the memory policy is a property of the memory (struct inode),
>>>>>> rather than a property of how the memory is used for a given VM (struct
>>>>>> file).
>>>>>
>>>>> That makes sense. AFAICS shmem also uses inodes to store policy.
>>>>>
>>>>>> When the shared_policy is stored on the inode, intra-host migration [1]
>>>>>> will work correctly, since the while the inode will be transferred from
>>>>>> one VM (struct kvm) to another, the file (a VM's view/bindings of the
>>>>>> memory) will be recreated for the new VM.
>>>>>>
>>>>>> I'm thinking of having a patch like this [2] to introduce inodes.
>>>>>
>>>>> shmem has it easier by already having inodes
>>>>>
>>>>>> With this, we shouldn't need to pass file pointers instead of inode
>>>>>> pointers.
>>>>>
>>>>> Any downsides, besides more work needed? Or is it feasible to do it using
>>>>> files now and convert to inodes later?
>>>>>
>>>>> Feels like something that must have been discussed already, but I don't
>>>>> recall specifics.
>>>>
>>>> Here's where Sean described file vs inode: "The inode is effectively the
>>>> raw underlying physical storage, while the file is the VM's view of that
>>>> storage." [1].
>>>>
>>>> I guess you're right that for now there is little distinction between
>>>> file and inode and using file should be feasible, but I feel that this
>>>> dilutes the original intent.
>>>
>>> Hmm, and using the file would be actively problematic at some point. One could
>>> argue that NUMA policy is property of the VM accessing the memory, i.e. that two
>>> VMs mapping the same guest_memfd could want different policies. But in practice,
>>> that would allow for conflicting requirements, e.g. different policies in each
>>> VM for the same chunk of memory, and would likely lead to surprising behavior due
>>> to having to manually do mbind() for every VM/file view.
>>
>> I think that's the same behavior with shmem? I mean, if you have two people
>> asking for different things for the same MAP_SHARE file range, surprises are
>> unavoidable.
>
> Yeah, I was specifically thinking of the case where a secondary mapping doesn't
> do mbind() at all, e.g. could end up effectively polluting guest_memfd with "bad"
> allocations.

Thank you for the feedback.
I agree that storing the policy in the inode is the correct approach, as it aligns
with shmem's behavior. I now understand that keeping the policy in file-private data
could lead to surprising behavior, especially with multiple VMs mapping the same
guest_memfd.

The inode-based approach also makes sense from a long-term perspective, especially
with upcoming restricted mapping support. I'll pick the Ackerley's patch[1] to add
support for gmem inodes. With this patch, it does not seem overly complex to
implement to policy storage in inodes.

I'll test this approach and submit a revised patch shortly.

[1] https://lore.kernel.org/all/d1940d466fc69472c8b6dda95df2e0522b2d8744.1726009989.git.ackerleytng@xxxxxxxxxx/

Thanks,
Shivank