On Wed, Aug 16, 2023 at 09:43:40AM +0200, David Hildenbrand wrote:
> On 15.08.23 04:34, John Hubbard wrote:
> > On 8/14/23 02:09, Yan Zhao wrote:
> > ...
> > > > hmm_range_fault()-based memory management in particular might benefit
> > > > from having NUMA balancing disabled entirely for the memremap_pages()
> > > > region, come to think of it. That seems relatively easy and clean at
> > > > first glance anyway.
> > > >
> > > > For other regions (allocated by the device driver), a per-VMA flag
> > > > seems about right: VM_NO_NUMA_BALANCING ?
> > >
> > > Thanks a lot for those good suggestions!
> > > For VMs, when could a per-VMA flag be set?
> > > It might be hard to do in mmap() in QEMU, because a VMA may not be used for
> > > DMA until after it's mapped into VFIO.
> > > Then, should VFIO set this flag after it maps a range?
> > > And could this flag be unset again after device hot-unplug?
> >
> > I'm hoping someone who thinks about VMs and VFIO often can chime in.
>
> At least QEMU could just set it on the applicable VMAs (as said by Yuan Yao,
> using madvise).

Currently MADV_* is up to 25:

	#define MADV_COLLAPSE	25

and since the madvise behavior argument is of type "int", there is still room
for a new advice value, so that is OK. But vma->vm_flags is of type
"unsigned long", so its flag bits are already used up, at least on 32-bit
platforms.
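
Just to make that option concrete, here is a rough sketch of the QEMU side,
assuming such an advice value were added (MADV_NO_NUMA_BALANCING and its
number are made up here for illustration, nothing like it exists upstream):

	#include <stdio.h>
	#include <sys/mman.h>

	/* Hypothetical advice value, for illustration only. */
	#ifndef MADV_NO_NUMA_BALANCING
	#define MADV_NO_NUMA_BALANCING	26
	#endif

	/*
	 * Would be called once a guest RAM range is known to be used for DMA,
	 * e.g. right after VFIO has mapped it into the IOMMU.
	 */
	static int disable_numa_balancing_range(void *host_addr, size_t size)
	{
		if (madvise(host_addr, size, MADV_NO_NUMA_BALANCING)) {
			perror("madvise(MADV_NO_NUMA_BALANCING)");
			return -1;
		}
		return 0;
	}

The hot-unplug question above would then presumably map to a matching
"re-enable" call issued when the range is unmapped from VFIO.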

> BUT, I do wonder what value there would be for autonuma to still be active
> for the remainder of the hypervisor. If there is none, a prctl() would be
> better.

Add a new field in "struct vma_numab_state" in the vma, and use prctl() to
update this field?
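
Something like this is what I have in mind for the prctl() route, as a sketch
only (PR_SET_NUMA_BALANCING and its request number are made up, not an
existing prctl):

	#include <stdio.h>
	#include <sys/prctl.h>

	/* Hypothetical prctl request, for illustration only. */
	#ifndef PR_SET_NUMA_BALANCING
	#define PR_SET_NUMA_BALANCING	0x4e424c00
	#endif

	/* Disable automatic NUMA balancing for the whole hypervisor process. */
	static int disable_numa_balancing_process(void)
	{
		if (prctl(PR_SET_NUMA_BALANCING, 0 /* disable */, 0, 0, 0)) {
			perror("prctl(PR_SET_NUMA_BALANCING)");
			return -1;
		}
		return 0;
	}

Whether the state then lives per-process or gets propagated into something
like "struct vma_numab_state" per VMA is exactly the part I'm unsure about.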

> We already do have a mechanism in QEMU to get notified when longterm-pinning
> in the kernel might happen (and, therefore, MADV_DONTNEED must not be used):
> * ram_block_discard_disable()
> * ram_block_uncoordinated_discard_disable()

It looks like this ram_block_discard allow/disallow state is global rather
than per-VMA in QEMU.
So, do you mean that the kernel should provide a per-VMA allow/disallow
mechanism, and it is then up to user space to choose between the per-VMA
(more complex) way and the global (simpler) way?
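
If so, from the user-space side the choice would look roughly like the sketch
below; both the madvise() value and the prctl() request are the hypothetical
ones from the earlier sketches, so this only shows the shape of the decision,
not any existing interface:

	#include <stdbool.h>
	#include <stddef.h>
	#include <sys/mman.h>
	#include <sys/prctl.h>

	/* Hypothetical values carried over from the sketches above. */
	#define MADV_NO_NUMA_BALANCING	26
	#define PR_SET_NUMA_BALANCING	0x4e424c00

	/*
	 * A simple hypervisor could just flip the process-wide knob, while QEMU
	 * could track exactly which RAMBlock host ranges are mapped into VFIO
	 * and tag only those VMAs.
	 */
	static int disable_numa_balancing(bool per_vma, void *host, size_t size)
	{
		if (per_vma)
			return madvise(host, size, MADV_NO_NUMA_BALANCING);

		return prctl(PR_SET_NUMA_BALANCING, 0 /* disable */, 0, 0, 0);
	}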