Re: [PATCH RFC v1 0/5] KVM: gmem: 2MB THP support and preparedness tracking changes

From: David Hildenbrand
Date: Fri Mar 14 2025 - 05:33:25 EST

Next message: Ingo Molnar: "Re: [PATCH 4/5] x86/syscall/x32: Move x32 syscall table"
Previous message: Alexander Sverdlin: "Re: [PATCH v13 2/3] soc: sophgo: cv1800: rtcsys: New driver (handling RTC only)"
In reply to: Yan Zhao: "Re: [PATCH RFC v1 0/5] KVM: gmem: 2MB THP support and preparedness tracking changes"
Next in thread: Yan Zhao: "Re: [PATCH RFC v1 0/5] KVM: gmem: 2MB THP support and preparedness tracking changes"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 14.03.25 10:09, Yan Zhao wrote:

On Wed, Jan 22, 2025 at 03:25:29PM +0100, David Hildenbrand wrote:

(split is possible if there are no unexpected folio references; private
pages cannot be GUP'ed, so it is feasible)

...

Note that I'm not quite sure about the "2MB" interface, should it be a
"PMD-size" interface?

I think Mike and I touched upon this aspect too - and I may be
misremembering - Mike suggested getting 1M, 2M, and bigger page sizes
in increments -- and then fitting in PMD sizes when we've had enough of
those. That is to say he didn't want to preclude it, or gate the PMD
work on enabling all sizes first.

Starting with 2M is reasonable for now. The real question is how we want to
deal with

Hi David,

Hi!

I'm just trying to understand the background of in-place conversion.

Regarding the two issues you mentioned with THP and non-in-place conversion,
I have some questions (still based on starting with 2M):

(a) Not being able to allocate a 2M folio reliably

If we start by faulting in private pages from guest_memfd (not via a page pool)
and shared pages anonymously, is it correct to say that this is only a concern
when memory is under pressure?

Usually, fragmentation starts being a problem under memory pressure, and memory pressure can show up simply because the page cache makes use of as much memory as it wants.

As soon as we start allocating a 2 MB page for guest_memfd, to then split it up + free only some parts back to the buddy (on private->shared conversion), we create fragmentation that cannot get resolved as long as the remaining private pages are not freed. A new conversion from shared->private on the previously freed parts will allocate other unmovable pages (not the freed ones) and make fragmentation worse.

In-place conversion improves that quite a lot, because guest_memfd itself will not cause unmovable fragmentation. Of course, under memory pressure, when we cannot allocate a 2M page for guest_memfd, it's unavoidable. But then, we already had fragmentation (and did not really cause any new one).

We discussed in the upstream call that if guest_memfd (primarily) only allocates 2M pages and frees 2M pages, it will not cause fragmentation itself, which is pretty nice.
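
To make the difference concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not how the buddy allocator or guest_memfd actually work: memory is a flat list of 4K pages grouped into 2M blocks, small allocations take the first free pages, and in the non-in-place case the new private pages are allocated before the shared pages are freed. All names (ToyMemory, separate_backing_conversion, in_place_conversion) are made up for illustration.

```python
PAGES_PER_BLOCK = 512  # 4 KiB pages per 2 MiB block
FREE, MOVABLE, UNMOVABLE = "free", "movable", "unmovable"


class ToyMemory:
    """Flat list of 4 KiB pages, grouped into 2 MiB-aligned blocks (toy model)."""

    def __init__(self, nr_blocks):
        self.pages = [FREE] * (nr_blocks * PAGES_PER_BLOCK)

    def nr_blocks(self):
        return len(self.pages) // PAGES_PER_BLOCK

    def _block(self, b):
        start = b * PAGES_PER_BLOCK
        return range(start, start + PAGES_PER_BLOCK)

    def alloc_2m(self, kind):
        """Claim the first fully free 2 MiB block and return its page indices."""
        for b in range(self.nr_blocks()):
            if all(self.pages[i] == FREE for i in self._block(b)):
                for i in self._block(b):
                    self.pages[i] = kind
                return list(self._block(b))
        raise MemoryError("no free 2 MiB block")

    def alloc_4k(self, kind, count):
        """Claim the first `count` free pages (assumed first-fit policy)."""
        got = [i for i, p in enumerate(self.pages) if p == FREE][:count]
        if len(got) < count:
            raise MemoryError("not enough free pages")
        for i in got:
            self.pages[i] = kind
        return got

    def free(self, idxs):
        for i in idxs:
            self.pages[i] = FREE

    def free_2m_blocks(self):
        """Number of blocks that could still back a 2 MiB allocation."""
        return sum(all(self.pages[i] == FREE for i in self._block(b))
                   for b in range(self.nr_blocks()))

    def unmovable_blocks(self):
        """Number of blocks pinned down by at least one unmovable page."""
        return sum(any(self.pages[i] == UNMOVABLE for i in self._block(b))
                   for b in range(self.nr_blocks()))


def separate_backing_conversion(nr_blocks=4):
    """Non-in-place: shared memory lives outside guest_memfd."""
    mem = ToyMemory(nr_blocks)
    private = mem.alloc_2m(UNMOVABLE)
    half = private[:PAGES_PER_BLOCK // 2]
    # private -> shared: split, free half to the allocator, back it elsewhere.
    mem.free(half)
    shared = mem.alloc_4k(MOVABLE, len(half))
    # shared -> private: new unmovable pages are allocated (assumption: before
    # the shared pages are released), so they land in a different block.
    mem.alloc_4k(UNMOVABLE, len(half))
    mem.free(shared)
    return mem.free_2m_blocks(), mem.unmovable_blocks()


def in_place_conversion(nr_blocks=4):
    """In-place: conversion only flips per-page state inside the same folio."""
    mem = ToyMemory(nr_blocks)
    private = mem.alloc_2m(UNMOVABLE)
    is_shared = {i: False for i in private}
    for i in private[:PAGES_PER_BLOCK // 2]:
        is_shared[i] = True    # private -> shared, no alloc/free
    for i in private[:PAGES_PER_BLOCK // 2]:
        is_shared[i] = False   # shared -> private, no alloc/free
    return mem.free_2m_blocks(), mem.unmovable_blocks()
```

Both functions return a pair (free 2M blocks, blocks containing unmovable pages). In this toy setup, the same 2M worth of private memory ends up spread over two blocks in the non-in-place case, while the in-place case keeps it in one.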

(b) Partial discarding

For shared pages, are page migration and folio split possible for a shared THP?

I assume by "shared" you mean "not guest_memfd, but some other memory we use as an overlay" -- so no in-place conversion.

Yes, that should be possible as long as nothing else prevents migration/split (e.g., longterm pinning).

For private pages, as you pointed out earlier, if we can ensure there are no
unexpected folio references for private memory, splitting a private huge folio
should succeed.

Yes, and maybe (hopefully) we'll reach a point where private parts will not have a refcount at all (initially, frozen refcount, discussed during the last upstream call).
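
As a rough illustration of the "no unexpected references" condition, here is a minimal Python sketch. It is a toy, not kernel code: ToyFolio and try_split are hypothetical names, and giving each resulting small folio a refcount of 1 is an arbitrary choice for the example.

```python
class ToyFolio:
    """Toy stand-in for a folio: a page count plus a reference count."""

    def __init__(self, nr_pages, refcount):
        self.nr_pages = nr_pages
        self.refcount = refcount


def try_split(folio, expected_refs):
    """Split into single-page folios only if no unexpected reference exists.

    Returns the list of small folios, or None if someone else (for example
    a longterm pin) holds a reference beyond the expected ones.
    """
    if folio.refcount != expected_refs:
        return None
    return [ToyFolio(1, 1) for _ in range(folio.nr_pages)]
```

With only the expected reference, try_split(ToyFolio(512, 1), 1) yields 512 small folios; with one extra reference, try_split(ToyFolio(512, 2), 1) returns None and the huge folio stays intact.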

Are you concerned about the memory fragmentation after repeated
partial conversions of private pages to and from shared?

Not only repeated, even just a single partial conversion. But of course, repeated partial conversions will make it worse (e.g., never getting a private huge page back when there was a partial conversion).

--
Cheers,

David / dhildenb
