Re: [PATCH RFC v1 0/5] KVM: gmem: 2MB THP support and preparedness tracking changes
From: David Hildenbrand
Date: Fri Mar 14 2025 - 05:33:25 EST
On 14.03.25 10:09, Yan Zhao wrote:
On Wed, Jan 22, 2025 at 03:25:29PM +0100, David Hildenbrand wrote:
(split is possible if there are no unexpected folio references; private
pages cannot be GUP'ed, so it is feasible)
...
Note that I'm not quite sure about the "2MB" interface, should it be a
"PMD-size" interface?
I think Mike and I touched upon this aspect too - and I may be
misremembering - Mike suggested getting 1M, 2M, and bigger page sizes
in increments -- and then fitting in PMD sizes when we've had enough of
those. That is to say he didn't want to preclude it, or gate the PMD
work on enabling all sizes first.
Starting with 2M is reasonable for now. The real question is how we want to
deal with
Hi David,
Hi!
I'm just trying to understand the background of in-place conversion.
Regarding the two issues you mentioned with THP and non-in-place-conversion,
I have some questions (still based on starting with 2M):
(a) Not being able to allocate a 2M folio reliably
If we start by faulting in private pages from guest_memfd (not via a page pool)
and shared pages anonymously, is it correct to say that this is only a concern
when memory is under pressure?
Usually, fragmentation starts being a problem under memory pressure, and
memory pressure can show up simply because the page cache makes use of as
much memory as it wants.
As soon as we start allocating a 2 MB page for guest_memfd, to then
split it up + free only some parts back to the buddy (on private->shared
conversion), we create fragmentation that cannot get resolved as long as
the remaining private pages are not freed. A new conversion from
shared->private on the previously freed parts will allocate other
unmovable pages (not the freed ones) and make fragmentation worse.
In-place conversion improves that quite a lot, because guest_memfd itself
will not cause unmovable fragmentation. Of course, under memory
pressure, when we cannot allocate a 2M page for guest_memfd, it's
unavoidable. But then, we already had fragmentation (and did not really
cause any new fragmentation).
We discussed in the upstream call that if guest_memfd (primarily) only
allocates 2M pages and frees 2M pages, it will not cause fragmentation
itself, which is pretty nice.
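
To make the difference concrete, here is a rough toy model (not kernel
code) of the two approaches. It only counts 2M-sized blocks that end up
partially occupied by unmovable pages. Every name in it (ToyMemory,
alloc_2m, split_and_free, in_place, ...) is made up for illustration, and
it assumes, as described above, that a later shared->private conversion
does not get the previously freed pages back.

```python
PAGES_PER_2M = 512  # number of 4 KiB pages in one 2 MiB block


class ToyMemory:
    """Hypothetical model: each 2M block is a list of per-page owners."""

    def __init__(self, nr_blocks):
        # None means the page is free; a string names an unmovable owner.
        self.owner = [[None] * PAGES_PER_2M for _ in range(nr_blocks)]

    def alloc_2m(self, who):
        """Take the first completely free 2M block."""
        for b, block in enumerate(self.owner):
            if all(page is None for page in block):
                self.owner[b] = [who] * PAGES_PER_2M
                return b
        raise MemoryError("no free 2M block")

    def free_pages(self, b, start, count):
        """Return a range of 4K pages of block b to the free pool."""
        for p in range(start, start + count):
            self.owner[b][p] = None

    def alloc_4k_pages(self, who, count, avoid_block):
        """Allocate 4K pages from some block other than avoid_block.

        Assumption: this stands in for the allocator handing out
        different pages than the ones that were just freed.
        """
        for b, block in enumerate(self.owner):
            if b == avoid_block:
                continue
            free = [p for p, o in enumerate(block) if o is None]
            if len(free) >= count:
                for p in free[:count]:
                    block[p] = who
                return b
        raise MemoryError("not enough free 4K pages")

    def partially_unmovable_blocks(self):
        """Count blocks that hold some, but not only, unmovable pages."""
        return sum(1 for blk in self.owner if any(blk) and not all(blk))


def split_and_free(nr_blocks=4, converted=256):
    """No in-place conversion: split the folio, free parts, reallocate."""
    mem = ToyMemory(nr_blocks)
    b = mem.alloc_2m("gmem")
    mem.free_pages(b, 0, converted)                       # private->shared
    mem.alloc_4k_pages("gmem", converted, avoid_block=b)  # shared->private
    return mem.partially_unmovable_blocks()


def in_place(nr_blocks=4, converted=256):
    """In-place conversion: the 2M folio stays intact in guest_memfd."""
    mem = ToyMemory(nr_blocks)
    mem.alloc_2m("gmem")
    # Conversion only flips shared/private state; nothing goes back to
    # the allocator, so no block becomes partially occupied.
    return mem.partially_unmovable_blocks()
```

In this sketch, a single partial conversion and back leaves two blocks
partially occupied in the split-and-free case and none in the in-place
case. It ignores everything else in the system (movable allocations,
compaction, refcounts), so treat it as an illustration of the argument
only.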
(b) Partial discarding
For shared pages, are page migration and folio split possible for shared THP?
I assume by "shared" you mean "not guest_memfd, but some other memory we
use as an overlay" -- so no in-place conversion.
Yes, that should be possible as long as nothing else prevents
migration/split (e.g., longterm pinning)
For private pages, as you pointed out earlier, if we can ensure there are no
unexpected folio references for private memory, splitting a private huge folio
should succeed.
Yes, and maybe (hopefully) we'll reach a point where private parts will
not have a refcount at all (initially, frozen refcount, discussed during
the last upstream call).
Are you concerned about the memory fragmentation after repeated
partial conversions of private pages to and from shared?
Not only repeated, even just a single partial conversion. But of course,
repeated partial conversions will make it worse (e.g., never getting a
private huge page back when there was a partial conversion).
--
Cheers,
David / dhildenb