Re: [PATCHv4 1/8] mm: Add support for unaccepted memory
From: Kirill A. Shutemov
Date: Wed Apr 13 2022 - 07:28:56 EST
On Wed, Apr 13, 2022 at 12:36:11PM +0200, David Hildenbrand wrote:
> On 12.04.22 18:08, Dave Hansen wrote:
> > On 4/12/22 01:15, David Hildenbrand wrote:
> >> Can we simply automate this using a kthread or smth like that, which
> >> just traverses the free page lists and accepts pages (similar, but
> >> different to free page reporting)?
> >
> > That's definitely doable.
> >
> > The downside is that this will force premature consumption of physical
> > memory resources that the guest may never use. That's a particular
> > problem on TDX systems since there is no way for a VMM to reclaim guest
> > memory short of killing the guest.
>
> IIRC, the hypervisor will usually effectively populate all guest RAM
> either way right now.
No, it is not usual. By default QEMU/KVM uses an anonymous mapping and
faults in memory on demand.
Yes, there's an option to pre-populate guest memory, but it is not the
default.
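To illustrate the on-demand behaviour from plain userspace, here is a
minimal sketch. It is not QEMU code: the helper names (map_anon,
touch_pages, resident_pages) are made up for this example, and
MAP_POPULATE is only a stand-in for whatever pre-population option the
VMM provides. It assumes Linux, where mincore(2) reports untouched
anonymous pages as not resident.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Count resident pages in [addr, addr + len) using mincore(2).
 * addr must be page-aligned. Returns -1 on error.
 */
long resident_pages(void *addr, size_t len)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t npages = (len + page - 1) / page;
	unsigned char *vec = malloc(npages);
	long count = 0;

	if (!vec)
		return -1;
	if (mincore(addr, len, vec) != 0) {
		free(vec);
		return -1;
	}
	/* Only the least significant bit of each entry is defined. */
	for (size_t i = 0; i < npages; i++)
		count += vec[i] & 1;
	free(vec);
	return count;
}

/*
 * Map npages of private anonymous memory.
 * extra_flags is 0 for demand faulting, or MAP_POPULATE to ask the
 * kernel to fault everything in up front (illustrative stand-in for
 * a VMM pre-population option).
 * Returns NULL on failure.
 */
void *map_anon(size_t npages, int extra_flags)
{
	size_t len = npages * (size_t)sysconf(_SC_PAGESIZE);
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | extra_flags, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/* Write one byte into each of the first n pages to trigger page faults. */
void touch_pages(void *addr, size_t n)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	volatile unsigned char *p = addr;

	for (size_t i = 0; i < n; i++)
		p[i * page] = 1;
}
```

With map_anon(n, 0), resident_pages() should report nothing resident
until touch_pages() runs. With MAP_POPULATE the pages are normally
backed as soon as mmap() returns.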
> So yes, for hypervisors that might optimize for
> that, that statement would be true. But I lost track how helpful it
> would be in the near future e.g., with the fd-based private guest memory
> -- maybe they already optimize for delayed acceptance of memory, turning
> it into delayed population.
>
> >
> > In other words, I can see a good argument either way:
> > 1. The kernel should accept everything to avoid the perf nastiness
> > 2. The kernel should accept only what it needs in order to reduce memory
> > use
> >
> > I'm kinda partial to #1 though, if I had to pick only one.
> >
> > The other option might be to tie this all to DEFERRED_STRUCT_PAGE_INIT.
> > Have the rule that everything that gets a 'struct page' must be
> > accepted. If you want to do delayed acceptance, you do it via
> > DEFERRED_STRUCT_PAGE_INIT.
>
> That could also be an option, yes. At least being able to choose would be
> good. But IIRC, DEFERRED_STRUCT_PAGE_INIT will still make the system get
> stuck during boot and wait until everything was accepted.
Right. Deferred page init has to be done before init.
> I see the following variants:
>
> 1) Slow boot; after boot, all memory is already accepted.
> 2) Fast boot; after boot, all memory will slowly but steadily get
> accepted in the background. After a while, all memory is accepted and
> can be signaled to user space.
> 3) Fast boot; after boot, memory gets accepted on demand. This is what
> we have in this series.
>
> I somehow don't quite like 3), but with deferred population in the
> hypervisor, it might just make sense.
Conceptually, 3 is no different from what happens now. The first time a
normal VM touches a page (like on handling __GFP_ZERO), the page gets
allocated on the host. That can take a very long time if it kicks in direct
reclaim on the host.
The only difference is that accepting memory is *usually* slower.
I guess we can make a case for making 1 an option, to match the
pre-populated use case for normal VMs.
Frankly, I think option 2 is the worst one. You still steal CPU cycles from
the workload after boot to do a job that may or may not be needed. It is a
half-measure that helps nobody.
--
Kirill A. Shutemov