Re: [External] Re: [PATCH v6 09/16] mm/hugetlb: Defer freeing of HugeTLB pages

From: Michal Hocko
Date: Tue Nov 24 2020 - 08:14:53 EST


On Tue 24-11-20 20:45:30, Muchun Song wrote:
> On Tue, Nov 24, 2020 at 7:51 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
> >
> > On Tue 24-11-20 17:52:52, Muchun Song wrote:
> > > In the subsequent patch, we will allocate the vmemmap pages when free
> > > HugeTLB pages. But update_and_free_page() is called from a non-task
> > > context(and hold hugetlb_lock), so we can defer the actual freeing in
> > > a workqueue to prevent use GFP_ATOMIC to allocate the vmemmap pages.
> >
> > This has been brought up earlier without any satisfying answer. Do we
> > really have bother with the freeing from the pool and reconstructing the
> > vmemmap page tables? Do existing usecases really require such a dynamic
> > behavior? In other words, wouldn't it be much simpler to allow to use
>
> If someone wants to free a HugeTLB page, there is no way to do that if we
> do not allow this behavior.

Right. The question is how much that matters for the _initial_ feature
submission. Is this restriction so important that it would render it
unsuable?

> When do we need this? On our server, we will
> allocate a lot of HugeTLB pages for SPDK or virtualization. Sometimes,
> we want to debug some issues and want to apt install some debug tools,
> but if the host has little memory and the install operation can be failed
> because of no memory. In this time, we can try to free some HugeTLB
> pages to buddy in order to continue debugging. So maybe we need this.

Or maybe you can still allocate hugetlb pages for debugging in runtime
and try to free those when you need to.

> > hugetlb pages with sparse vmemmaps only for the boot time reservations
> > and never allow them to be freed back to the allocator. This is pretty
> > restrictive, no question about that, but it would drop quite some code
>
> Yeah, if we do not allow freeing the HugeTLB page to buddy, it actually
> can drop some code. But I think that it only drop this one and next one
> patch. It seems not a lot. And if we drop this patch, we need to add some
> another code to do the boot time reservations and other code to disallow
> freeing HugeTLB pages.

you need a per hugetlb page flag to note the sparse vmemmap anyway so
the freeing path should be a trivial check for the flag. Early boot
reservation. Special casing for the early boot reservation shouldn't be
that hard either. But I haven't checked closely.

> So why not support freeing now.

Because it adds some non trivial challenges which would be better to
deal with with a stable and tested and feature limited implementation.
The most obvious one is the problem with vmemmap allocations when
freeing hugetlb page. Others like vmemmap manipulation is quite some
code but no surprises. Btw. that should be implemented in vmemmap proper
and ready for other potential users. But this is a minor detail.

--
Michal Hocko
SUSE Labs