Re: [External] Re: [PATCH v13 05/12] mm: hugetlb: allocate the vmemmap pages associated with each HugeTLB page

From: Oscar Salvador
Date: Tue Jan 26 2021 - 12:38:47 EST


On Mon, Jan 25, 2021 at 03:25:35PM -0800, Mike Kravetz wrote:
> IIUC, even non-gigantic hugetlb pages can exist in CMA. They can be migrated
> out of CMA if needed (except free pages in the pool, but that is a separate
> issue David H already noted in another thread).

Yeah, as discussed I am taking a look at that.

> When we first started discussing this patch set, one suggestion was to force
> hugetlb pool pages to be allocated at boot time and never permit them to be
> freed back to the buddy allocator. A primary reason for the suggestion was
> to avoid this issue of needing to allocate memory when freeing a hugetlb page
> to buddy. IMO, that would be an unreasonable restriction for many existing
> hugetlb use cases.

AFAIK it was suggested as a way to simplify things in the first go of this
patchset.
Please note that the first versions of this patchset were dealing with PMD
mapped vmemmap pages and overall it was quite convoluted for a first
version.
Since then, things have been simplified quite a lot (e.g: we went from 22 patches to 12),
so I do not feel the need to force the pages to be allocated at boot time.

> A simple thought is that we simply fail the 'freeing hugetlb page to buddy'
> if we can not allocate the required vmemmap pages. However, as David R says
> freeing hugetlb pages to buddy is a reasonable way to free up memory in oom
> situations. However, failing the operation 'might' be better than looping
> forever trying to allocate the pages needed? As mentioned in the previous
> patch, it would be better to use GFP_ATOMIC to at least dip into reserves if
> we can.

I also agree that GFP_ATOMIC might make some sense.
If the system is under memory pressure, I think it is best if we go the extra
mile in order to free up to 4096 pages or 512 pages.
Otherwise we might have a nice hugetlb page we might not need and a lack of
memory.
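
To give a rough idea of the sizes involved, here is a small userspace sketch
of the arithmetic. It assumes 4 KiB base pages and a 64-byte struct page
(the usual x86_64 configuration, but an assumption rather than something the
patchset states), and the helper names are made up for illustration:

```python
# Assumptions (not taken from the patchset): 4 KiB base pages and a
# 64-byte struct page, as commonly found on x86_64.
BASE_PAGE_SIZE = 4096
STRUCT_PAGE_SIZE = 64


def base_pages(huge_page_size):
    """Number of base pages covered by one huge page of the given size."""
    return huge_page_size // BASE_PAGE_SIZE


def vmemmap_pages(huge_page_size):
    """Number of base pages needed to hold the struct pages of one huge page."""
    vmemmap_bytes = base_pages(huge_page_size) * STRUCT_PAGE_SIZE
    return vmemmap_bytes // BASE_PAGE_SIZE


for name, size in (("2 MiB", 2 << 20), ("1 GiB", 1 << 30)):
    print(f"{name}: {base_pages(size)} base pages, "
          f"{vmemmap_pages(size)} vmemmap pages")
```

Under those assumptions, the handful of vmemmap pages we need to allocate is
small compared to the memory that freeing the hugetlb page gives back.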

> I think using pages of the hugetlb for vmemmap to cover pages of the hugetlb
> is the only way we can guarantee success of freeing a hugetlb page to buddy.
> However, this should only be used when there is no other option and could
> result in vmemmap pages residing in CMA or ZONE_MOVABLE. I'm not sure how
> much better this is than failing the free to buddy operation.

And how would you tell when there is no other option?

> I don't have a solution. Just wanted to share some thoughts.
>
> BTW, just thought of something else. Consider offlining a memory section that
> contains a free hugetlb page. The offline code will try to dissolve the hugetlb
> page (free to buddy). So, vmemmap pages will need to be allocated. We will
> try to allocate vmemmap pages on the same node as the hugetlb page. But, if
> this memory section is the last of the node all the pages will have been
> isolated and no allocations will succeed. Is that a possible scenario, or am
> I just having too many negative thoughts?

IIUC, GFP_ATOMIC will reset ALLOC_CPUSET flags at some point and the nodemask will
be cleared, so I guess the system will try to allocate from another node.
But I am not sure about that one.

I would like to hear Michal's thoughts on this.


--
Oscar Salvador
SUSE L3