Re: [PATCH v15 4/8] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page

From: David Hildenbrand
Date: Fri Feb 12 2021 - 09:17:07 EST


On 11.02.21 19:05, Mike Kravetz wrote:
On 2/8/21 12:50 AM, Muchun Song wrote:
When we free a HugeTLB page to the buddy allocator, we should allocate the
vmemmap pages associated with it. However, we may not be able to allocate the
vmemmap pages when the system is under memory pressure. In this case, we just
refuse to free the HugeTLB page instead of looping forever trying to allocate
the pages.

Signed-off-by: Muchun Song <songmuchun@xxxxxxxxxxxxx>
---
include/linux/mm.h | 2 ++
mm/hugetlb.c | 19 ++++++++++++-
mm/hugetlb_vmemmap.c | 30 +++++++++++++++++++++
mm/hugetlb_vmemmap.h | 6 +++++
mm/sparse-vmemmap.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++++++-
5 files changed, 130 insertions(+), 2 deletions(-)

Muchun has done a great job simplifying this patch series and addressing
issues as they are brought up. This patch addresses the issue which seems
to be the biggest stumbling block to this series: the need to allocate
vmemmap pages in order to dissolve a hugetlb page to the buddy allocator.
The way it is addressed in this patch is to simply fail to dissolve the
hugetlb page if the vmemmap pages cannot be allocated. IMO, this is an
'acceptable' strategy. If we find ourselves in this situation then we are
likely to be hitting other corner cases in the system. I wish there was a
perfect way to address this issue, but we have been unable to come up with one.

There was a decent discussion about this in a previous version of the
series, starting here:
https://lore.kernel.org/linux-mm/20210126092942.GA10602@linux/
In this thread various other options were suggested and discussed.

I would like to come to some agreement on an acceptable way to handle this
specific issue. IMO, it makes little sense to continue refining other
parts of this series if we can not figure out how to move forward on this
issue.

It would be great if David H, David R and Michal could share their opinions
on this. No need to review the code in detail yet (unless you want to), but
let's start a discussion on how to move past this issue if we can.

So a summary from my side:

We might fail to free a huge page at any point in time iff we are low on kernel (!CMA, !ZONE_MOVABLE) memory. While we could play games with allocating the vmemmap from the huge page itself in some cases (e.g., !CMA, !ZONE_MOVABLE), simply retrying is way easier, and we don't render the huge page unusable forever.
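
To spell out where that failure comes from, the flow is roughly the following. This is a sketch only -- the helper names below are placeholders for illustration and are not the actual functions from the series:

/*
 * Rough sketch of the idea, not the actual patch: handing a HugeTLB page
 * back to the buddy allocator now first requires re-allocating the
 * vmemmap pages that were freed for it, and that allocation can fail
 * under memory pressure.
 */
static int dissolve_huge_page_sketch(struct hstate *h, struct page *head)
{
	/* placeholder for the series' "allocate the vmemmap back" helper */
	if (sketch_alloc_huge_page_vmemmap(h, head))
		return -ENOMEM;	/* refuse to free; callers may retry later */

	/* only now can the page actually be returned to the buddy */
	sketch_release_to_buddy(h, head);
	return 0;
}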

A corner case might be having many huge pages in ZONE_MOVABLE and freeing them all at once, eating up a lot of kernel memory. But then, the same setup would already be problematic today, where we simply always consume that kernel memory for the vmemmap.
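
Some back-of-the-envelope numbers for that corner case, assuming the common
x86-64 configuration of 4 KiB base pages and a 64-byte struct page (other
configs differ): each 2 MiB huge page is covered by 512 struct pages, i.e.
32 KiB (8 base pages) of vmemmap, most of which the series frees and would
have to be re-allocated when the huge page is freed. For example:

/*
 * Upper-bound estimate of the vmemmap that has to be re-allocated when
 * freeing a given amount of 2 MiB huge pages at once. Assumes 4 KiB base
 * pages and a 64-byte struct page; the series keeps a small part of the
 * vmemmap around, so the real number is slightly lower.
 */
#include <stdio.h>

int main(void)
{
	const unsigned long long base_page = 4096;		/* 4 KiB */
	const unsigned long long struct_page_sz = 64;		/* bytes */
	const unsigned long long huge_page = 2ULL << 20;	/* 2 MiB */
	const unsigned long long pool_gib = 64;			/* 64 GiB of huge pages */

	unsigned long long nr_huge = (pool_gib << 30) / huge_page;
	unsigned long long per_huge = (huge_page / base_page) * struct_page_sz;
	unsigned long long total = nr_huge * per_huge;

	printf("%llu huge pages, %llu KiB vmemmap each, %llu MiB total\n",
	       nr_huge, per_huge >> 10, total >> 20);
	return 0;
}

That prints "32768 huge pages, 32 KiB vmemmap each, 1024 MiB total" -- so
freeing 64 GiB worth of huge pages at once needs on the order of 1 GiB of
kernel memory for the vmemmap. Not tiny, but as said, today that memory is
simply consumed permanently instead.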

I think this problem only really becomes visible in corner cases. And someone actively has to enable the new behavior.


1. Failing to free a huge page triggered by the user (decreasing nr_hugepages):

Bad luck. Try again later (a minimal user-space retry sketch follows below the list).

2. Failing to free a surplus huge page when it is freed by the application:

Bad luck. But who will try again later?

3. Failing to dissolve a free huge page on ZONE_MOVABLE via offline_pages():

This is a bit unfortunate if we have plenty of ZONE_MOVABLE memory but are low on kernel memory. For example, migration of huge pages would still work; however, dissolving the free page does not. I'd say this is a corner case. When the system is under that much memory pressure, offlining/unplug can be expected to fail.

4. Failing to dissolve a huge page on CMA/ZONE_MOVABLE via alloc_contig_range() - once we have that handling in place. Mainly affects CMA and virtio-mem.

Similar to 3. However, we don't even take care of huge pages *at all* right now (neither migrate nor dissolve), so this actually doesn't make the current state any worse. virtio-mem will handle migration errors gracefully. CMA might be able to fall back to other free areas within the CMA region.
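
As for "try again later" in 1. (and similarly for retrying offlining in 3.):
user space can already detect that the pool did not shrink as far as
requested, because reading the knob back reports the actual pool size. A
minimal, illustrative retry loop for the default huge page size could look
like this -- the path and retry policy are just an example, and it obviously
needs the privileges to write the knob:

/*
 * Minimal user-space sketch of "try again later" when shrinking the huge
 * page pool: write the target, read back how many persistent huge pages
 * are actually left, and retry a few times if the kernel could not free
 * all of them (e.g. because the vmemmap pages could not be allocated).
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NR_HUGEPAGES	"/proc/sys/vm/nr_hugepages"

static long read_nr_hugepages(void)
{
	FILE *f = fopen(NR_HUGEPAGES, "r");
	long val = -1;

	if (f) {
		if (fscanf(f, "%ld", &val) != 1)
			val = -1;
		fclose(f);
	}
	return val;
}

static int write_nr_hugepages(long target)
{
	FILE *f = fopen(NR_HUGEPAGES, "w");

	if (!f)
		return -1;
	fprintf(f, "%ld\n", target);
	return fclose(f);
}

int main(int argc, char **argv)
{
	long target = argc > 1 ? atol(argv[1]) : 0;
	int attempt;

	for (attempt = 0; attempt < 5; attempt++) {
		long cur;

		if (write_nr_hugepages(target))
			return 1;
		cur = read_nr_hugepages();
		if (cur >= 0 && cur <= target)
			return 0;	/* pool shrunk as requested */
		sleep(1);		/* back off, then try again */
	}
	fprintf(stderr, "could not shrink the pool to %ld pages\n", target);
	return 1;
}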


I'd say: document the changed behavior properly so people are aware that there might be issues in corner cases with huge pages on CMA / ZONE_MOVABLE.

--
Thanks,

David / dhildenb