Re: [PATCH v5 00/21] Free some vmemmap pages of hugetlb page

From: Michal Hocko
Date: Mon Nov 23 2020 - 02:38:56 EST


On Fri 20-11-20 09:45:12, Mike Kravetz wrote:
> On 11/20/20 1:43 AM, David Hildenbrand wrote:
[...]
> >>> To keep things easy, maybe simply never allow to free these hugetlb pages
> >>> again for now? If they were reserved during boot and the vmemmap condensed,
> >>> then just let them stick around for all eternity.
> >>
> >> Not sure I understand. Do you propose to only free those vmemmap pages
> >> when the pool is initialized during boot time and never allow to free
> >> them up? That would certainly make it safer and maybe even simpler wrt
> >> implementation.
> >
> > Exactly, let's keep it simple for now. I guess most use cases of this
> > (virtualization, databases, ...) will allocate hugepages during boot
> > and never free them.
>
> Not sure if I agree with that last statement. Database and virtualization
> use cases from my employer allocate hugetlb pages after boot. It
> is shortly after boot, but still not from boot/kernel command line.

Is there any strong reason for that?

> Somewhat related, but not exactly addressing this issue ...
>
> One idea discussed in a previous patch set was to disable PMD/huge page
> mapping of vmemmap if this feature was enabled. This would eliminate a bunch
> of the complex code doing page table manipulation. It does not address
> the issue of struct page pages going away which is being discussed here,
> but it could be a way to simplify the first version of this code. If this
> is going to be an 'opt in' feature as previously suggested, then eliminating
> the PMD/huge page vmemmap mapping may be acceptable. My guess is that
> sysadmins would only 'opt in' if they expect most of system memory to be used
> by hugetlb pages. We certainly have database and virtualization use cases
> where this is true.

Would this simplify the code considerably? I mean, the vmemmap page
tables will need to be updated anyway. So that code has to stay. PMD
entry split shouldn't be the most complex part of that operation. On
the other hand, dropping large pages for all vmemmaps will likely have a
performance impact.
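
To put rough numbers on why a PMD split comes up at all, here is a minimal
sketch of the arithmetic. It assumes x86_64-style 4 KiB base pages, 2 MiB
PMD mappings, and a 64-byte struct page; none of these values are stated
in this thread, and the helper names are hypothetical, not kernel
functions.

```c
/*
 * Assumed values, not taken from this thread: 4 KiB base pages,
 * 2 MiB PMD mappings, and a 64-byte struct page.
 */
#define BASE_PAGE_SIZE   4096UL
#define PMD_MAP_SIZE     (2UL * 1024 * 1024)
#define STRUCT_PAGE_SIZE 64UL

/* Hypothetical helper: base pages of vmemmap describing one hugetlb page. */
static unsigned long vmemmap_pages_per_hugepage(unsigned long hugepage_size)
{
	unsigned long nr_struct_pages = hugepage_size / BASE_PAGE_SIZE;

	return nr_struct_pages * STRUCT_PAGE_SIZE / BASE_PAGE_SIZE;
}

/*
 * Hypothetical helper: hugetlb pages whose struct pages live inside a
 * single PMD-mapped chunk of vmemmap. Integer division, so this is 0
 * when one hugetlb page needs more vmemmap than a single PMD maps.
 */
static unsigned long hugepages_per_vmemmap_pmd(unsigned long hugepage_size)
{
	unsigned long struct_pages_per_pmd = PMD_MAP_SIZE / STRUCT_PAGE_SIZE;

	return struct_pages_per_pmd * BASE_PAGE_SIZE / hugepage_size;
}
```

Under those assumptions, a 2 MiB hugetlb page is described by 8 vmemmap
pages, and one PMD-mapped vmemmap chunk covers 64 such hugetlb pages, so
touching the vmemmap of a single hugetlb page means working at PTE
granularity inside that chunk.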
--
Michal Hocko
SUSE Labs