Re: [PATCH v3] mm/hugetlb: split hugetlb_cma in nodes with memory

From: Aneesh Kumar K.V
Date: Mon Jul 27 2020 - 10:38:53 EST

Mike Kravetz <mike.kravetz@xxxxxxxxxx> writes:

> On 7/19/20 11:22 PM, Anshuman Khandual wrote:
>> On 07/17/2020 10:32 PM, Mike Kravetz wrote:
>>> On 7/16/20 10:02 PM, Anshuman Khandual wrote:
>>>> On 07/16/2020 11:55 PM, Mike Kravetz wrote:
>>>>> >From 17c8f37afbf42fe7412e6eebb3619c6e0b7e1c3c Mon Sep 17 00:00:00 2001
>>>>> From: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
>>>>> Date: Tue, 14 Jul 2020 15:54:46 -0700
>>>>> Subject: [PATCH] hugetlb: move cma reservation to code setting up gigantic
>>>>> hstate
>>>>> Instead of calling hugetlb_cma_reserve() directly from arch specific
>>>>> code, call from hugetlb_add_hstate when adding a gigantic hstate.
>>>>> hugetlb_add_hstate is either called from arch specific huge page setup,
>>>>> or as the result of hugetlb command line processing. In either case,
>>>>> this is late enough in the init process that all numa memory information
>>>>> should be initialized. And, it is early enough to still use early
>>>>> memory allocator.
>>>> This assumes that hugetlb_add_hstate() is called from the arch code at
>>>> the right point in time for the generic HugeTLB to do the required CMA
>>>> reservation which is not ideal. I guess it must have been a reason why
>>>> CMA reservation should always called by the platform code which knows
>>>> the boot sequence timing better.
>>> Actually, the code does not make the assumption that hugetlb_add_hstate
>>> is called from arch specific huge page setup. It can even be called later
>>> at the time of hugetlb command line processing.
>> Yes, now that hugetlb_cma_reserve() has been moved into hugetlb_add_hstate().
>> But then there is an explicit warning while trying to mix both the command
>> line options i.e hugepagesz= and hugetlb_cma=. The proposed code here have
>> not changed that behavior and hence the following warning should have been
>> triggered here as well.
>> 1) hugepagesz_setup()
>> hugetlb_add_hstate()
>> hugetlb_cma_reserve()
>> 2) hugepages_setup()
>> hugetlb_hstate_alloc_pages() when order >= MAX_ORDER
>> if (hstate_is_gigantic(h)) {
>> if (IS_ENABLED(CONFIG_CMA) && hugetlb_cma[0]) {
>> pr_warn_once("HugeTLB: hugetlb_cma is enabled, skip boot time allocation\n");
>> break;
>> }
>> if (!alloc_bootmem_huge_page(h))
>> break;
>> }
>> Nonetheless, it does not make sense to mix both memblock and CMA based huge
>> page pre-allocations. But looking at this again, could this warning be ever
>> triggered till now ? Unless, a given platform calls hugetlb_cma_reserve()
>> before _setup("hugepages=", hugepages_setup). Anyways, there seems to be
>> good reasons to keep both memblock and CMA based pre-allocations in place.
>> But mixing them together (as done in the proposed code here) does not seem
>> to be right.
> I'm not sure if I follow the question.
> This proposal does not change the trigger for the warning printed when one
> tries to both reserve CMA and pre-allocate gigantic pages. If hugetlb_cma
> is specified on the command line, and someone tries to pre-allocate gigantic
> pages they will get the warning. Such a command line on x86 might look like,
> hugetlb_cma=4G hugepagesz=1G hugepages=4
> You will then see,
> [ 0.065864] HugeTLB: hugetlb_cma is enabled, skip boot time allocation
> [ 0.065866] HugeTLB: allocating 4 of page size 1.00 GiB failed. Only allocated 0 hugepages.
> Ideally we could/should eliminate the second message.
> This behavior exists in the current code.

There is variant of this which is pseries powerpc where there is
hypervisor assistance w.r.t allocating gigantic pages. See the ppc64
enablement patch