Re: [PATCH v2 1/2] hugetlb: Fix hugepages_setup when deal with pernode

From: Mike Kravetz
Date: Fri Apr 01 2022 - 13:24:20 EST


On 4/1/22 03:43, David Hildenbrand wrote:
> On 01.04.22 12:12, Peng Liu wrote:
>> Hugepages can be specified per node since "hugetlbfs: extend
>> the definition of hugepages parameter to support node allocation",
>> but the following problem is observed.
>>
>> Confusing behavior is observed when both 1G and 2M hugepages are
>> set after "numa=off".
>> cmdline hugepage settings:
>> hugepagesz=1G hugepages=0:3,1:3
>> hugepagesz=2M hugepages=0:1024,1:1024
>> results:
>> HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
>> HugeTLB registered 2.00 MiB page size, pre-allocated 1024 pages
>>
>> Furthermore, confusing behavior can also be observed when an
>> invalid node follows a valid node.
>>
>> To fix this, hugetlb_hstate_alloc_pages should be called even when
>> hugepages_setup is going to bail out on an invalid node entry.
>
> Shouldn't we bail out if someone requests node-specific allocations but
> we are not running with NUMA?

I thought about this as well, and could not come up with a good answer.
Certainly, nobody SHOULD specify both 'numa=off' and ask for node-specific
allocations on the same command line. I would have no problem bailing out
in such situations. But I think that would also require the hugetlb command
line processing to look for such situations.
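Something along these lines in hugepages_setup() is roughly what that would
look like (just a sketch to illustrate the idea, not a tested patch; 'node'
stands for the node index parsed from "hugepages=<node>:<count>", and whether
node_online() already reflects reality at early_param time, as well as the
right return value to use, would need to be verified against the rest of the
parsing code):

	/*
	 * Sketch only: refuse node-specific requests for nodes that
	 * never came online (e.g. numa=off on the command line).
	 */
	if (!node_online(node)) {
		pr_warn("HugeTLB: node %d is not online, ignoring node specific hugepages\n",
			node);
		return 0;
	}

That keeps the parsing side simple, but it is a behavior change for anyone
who relies on the remaining nodes still getting their pages today.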

One could also argue that if there is only a single node (without numa=off on
the command line) and someone specifies node-local allocations, we should bail.

I was 'thinking' about a situation where we have multiple nodes and node-local
allocations are 'hard coded' via grub or something. Then, for some reason,
one node fails to come up on a reboot. Should we bail on all the
hugetlb allocations, or should we try to allocate on the still available
nodes?

When I went back and reread the reason for this change, I saw that it is
primarily for 'some debugging and test cases'.

>
> What's the result after your change?
>
>>
>> Cc: <stable@xxxxxxxxxxxxxxx>
>
> I am not sure if this is really stable material.

Right now, we partially and inconsistently process node-specific allocations
if there are missing nodes. We allocate 'regular' hugetlb pages on existing
nodes, but we do not allocate gigantic hugetlb pages on existing nodes.

I believe this is worth fixing in stable.

Since the behavior for missing nodes was not really spelled out when
node-specific allocations were introduced, I think an acceptable stable fix
could be to bail.

In any case, I think we need to do something.

>
>> Fixes: b5389086ad7b ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
>> Signed-off-by: Peng Liu <liupeng256@xxxxxxxxxx>
>

--
Mike Kravetz