Re: [RFC PATCH v2 11/19] mm/sparsemem: Use alloc_table() for table allocations

From: Vlastimil Babka
Date: Thu Sep 02 2021 - 09:57:04 EST


On 9/1/21 09:22, Mike Rapoport wrote:
> On Tue, Aug 31, 2021 at 06:25:23PM +0000, Edgecombe, Rick P wrote:
>> On Tue, 2021-08-31 at 11:55 +0300, Mike Rapoport wrote:
>> > On Mon, Aug 30, 2021 at 04:59:19PM -0700, Rick Edgecombe wrote:
>> <trim>
>> > > -static void * __meminit vmemmap_alloc_block_zero(unsigned long
>> > > size, int node)
>> > > +static void * __meminit vmemmap_alloc_table(int node)
>> > > {
>> > > - void *p = vmemmap_alloc_block(size, node);
>> > > + void *p;
>> > > + if (slab_is_available()) {
>> > > + struct page *page = alloc_table_node(GFP_KERNEL |
>> > > __GFP_ZERO, node);
>> >
>> > This change removes __GFP_RETRY_MAYFAIL|__GFP_NOWARN from the
>> > original gfp
>> > vmemmap_alloc_block() used.
>> Oh, yea good point. Hmm, I guess grouped pages could be aware of that
>> flag too. Would be a small addition, but it starts to grow
>> unfortunately.
>>
>> > Not sure __GFP_RETRY_MAYFAIL is really needed in
>> > vmemmap_alloc_block_zero()
>> > in the first place, though.
>> Looks like due to a real issue:
>> 055e4fd96e95b0eee0d92fd54a26be7f0d3bcad0

That commit added __GFP_REPEAT, but its replacement, __GFP_RETRY_MAYFAIL, has
since become subtly different.

> I believe the issue was with memory map blocks rather than with page
> tables, but since sparse-vmemmap uses the same vmemmap_alloc_block() for
> both, the GFP flag got stuck with both.
>
> I'm not really familiar with reclaim internals to say if
> __GFP_RETRY_MAYFAIL would help much for order-0 allocation.

For costly allocations (order > PAGE_ALLOC_COSTLY_ORDER), __GFP_RETRY_MAYFAIL
makes the allocator try harder, hence the RETRY part. For order-0, the only
difference is that it skips the OOM killer, hence the MAYFAIL part. The flag
usually means the caller has a fallback. I guess in this case there's no
fallback, so allocating without __GFP_RETRY_MAYFAIL would be better.
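
To make the four cases concrete, here is a rough userspace model of what a
GFP_KERNEL-like allocation does under memory pressure. This is not kernel
code: the MODEL_* names and model_under_pressure() are made up for
illustration, and only the threshold value mirrors the kernel's
PAGE_ALLOC_COSTLY_ORDER.

```c
#include <stdbool.h>

/* Mirrors the kernel's PAGE_ALLOC_COSTLY_ORDER, which is 3. */
#define MODEL_COSTLY_ORDER 3

/* Hypothetical outcome labels, invented for this sketch. */
enum model_outcome {
	MODEL_GIVE_UP_EARLY,    /* costly, no flag: fails after limited effort, no OOM */
	MODEL_TRY_HARDER,       /* costly, flag: retries more, may still fail, no OOM */
	MODEL_MAY_OOM,          /* not costly, no flag: keeps retrying, OOM killer possible */
	MODEL_FAIL_WITHOUT_OOM, /* not costly, flag: skips the OOM killer, returns NULL */
};

/* Classify the behaviour by allocation order and __GFP_RETRY_MAYFAIL. */
static enum model_outcome model_under_pressure(unsigned int order,
					       bool retry_mayfail)
{
	bool costly = order > MODEL_COSTLY_ORDER;

	if (costly)
		return retry_mayfail ? MODEL_TRY_HARDER : MODEL_GIVE_UP_EARLY;

	return retry_mayfail ? MODEL_FAIL_WITHOUT_OOM : MODEL_MAY_OOM;
}
```

A vmemmap page table is a single page, so it falls in the order-0 row:
model_under_pressure(0, true) gives MODEL_FAIL_WITHOUT_OOM, which only makes
sense if the caller can cope with NULL.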

> Vlastimil, can you comment on this?
>
>> I think it should not affect PKS tables for now, so maybe I can make
>> separate logic instead. I'll look into it. Thanks.
>> >
>> > More broadly, maybe it makes sense to split boot time and memory
>> > hotplug
>> > paths and use pxd_alloc() for the latter.
>> >
>> > > +
>> > > + if (!page)
>> > > + return NULL;
>> > > + return page_address(page);
>> > > + }
>> > >
>> > > + p = __earlyonly_bootmem_alloc(node, PAGE_SIZE, PAGE_SIZE,
>> > > __pa(MAX_DMA_ADDRESS));
>> >
>> > Opportunistically rename to __earlyonly_memblock_alloc()? ;-)
>> >
>> Heh, I can. Just grepping, there are several other instances of
>> foo_bootmem() only calling foo_memblock() pattern scattered about. Or
>> maybe I'm missing the distinction.
>
> Heh, I didn't do s/bootmem/memblock/g, so foo_bootmem() are reminders we
> had bootmem allocator once.
> Maybe it's a good time to remove them :)
>
>> <trim>
>