Re: [PATCH v4 1/5] mm,memory_hotplug: Allocate memmap from the added memory range

From: Oscar Salvador
Date: Mon Mar 15 2021 - 06:23:17 EST


On Thu, Mar 11, 2021 at 08:06:53PM +0100, David Hildenbrand wrote:
> This looks essentially good to me, except some parts in
> mhp_supports_memmap_on_memory()
>
> > +bool mhp_supports_memmap_on_memory(unsigned long size)
> > +{
> > + unsigned long pageblock_size = PFN_PHYS(pageblock_nr_pages);
> > + unsigned long remaining_mem = size - PMD_SIZE;

Hi David, thanks for the review!

> This looks weird. I think what you want to test is that
>
>
> a) "nr_vmemmap_pages * sizeof(struct page)" spans complete PMDs (IOW, we
> won't map too much via the altmap when populating a PMD in the vmemmap)
>
> b) "remaining = size - nr_vmemmap_pages * sizeof(struct page)" spans
> complete pageblock.

We do not know the nr_vmemmap_pages at this point in time, although it is
easy to pre-compute.

For every section we populate, we use PMD_SIZE. So, PMD_SIZE / PAGE_SIZE gives
the nr_vmemmap_pages that are used for populating a single section.
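
To make that arithmetic concrete, here is a quick userspace sanity check with
illustrative numbers of my own (x86_64 defaults: 4 KiB pages, 2 MiB PMDs,
128 MiB sections, 64-byte struct page; these values are assumptions, not taken
from the patch itself):

#include <assert.h>
#include <stdio.h>

int main(void)
{
        unsigned long page_size = 4096UL;           /* PAGE_SIZE */
        unsigned long pmd_size = 2UL << 20;         /* PMD_SIZE */
        unsigned long section_size = 128UL << 20;   /* one memory section (SECTION_SIZE_BITS = 27) */
        unsigned long struct_page_size = 64UL;      /* sizeof(struct page) */

        /* memmap needed to describe one section's worth of pages */
        unsigned long memmap_per_section =
                (section_size / page_size) * struct_page_size;

        /* pages taken from the hot-added range to hold that memmap */
        unsigned long nr_vmemmap_pages = memmap_per_section / page_size;

        assert(memmap_per_section == pmd_size);             /* exactly one PMD */
        assert(nr_vmemmap_pages == pmd_size / page_size);   /* 512 on this config */

        printf("vmemmap pages per section: %lu\n", nr_vmemmap_pages);
        return 0;
}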

But let me explain the reasoning I use in the current code:

I will enumerate the assumptions that must hold true in order to support the
feature, together with their checks:

- We span a single memory block

size == memory_block_size_bytes()

- The vmemmap pages span a complete PMD and no more than a PMD.

!(PMD_SIZE % sizeof(struct page))

- The vmemmap pages and the pages exposed to the buddy have to cover full
pageblocks

remaining_mem = size - PMD_SIZE;
IS_ALIGNED(remaining_mem, pageblock_size)

Although this check only covers the range without the vmemmap pages, one could
argue that since we use exactly one PMD_SIZE at a time, and PMD_SIZE is
pageblock aligned, the vmemmap range covers full pageblocks as well.
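
A minimal sketch of that pageblock-coverage argument, again with my own
illustrative numbers (x86_64 with THP: 128 MiB memory block, 2 MiB PMD,
2 MiB pageblock), rather than values taken from the patch:

#include <assert.h>

int main(void)
{
        unsigned long pmd_size = 2UL << 20;          /* PMD_SIZE */
        unsigned long pageblock_size = 2UL << 20;    /* PFN_PHYS(pageblock_nr_pages) */
        unsigned long block_size = 128UL << 20;      /* memory_block_size_bytes() */

        unsigned long remaining_mem = block_size - pmd_size;

        /* the pages handed over to the buddy cover full pageblocks ... */
        assert(remaining_mem % pageblock_size == 0);
        /*
         * ... and so does the vmemmap range, because PMD_SIZE itself is
         * pageblock aligned on this configuration.
         */
        assert(pmd_size % pageblock_size == 0);

        return 0;
}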

Now, I see how this might be confusing and rather incomplete.
So I guess a better and clearer way to write it would be:

bool mhp_supports_memmap_on_memory(unsigned long size)
{
        unsigned long pageblock_size = PFN_PHYS(pageblock_nr_pages);
        unsigned long nr_vmemmap_pages = PMD_SIZE / PAGE_SIZE;
        unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
        unsigned long remaining_size = size - vmemmap_size;

        return memmap_on_memory &&
               IS_ENABLED(CONFIG_MHP_MEMMAP_ON_MEMORY) &&
               size == memory_block_size_bytes() &&
               !(PMD_SIZE % vmemmap_size) &&
               IS_ALIGNED(vmemmap_size, pageblock_size) &&
               remaining_size &&
               IS_ALIGNED(remaining_size, pageblock_size);
}

Note that the above check is only for a single section, but if the assumptions hold
true for a single section, they hold for many as well.
We could be orthodox and do:

bool mhp_supports_memmap_on_memory(unsigned long size)
{
        unsigned long pageblock_size = PFN_PHYS(pageblock_nr_pages);
        unsigned long nr_sections = memory_block_size_bytes() / (1UL << SECTION_SHIFT);
        unsigned long nr_vmemmap_pages = (PMD_SIZE / PAGE_SIZE) * nr_sections;
        unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
        unsigned long remaining_size = size - vmemmap_size;

        return memmap_on_memory &&
               IS_ENABLED(CONFIG_MHP_MEMMAP_ON_MEMORY) &&
               size == memory_block_size_bytes() &&
               !(PMD_SIZE % vmemmap_size) &&
               IS_ALIGNED(vmemmap_size, pageblock_size) &&
               remaining_size &&
               IS_ALIGNED(remaining_size, pageblock_size);
}

to check for all sections, but I do not think it is necessary.

What do you think?

> I suggest a restructuring, compressing the information like:
>
> "
> Besides having arch support and the feature enabled at runtime, we need a
> few more assumptions to hold true:
>
> a) We span a single memory block: memory onlining/offlining happens in
> memory block granularity. We don't want the vmemmap of online memory blocks
> to reside on offline memory blocks. In the future, we might want to support
> variable-sized memory blocks to make the feature more versatile.
>
> b) The vmemmap pages span complete PMDs: We don't want vmemmap code to
> populate memory from the altmap for unrelated parts (i.e., other memory
> blocks).
>
> c) The vmemmap pages (and thereby the pages that will be exposed to the
> buddy) have to cover full pageblocks: memory onlining/offlining code
> requires applicable ranges to be page-aligned, for example, to set the
> migratetypes properly.
> "

I am fine with the above; I already added it, thanks.

> Do we have to special case / protect against the vmemmap optimization for
> hugetlb pages? Or is that already blocked somehow and I missed it?

Yes, the hugetlb-vmemmap feature disables PMD population of the vmemmap [1]

[1] https://patchwork.kernel.org/project/linux-mm/patch/20210308102807.59745-7-songmuchun@xxxxxxxxxxxxx/


--
Oscar Salvador
SUSE L3