Re: [PATCH V2 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory

From: David Hildenbrand
Date: Wed Mar 03 2021 - 16:29:56 EST


On 03.03.21 20:04, Catalin Marinas wrote:
On Thu, Feb 11, 2021 at 01:35:56PM +0100, David Hildenbrand wrote:
On 11.02.21 13:10, Anshuman Khandual wrote:
On 2/11/21 5:23 PM, Will Deacon wrote:
... and dropped. These patches appear to be responsible for a boot
regression reported by CKI:

Ahh, boot regression? These patches change the behaviour
for non-boot memory only.

https://lore.kernel.org/r/cki.8D1CB60FEC.K6NJMEFQPV@xxxxxxxxxx

Will look into the logs and see if there is something pointing to
the problem.

It's strange. One thing I can imagine is a mis-detection of early sections.
However, I don't see that happening:

In sparse_init_nid(), we:
1. Initialize the memmap
2. Set SECTION_IS_EARLY | SECTION_HAS_MEM_MAP via
sparse_init_one_section()

Only hotplugged sections (DIMMs, dax/kmem) set SECTION_HAS_MEM_MAP without
SECTION_IS_EARLY - which is correct, because these are not early.

So once we know that we have valid_section() -- SECTION_HAS_MEM_MAP is set
-- early_section() should be correct.

Even if someone were to call pfn_valid() after
memblocks_present()->memory_present() but before
sparse_init_nid(), we should be fine (!valid_section() -> return 0).
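
For readers following along, the flag lifecycle described above can be
modelled in a few lines of user-space C. This is only a sketch: the struct,
the model_*() helpers and the flag bit positions are made up for
illustration and are not the kernel definitions; only the relationship
between the two flags is taken from the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bit positions (assumed); only their distinctness matters. */
#define SECTION_MARKED_PRESENT (1UL << 0)
#define SECTION_HAS_MEM_MAP    (1UL << 1)
#define SECTION_IS_EARLY       (1UL << 3)

/* Hypothetical stand-in for struct mem_section. */
struct model_section {
	unsigned long flags;
};

static bool valid_section(const struct model_section *ms)
{
	return ms && (ms->flags & SECTION_HAS_MEM_MAP);
}

static bool early_section(const struct model_section *ms)
{
	return ms && (ms->flags & SECTION_IS_EARLY);
}

/* memory_present(): the section is known, but has no memmap yet. */
static void model_memory_present(struct model_section *ms)
{
	assert(ms != NULL);
	ms->flags |= SECTION_MARKED_PRESENT;
}

/* Boot path: sparse_init_nid() -> sparse_init_one_section(). */
static void model_sparse_init_early(struct model_section *ms)
{
	assert(ms != NULL);
	ms->flags |= SECTION_IS_EARLY | SECTION_HAS_MEM_MAP;
}

/* Hotplug path (DIMMs, dax/kmem): memmap, but never SECTION_IS_EARLY. */
static void model_hotplug(struct model_section *ms)
{
	assert(ms != NULL);
	ms->flags |= SECTION_MARKED_PRESENT | SECTION_HAS_MEM_MAP;
}
```

In this model a section that has only seen model_memory_present() is not
valid_section(), a boot section is both valid and early, and a hotplugged
section is valid but not early, which matches the three cases above.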

I couldn't figure out how this could fail with Anshuman's patches.
Will's suspicion is that some invalid/null pointer gets dereferenced
before being initialised but the only case I see is somewhere in
pfn_section_valid() (ms->usage) if valid_section() && !early_section().

Indeed, it looks like a latent bug.


Assuming that we do get a valid_section(ms) && !early_section(ms), is
there a case where ms->usage is not initialised? I guess races with
section_deactivate() are not possible this early.


Do you have access to that machine? We could identify which path is taken
quite easily.
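
To make "which path is taken" concrete, here is a rough user-space sketch
of the decision points as I read them from the patch. It is not kernel
code: the struct layout, the enum, classify_pfn_path() and the flag bit
positions are hypothetical, and the explicit NULL check on ->usage exists
only in this model to flag the suspected case.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative bit positions (assumed), not the kernel definitions. */
#define SECTION_HAS_MEM_MAP (1UL << 1)
#define SECTION_IS_EARLY    (1UL << 3)

/* Hypothetical stand-ins for mem_section_usage and mem_section. */
struct model_usage {
	unsigned long subsection_map;
};

struct model_section {
	unsigned long flags;
	struct model_usage *usage;
};

enum pfn_path {
	PATH_NO_SECTION,     /* !valid_section(): pfn_valid() returns 0 */
	PATH_EARLY_MEMBLOCK, /* early section: memblock_is_map_memory() decides */
	PATH_SUBSECTION_MAP, /* non-early section: pfn_section_valid() decides */
	PATH_USAGE_UNSET,    /* non-early section, but ->usage is still NULL */
};

static enum pfn_path classify_pfn_path(const struct model_section *ms)
{
	assert(ms != NULL);
	if (!(ms->flags & SECTION_HAS_MEM_MAP))
		return PATH_NO_SECTION;
	if (ms->flags & SECTION_IS_EARLY)
		return PATH_EARLY_MEMBLOCK;
	if (!ms->usage)
		return PATH_USAGE_UNSET; /* the dereference under suspicion */
	return PATH_SUBSECTION_MAP;
}

static const char *pfn_path_name(enum pfn_path path)
{
	switch (path) {
	case PATH_NO_SECTION:     return "no valid section";
	case PATH_EARLY_MEMBLOCK: return "early section, memblock check";
	case PATH_SUBSECTION_MAP: return "non-early section, subsection map";
	case PATH_USAGE_UNSET:    return "non-early section, usage unset";
	}
	return "unknown";
}
```

On the failing board, the interesting outcome would be anything other than
the first two paths this early during boot.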

Another situation could be that pfn_valid() returns true when no memory
is mapped for that pfn.

As it happens early during boot, I doubt that some NVDIMMs that get detected
and added early during boot as system RAM (via dax/kmem) are the problem.

It is indeed very early, we can't even get the early console output.

So that is even before any hotplug really happens. All sections should be
early at that point, I guess.

Debugging this is even harder as it's only misbehaving on a board we
don't have access to.

On the logic in this patch, is the hot-added memory always covering a
full subsection? The arm64 pfn_valid() currently relies on
memblock_is_map_memory() but the patch changes it to
pfn_section_valid(). So if hot-added memory doesn't cover the full
subsection, it may return true even if the pfn is not mapped.

Hotplugged System RAM always covers full sections. Hotplugged ZONE_DEVICE
always covers full subsections. pfn_section_valid() properly handles both
cases (see the generic pfn_valid()).
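
As a rough illustration of why subsection granularity is enough, here is a
small user-space model of a per-section subsection bitmap. The geometry
(4 KiB pages, 2 MiB subsections, 128 MiB sections) is assumed for the
example and differs between configurations; model_usage, model_populate()
and model_pfn_section_valid() are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry for the sketch; real values depend on the config. */
#define PAGE_SHIFT           12
#define SUBSECTION_SHIFT     21
#define SECTION_SIZE_BITS    27
#define PAGES_PER_SECTION    (1UL << (SECTION_SIZE_BITS - PAGE_SHIFT))
#define PAGES_PER_SUBSECTION (1UL << (SUBSECTION_SHIFT - PAGE_SHIFT))

/* One bit per subsection; 64 subsections per section with this geometry. */
struct model_usage {
	uint64_t subsection_map;
};

static unsigned int subsection_index(unsigned long pfn)
{
	return (pfn & (PAGES_PER_SECTION - 1)) / PAGES_PER_SUBSECTION;
}

/*
 * Mark [pfn, pfn + nr_pages) as populated. The range must be subsection
 * aligned and must not cross the section, mirroring the statement that
 * hot-added memory always covers full subsections.
 */
static void model_populate(struct model_usage *u, unsigned long pfn,
			   unsigned long nr_pages)
{
	unsigned long p;

	assert(pfn % PAGES_PER_SUBSECTION == 0);
	assert(nr_pages % PAGES_PER_SUBSECTION == 0);
	assert((pfn & (PAGES_PER_SECTION - 1)) + nr_pages <= PAGES_PER_SECTION);

	for (p = pfn; p < pfn + nr_pages; p += PAGES_PER_SUBSECTION)
		u->subsection_map |= UINT64_C(1) << subsection_index(p);
}

/* A pfn counts as valid only if its subsection bit is set. */
static bool model_pfn_section_valid(const struct model_usage *u,
				    unsigned long pfn)
{
	return (u->subsection_map >> subsection_index(pfn)) & 1;
}
```

Populating a full section sets every bit, so every pfn in it is valid.
Populating a single subsection (the ZONE_DEVICE case) makes only the pfns
inside that subsection valid, and a pfn in a neighbouring, unpopulated
subsection is rejected.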

--
Thanks,

David / dhildenb