Re: [PATCH V2 1/2] arm64/mm: Fix pfn_valid() for ZONE_DEVICE based memory

From: David Hildenbrand
Date: Thu Feb 11 2021 - 08:02:09 EST


On 11.02.21 13:10, Anshuman Khandual wrote:


On 2/11/21 5:23 PM, Will Deacon wrote:
On Fri, Feb 05, 2021 at 06:55:53PM +0000, Will Deacon wrote:
On Wed, Feb 03, 2021 at 09:20:39AM +0530, Anshuman Khandual wrote:
On 2/2/21 6:26 PM, David Hildenbrand wrote:
On 02.02.21 13:51, Will Deacon wrote:
On Tue, Feb 02, 2021 at 01:39:29PM +0100, David Hildenbrand wrote:
As I expressed already, long term we should really get rid of the arm64
variant and rather special-case the generic one. Then we won't go out of
sync - just as it happened with ZONE_DEVICE handling here.

Why does this have to be long term? This ZONE_DEVICE stuff could be the
carrot on the stick :)

Yes, I suggested to do it now, but Anshuman convinced me that doing a
simple fix upfront might be cleaner --- for example when it comes to
backporting :)

Right. The current pfn_valid() breaks for ZONE_DEVICE memory and this fixes
the problem in the present context which can be easily backported if required.

Changing or rather overhauling the generic code with new configs as proposed
earlier (which I am planning to work on subsequently) would definitely be an
improvement for the current pfn_valid() situation in terms of maintainability
but then it should not stop us from fixing the problem now.

Alright, I've mulled this over a bit. I don't agree that this patch helps
with maintainability (quite the opposite, in fact), but perfection is the
enemy of the good so I'll queue the series for 5.12. However, I'll revert
the changes at the first sign of a problem, so please do work towards a
generic solution which can replace this in the medium term.

... and dropped. These patches appear to be responsible for a boot
regression reported by CKI:

Ahh, a boot regression? These patches only change the behaviour
for non-boot memory.


https://lore.kernel.org/r/cki.8D1CB60FEC.K6NJMEFQPV@xxxxxxxxxx

Will look into the logs and see if there is something pointing to
the problem.


It's strange. One thing I can imagine is a mis-detection of early sections. However, I don't see that happening:

In sparse_init_nid(), we:
1. Initialize the memmap
2. Set SECTION_IS_EARLY | SECTION_HAS_MEM_MAP via
sparse_init_one_section()

Only hotplugged sections (DIMMs, dax/kmem) set SECTION_HAS_MEM_MAP without SECTION_IS_EARLY - which is correct, because these are not early.

So once we know that we have valid_section() -- SECTION_HAS_MEM_MAP is set -- early_section() should be correct.

Even if someone were to call pfn_valid() after memblocks_present()->memory_present() but before sparse_init_nid(), we should be fine (!valid_section() -> return 0).
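
To make the reasoning above concrete, here is a small userspace model of the section state machine. It is only a sketch: the flag values, struct model_section, and all model_* helpers are made up for illustration and are not the kernel definitions. It only mirrors the ordering described above (present-only sections are not valid yet, early sections carry both flags, hotplugged sections have a memmap but are not early).

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits only; the kernel's actual values may differ. */
#define MODEL_SECTION_MARKED_PRESENT (1UL << 0)
#define MODEL_SECTION_HAS_MEM_MAP    (1UL << 1)
#define MODEL_SECTION_IS_EARLY       (1UL << 2)

/* Hypothetical stand-in for struct mem_section: just the flag bits. */
struct model_section {
	unsigned long flags;
};

/* Which check a pfn_valid()-style lookup would end up using. */
enum model_path {
	MODEL_INVALID,           /* no memmap yet: return 0 */
	MODEL_BOOT_MEMORY_CHECK, /* early section: consult boot memory info */
	MODEL_HOTPLUG_CHECK      /* non-early section: hotplug/ZONE_DEVICE path */
};

static inline bool model_valid_section(const struct model_section *ms)
{
	return ms && (ms->flags & MODEL_SECTION_HAS_MEM_MAP);
}

static inline bool model_early_section(const struct model_section *ms)
{
	return ms && (ms->flags & MODEL_SECTION_IS_EARLY);
}

/* Models memory_present(): the section is known, but has no memmap yet. */
static inline void model_memory_present(struct model_section *ms)
{
	ms->flags |= MODEL_SECTION_MARKED_PRESENT;
}

/* Models the sparse_init_nid() step: memmap set up, section marked early. */
static inline void model_sparse_init_early(struct model_section *ms)
{
	ms->flags |= MODEL_SECTION_IS_EARLY | MODEL_SECTION_HAS_MEM_MAP;
}

/* Models hotplug (DIMMs, dax/kmem): memmap set up, but never early. */
static inline void model_hotplug_add(struct model_section *ms)
{
	ms->flags |= MODEL_SECTION_MARKED_PRESENT | MODEL_SECTION_HAS_MEM_MAP;
}

/* The early-section test is only reached once the section is valid. */
static inline enum model_path model_pfn_valid_path(const struct model_section *ms)
{
	if (!model_valid_section(ms))
		return MODEL_INVALID;
	if (!model_early_section(ms))
		return MODEL_HOTPLUG_CHECK;
	return MODEL_BOOT_MEMORY_CHECK;
}
```

In this model, a section that has only gone through model_memory_present() takes the MODEL_INVALID path, so a lookup in that window cannot misclassify a boot section as hotplugged.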


As it happens early during boot, I doubt that NVDIMMs that get detected and added as system RAM (via dax/kmem) early during boot are the problem.

--
Thanks,

David / dhildenb