Re: arm32: panic in move_freepages (Was [PATCH v2 0/4] arm64: drop pfn_valid_within() and simplify pfn_valid())

From: Mike Rapoport
Date: Mon May 03 2021 - 04:44:29 EST


On Mon, May 03, 2021 at 10:07:01AM +0200, David Hildenbrand wrote:
> On 03.05.21 08:26, Mike Rapoport wrote:
> > On Fri, Apr 30, 2021 at 07:24:37PM +0800, Kefeng Wang wrote:
> > >
> > >
> > > On 2021/4/30 17:51, Mike Rapoport wrote:
> > > > On Thu, Apr 29, 2021 at 06:22:55PM +0800, Kefeng Wang wrote:
> > > > >
> > > > > On 2021/4/29 14:57, Mike Rapoport wrote:
> > > > >
> > > > > > > > > Do you use SPARSEMEM? If yes, what is your section size?
> > > > > > > > > What is the value of CONFIG_FORCE_MAX_ZONEORDER in your configuration?
> > > > > > > Yes,
> > > > > > >
> > > > > > > CONFIG_SPARSEMEM=y
> > > > > > >
> > > > > > > CONFIG_SPARSEMEM_STATIC=y
> > > > > > >
> > > > > > > CONFIG_FORCE_MAX_ZONEORDER = 11
> > > > > > >
> > > > > > > CONFIG_PAGE_OFFSET=0xC0000000
> > > > > > > CONFIG_HAVE_ARCH_PFN_VALID=y
> > > > > > > CONFIG_HIGHMEM=y
> > > > > > > #define SECTION_SIZE_BITS 26
> > > > > > > #define MAX_PHYSADDR_BITS 32
> > > > > > > #define MAX_PHYSMEM_BITS 32
> > > > >
> > > > >
> > > > > > With the patch, the addr is aligned, but the panic still occurred,
> > > >
> > > > Is this the same panic at move_freepages() for range [de600, de7ff]?
> > > >
> > > > Do you enable CONFIG_ARM_LPAE?
> > >
> > > No, CONFIG_ARM_LPAE is not set, and yes, it is the same panic in
> > > move_freepages at
> > >
> > > start_pfn/end_pfn [de600, de7ff], [de600000, de7ff000] : pfn =de600, page
> > > =ef3cc000, page-flags = ffffffff, pfn2phy = de600000
> > >
> > > > > __free_memory_core, range: 0xb0200000 - 0xc0000000, pfn: b0200 - b0200
> > > > > __free_memory_core, range: 0xcc000000 - 0xdca00000, pfn: cc000 - b0200
> > > > > __free_memory_core, range: 0xde700000 - 0xdea00000, pfn: de700 - b0200
> >
> > Hmm, [de600, de7ff] is not added to the free lists which is correct. But
> > then it's unclear how the page for de600 gets to move_freepages()...
> >
> > Can't say I have any bright ideas to try here...
>
> Are we missing some checks (e.g., PageReserved()) that pfn_valid_within()
> would have "caught" before?

Unless I'm missing something, the crash happens in __rmqueue_fallback():

do_steal:
	page = get_page_from_free_area(area, fallback_mt);

	steal_suitable_fallback(zone, page, alloc_flags, start_migratetype,
								can_steal);
		-> move_freepages()
			-> BUG()

So a page taken from the free area should be sane, as the range in question
was never added to the free lists.
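
For reference, the pfn arithmetic behind the reported range [de600, de7ff]
can be sketched as below. This is only an illustration: it assumes 4K pages
(PAGE_SHIFT = 12, which is consistent with pfn2phy = de600000 quoted above)
and takes SECTION_SIZE_BITS = 26 from the quoted configuration. The EX_ macros
and ex_ helpers are made-up names for this sketch, not kernel APIs.

```c
#include <stdint.h>

/* Assumption: 4K pages, i.e. a page shift of 12. */
#define EX_PAGE_SHIFT        12
/* From the quoted configuration: SECTION_SIZE_BITS 26 (64M sections). */
#define EX_SECTION_SIZE_BITS 26

/* Hypothetical helper: physical address of the first byte of a pfn. */
static inline uint64_t ex_pfn_to_phys(uint64_t pfn)
{
	return pfn << EX_PAGE_SHIFT;
}

/* Hypothetical helper: index of the sparsemem section containing a pfn. */
static inline uint64_t ex_pfn_to_section_nr(uint64_t pfn)
{
	return pfn >> (EX_SECTION_SIZE_BITS - EX_PAGE_SHIFT);
}

/* Hypothetical helper: number of pages in an inclusive pfn range. */
static inline uint64_t ex_range_nr_pages(uint64_t start_pfn, uint64_t end_pfn)
{
	return end_pfn - start_pfn + 1;
}
```

Under these assumptions ex_pfn_to_phys(0xde600) gives 0xde600000 and
ex_pfn_to_phys(0xde7ff) gives 0xde7ff000, matching the reported values. The
range spans 0x200 pages and both ends fall into the same 64M section.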

And honestly, with the memory layout reported elsewhere in the thread I'd
say that the bootloader/fdt beg for fixes...

--
Sincerely yours,
Mike.