Re: [PATCH 1/1] mm: compaction: avoid fast_isolate_around() to set pageblock_skip on reserved pages
From: Mike Rapoport
Date: Wed Dec 02 2020 - 12:40:52 EST
Hello Andrea,
On Tue, Dec 01, 2020 at 07:44:51PM -0500, Andrea Arcangeli wrote:
> Hello Mike,
>
> On Sun, Nov 29, 2020 at 02:32:57PM +0200, Mike Rapoport wrote:
> > Hello Andrea,
> >
> > On Thu, Nov 26, 2020 at 07:46:05PM +0200, Mike Rapoport wrote:
> > > On Thu, Nov 26, 2020 at 11:05:14AM +0100, David Hildenbrand wrote:
> > >
> > > Let's try to merge init_unavailable_memory() into memmap_init().
> > > Than it'll be able to set zone/nid for those nasty pfns that BIOS
> > > decided to keep to itself, like in Andrea's case and will also take care
> > > of struct pages that do not really have a frame in DRAM, but are there
> > > because of arbitrary section size.
> >
> > Did you have a chance to check if this solves your issue?
> > If yes, I'll resend this as a formal patch.
>
> I tested the patch you sent, but it doesn't seem to boot. Reverting it
> booted.
Hmm, do you have any logs? It worked on my box and with various memory
configurations in qemu.
> Also I noticed leaving pages uninitialized already happened before and
> was fixed in 124049decbb121ec32742c94fb5d9d6bed8f24d8 where literally
> all holes were registered in memblock_reserve() by hand to workaround
> this very same defect in memblock callee we're fixing again.
>
> Then it was lifted in 9fd61bc95130d4971568b89c9548b5e0a4e18e0e since
> the memblock was supposed to be able to initialize all pages.
I still that think both 124049decbb1 ("x86/e820: put !E820_TYPE_RAM
regions into memblock.reserved") and 9fd61bc95130 ("Revert "x86/e820:
put !E820_TYPE_RAM regions into memblock.reserved") are band aids and
they do not address the root cause but rather try to workaround it.
The problem is that we have no proper way to initialize struct pages
corresponding to holes. The holes can appear in case when the memory is
not a multiple of SECTION_SIZE and, at least on x86, in case BIOS
reserves pages and they are not E820_TYPE_RAM.
The first case is not much of a problem, as I doubt there are still
systems that have not MAX_ORDER aligned memory banks, so the "trailing"
pages in semi-empty section will never be used by the page allocator.
The BIOS case, however is more problematic as it creates holes inside
MAX_ORDER aligned pageblocks.
Pushing the pages in such holes in and out of memblock.reserved won't
really help because they are not in memblock.memory while most
calculations during initialization of page allocator and memory map use
memblock.memory and iterate over it.
As these holes do not seem to appear on zone/node boundaries, we do not
see reports about wrong zone/node sizing because of the holes.
I believe that the proper solution would be to have all the memory
reserved by the firmware added to memblock.memory, like all other
architectures do, but I'm not sure if we can easily and reliably
determine what of E820 reserved types are actually backed by RAM.
So this is not feasible as a short term solution.
My patch [1], though, is an immediate improvement and I think it's worth
trying to fix off-by-one's that prevent your system from booting.
[1] https://lore.kernel.org/lkml/20201201181502.2340-1-rppt@xxxxxxxxxx
> Since this seems the second time this happens, I'd suggest to change
> the graceful pfn, 0,0 initialization to memset(page, 0xff,
> sizeof(struct page)) too like we mentioned earlier and then have at
> least a DEBUG_SOMETHING=y to search for struct pages with all 1 in
> page->flags to ensure the boot stage worked right so perhaps there's a
> graceful notification at boot before a VM_BUG_ON triggers later. The
> page struct validation could be done based on DEBUG_VM=y too since it
> won't cause any runtime cost post boot.
>
> Thanks,
> Andrea
>
--
Sincerely yours,
Mike.