Re: [PATCH] mm/compaction: Fix the incorrect hole in fast_isolate_freepages()
From: Mike Rapoport
Date: Thu May 21 2020 - 09:38:42 EST
On Thu, May 21, 2020 at 10:36:06AM +0100, Mel Gorman wrote:
> On Thu, May 21, 2020 at 09:44:07AM +0800, Baoquan He wrote:
> > After investigation, it turns out that this is introduced by commit of
> > linux-next: commit f6edbdb71877 ("mm: memmap_init: iterate over memblock
> > regions rather that check each PFN").
> >
> > After investigation, it turns out that this is introduced by commit of
> > linux-next, the patch subject is:
> > "mm: memmap_init: iterate over memblock regions rather that check each PFN".
> >
>
> Some repetition here. I assume it's because the commit ID is not stable
> because it's in linux-next.
>
> > Qian added debugging code. The debugging log shows that the fault page is
> > 0x2a800000. From the system e820 map, which is pasted at the bottom, the page
> > is in an e820 reserved range:
> > BIOS-e820: [mem 0x0000000029ffe000-0x000000002a80afff] reserved
> > and it's in the section [0x28000000, 0x2fffffff]. In that section, there are
> > several usable ranges and some e820 reserved ranges.
> >
> > This kind of e820 reserved range won't be added to the memblock allocator.
> > However, init_unavailable_mem() will still initialize their struct pages and
> > add them into node 0, zone 0.
>
> Why is it appropriate for init_unavailable_mem to add an e820 reserved
> range to the zone at all? The bug being triggered indicates there is a
> mismatch between the zone of a struct page and the PFN passed in.
>
> > Before that commit, memmap_init() would later add e820 reserved
> > ranges into the zone that contains them, because they pass the checks of
> > early_pfn_valid() and early_pfn_in_nid(). In this case, the e820 reserved
> > range containing the fault page 0x2a800000 is added into the DMA32 zone.
> > After that commit, the e820 reserved regions are kept in node 0, zone 0:
> > since we now iterate over memblock regions in memmap_init() instead, their
> > node and zone are never updated.
> >
>
> This implies that we have struct pages that should never be used (e820
> reserved) but that nonetheless sit within a zone's range, with broken
> linkages to their node and zone.
Unless I misread Baoquan's explanation, we do have struct pages for e820
reserved memory, but they are properly linked to their node and zone and
marked as reserved.
Before the commit, memmap_init_zone() looped over all pfns in the zone, and
the reserved pages were passed to __init_single_page() with the node and
zone their pfn belongs to.
Afterwards, free_low_memory_core_early() reserved them because they were
present in memblock.reserved, and from here on nothing would touch them.
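
To illustrate, the pre-change flow was roughly this (a simplified sketch
written from memory, not the exact mm/page_alloc.c and mm/memblock.c code):

	/* old memmap_init_zone(), schematically */
	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
		if (!early_pfn_valid(pfn) || !early_pfn_in_nid(pfn, nid))
			continue;
		/* e820 reserved pfns still pass these checks, so they
		 * inherit the nid/zone of the zone being initialized */
		__init_single_page(pfn_to_page(pfn), pfn, zone, nid);
	}

	/* later, memblock_free_all() -> free_low_memory_core_early() */
	for_each_reserved_mem_region(i, &start, &end)
		reserve_bootmem_region(start, end);	/* marks PageReserved */
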
After the commit, we skip over the holes in memblock.memory during
memmap_init_zone(), which leaves the zone and node links zeroed for e820
reserved pages, and that eventually triggers
VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn)).
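
For reference, the check that fires is essentially this (sketched from
include/linux/mmzone.h and set_pfnblock_flags_mask(), may not match the
exact source):

	static inline bool zone_spans_pfn(const struct zone *zone, unsigned long pfn)
	{
		return zone->zone_start_pfn <= pfn && pfn < zone_end_pfn(zone);
	}

	/* on the compaction path, in set_pfnblock_flags_mask(): */
	VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);

With the zone/node bits in page->flags left zeroed, page_zone() resolves to
node 0's first zone, and that zone's span does not cover the faulting pfn,
hence the splat.
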
> > Reported-by: Qian Cai <cai@xxxxxx>
> > Signed-off-by: Baoquan He <bhe@xxxxxxxxxx>
> > ---
> > mm/compaction.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index 67fd317f78db..9ce4cff4d407 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -1418,7 +1418,9 @@ fast_isolate_freepages(struct compact_control *cc)
> > cc->free_pfn = highest;
> > } else {
> > if (cc->direct_compaction && pfn_valid(min_pfn)) {
> > - page = pfn_to_page(min_pfn);
> > + page = pageblock_pfn_to_page(min_pfn,
> > + pageblock_end_pfn(min_pfn),
> > + cc->zone);
> > cc->free_pfn = min_pfn;
> > }
> > }
>
> Why is the correct fix not to avoid creating struct pages for e820
> ranges and make sure that struct pages that are reachable have proper
> node and zone linkages?
I think that a simpler fix would be to have the e820 reserved ranges
registered in memblock as memory and not only as reserved. If we have these
regions in both memblock.memory and memblock.reserved, the struct pages for
them will be properly initialized as reserved.
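
Roughly something like this in e820__memblock_setup() (an untested sketch,
just to show the idea):

	for (i = 0; i < e820_table->nr_entries; i++) {
		struct e820_entry *entry = &e820_table->entries[i];

		if (entry->type != E820_TYPE_RESERVED)
			continue;

		/* make the range visible to memmap_init() ... */
		memblock_add(entry->addr, entry->size);
		/* ... but keep it away from the page allocator */
		memblock_reserve(entry->addr, entry->size);
	}
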
> --
> Mel Gorman
> SUSE Labs
--
Sincerely yours,
Mike.