Re: [PATCH] mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC

From: Michal Hocko
Date: Tue Feb 26 2019 - 13:20:11 EST


On Tue 26-02-19 19:16:48, Michal Hocko wrote:
> On Tue 26-02-19 12:53:05, Qian Cai wrote:
> > On Tue, 2019-02-26 at 15:23 +0100, Michal Hocko wrote:
> > > On Tue 26-02-19 09:16:30, Qian Cai wrote:
> > > >
> > > >
> > > > On 2/26/19 7:35 AM, Michal Hocko wrote:
> > > > > On Mon 25-02-19 14:17:10, Qian Cai wrote:
> > > > > > > When onlining memory pages, it calls kernel_unmap_linear_page().
> > > > > > > However, it does not call kernel_map_linear_page() while offlining
> > > > > > > memory pages. As a result, it triggers a panic below while onlining on
> > > > > > > ppc64le as it checks if the pages are mapped before unmapping.
> > > > > > > Therefore, let it call kernel_map_linear_page() when setting all pages
> > > > > > > as reserved.
> > > > >
> > > > > This really begs for much more explanation. All the pages should be
> > > > > > unmapped as they get freed AFAIR. So why do we need a special handling
> > > > > here when this path only offlines free pages?
> > > > >
> > > >
> > > > It sounds like this is exactly the point to explain the imbalance. When
> > > > offlining, every page has already been unmapped and marked reserved. When
> > > > onlining, it tries to free those reserved pages via __online_page_free().
> > > > Since those pages are order 0, it goes through free_unref_page(), which in
> > > > turn calls kernel_unmap_linear_page() again without them having been mapped
> > > > first.
> > >
> > > How is this any different from an initial page being freed to the
> > > allocator during the boot?
> > >
> >
> > At least for IBM POWER8, it does this during the boot,
> >
> > early_setup
> >   early_init_mmu
> >     hash__early_init_mmu
> >       htab_initialize [1]
> >         htab_bolt_mapping [2]
> >
> > where it effectively maps all memblock regions just like
> > kernel_map_linear_page(), so later mem_init() -> memblock_free_all() will unmap
> > them just fine.
> >
> > [1]
> > for_each_memblock(memory, reg) {
> > 	base = (unsigned long)__va(reg->base);
> > 	size = reg->size;
> >
> > 	DBG("creating mapping for region: %lx..%lx (prot: %lx)\n",
> > 	    base, size, prot);
> >
> > 	BUG_ON(htab_bolt_mapping(base, base + size, __pa(base),
> > 			prot, mmu_linear_psize, mmu_kernel_ssize));
> > }
> >
> > [2] linear_map_hash_slots[paddr >> PAGE_SHIFT] = ret | 0x80;
>
> Thanks for the clarification. I would have expected that there is a
> generic path to do kernel_map_pages from an appropriate place. I am also
> wondering whether blowing up is actually the right thing to do. Is the
> ppc specific code correct? Isn't your patch simply working around a
> bogus condition?

Btw. what happens if the offlined pfn range is removed completely? Is
the range still mapped? What kind of consequences does this have?
Also, when does this tweak happen on a completely new hotplugged memory
range?
--
Michal Hocko
SUSE Labs
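
The map/unmap imbalance discussed in this thread can be illustrated with a
small userspace toy model. This is only a sketch under stated assumptions:
every name in it (toy_map_page, toy_unmap_page, toy_offline_then_online,
NPAGES) is hypothetical and is not kernel code. It assumes, as described
above, that boot maps every page before freeing it, that freeing a page
unmaps it, and that unmapping a page which is not mapped is detected as an
error, roughly like the check that fires on ppc64le.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NPAGES 8 /* hypothetical size of the toy memory range */

/* Toy stand-in for the per-page "mapped in the linear map" state. */
static bool mapped[NPAGES];

static void toy_map_page(size_t pfn)
{
	assert(pfn < NPAGES);
	mapped[pfn] = true;
}

/*
 * Returns false when asked to unmap a page that is not mapped,
 * modelling the "is it mapped?" check done before unmapping.
 */
static bool toy_unmap_page(size_t pfn)
{
	assert(pfn < NPAGES);
	if (!mapped[pfn])
		return false;
	mapped[pfn] = false;
	return true;
}

/*
 * Walks boot -> offline -> online and returns how many unbalanced
 * unmaps were seen. map_on_offline models the proposed change of
 * mapping pages again when they are marked reserved.
 */
static int toy_offline_then_online(bool map_on_offline)
{
	int unbalanced = 0;
	size_t pfn;

	/* Boot: every page is mapped, then freed (unmapped). */
	for (pfn = 0; pfn < NPAGES; pfn++) {
		toy_map_page(pfn);
		if (!toy_unmap_page(pfn))
			unbalanced++;
	}

	/* Offline: pages are free, hence already unmapped. */
	if (map_on_offline)
		for (pfn = 0; pfn < NPAGES; pfn++)
			toy_map_page(pfn);

	/* Online: pages are freed again, which unmaps them. */
	for (pfn = 0; pfn < NPAGES; pfn++)
		if (!toy_unmap_page(pfn))
			unbalanced++;

	return unbalanced;
}
```

In this model, toy_offline_then_online(false) reports one unbalanced unmap
per page, while toy_offline_then_online(true) reports none. It says nothing
about whether the ppc-specific check itself is correct, which is the open
question above.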