Re: [PATCH] mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC

From: Michal Hocko
Date: Tue Feb 26 2019 - 13:16:53 EST


On Tue 26-02-19 12:53:05, Qian Cai wrote:
> On Tue, 2019-02-26 at 15:23 +0100, Michal Hocko wrote:
> > On Tue 26-02-19 09:16:30, Qian Cai wrote:
> > >
> > >
> > > On 2/26/19 7:35 AM, Michal Hocko wrote:
> > > > On Mon 25-02-19 14:17:10, Qian Cai wrote:
> > > > > > Onlining memory pages calls kernel_unmap_linear_page(). However,
> > > > > > offlining memory pages does not call kernel_map_linear_page(). As a
> > > > > > result, onlining triggers the panic below on ppc64le, because the
> > > > > > code checks that the pages are mapped before unmapping them.
> > > > > > Therefore, call kernel_map_linear_page() when setting all pages
> > > > > > as reserved.
> > > >
> > > > This really begs for much more explanation. All the pages should be
> > > > > unmapped as they get freed AFAIR. So why do we need a special handling
> > > > here when this path only offlines free pages?
> > > >
> > >
> > > > It sounds like this is exactly the point that explains the imbalance. When
> > > > offlining, every page has already been unmapped and marked reserved. When
> > > > onlining, it tries to free those reserved pages via __online_page_free().
> > > > Since those pages are order 0, it goes through free_unref_page(), which in
> > > > turn calls kernel_unmap_linear_page() again without the pages having been
> > > > mapped first.
> >
> > How is this any different from an initial page being freed to the
> > allocator during the boot?
> >
>
> At least for IBM POWER8, it does this during boot:
>
> early_setup
>   early_init_mmu
>     hash__early_init_mmu
>       htab_initialize [1]
>         htab_bolt_mapping [2]
>
> where it effectively maps all memblock regions just like
> kernel_map_linear_page(), so later mem_init() -> memblock_free_all() will unmap
> them just fine.
>
> [1]
> for_each_memblock(memory, reg) {
>         base = (unsigned long)__va(reg->base);
>         size = reg->size;
>
>         DBG("creating mapping for region: %lx..%lx (prot: %lx)\n",
>             base, size, prot);
>
>         BUG_ON(htab_bolt_mapping(base, base + size, __pa(base),
>                                  prot, mmu_linear_psize, mmu_kernel_ssize));
> }
>
> [2] linear_map_hash_slots[paddr >> PAGE_SHIFT] = ret | 0x80;

Thanks for the clarification. I would have expected that there is a
generic path to do kernel_map_pages from an appropriate place. I am also
wondering whether blowing up is actually the right thing to do. Is the
ppc specific code correct? Isn't your patch simply working around a
bogus condition?
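
To make the imbalance under discussion concrete, here is a minimal
userspace sketch. It is a toy model, not kernel code. It assumes the
per-page state is a single "mapped" flag like the 0x80 bit in [2], and
every toy_* name, NR_PAGES and SLOT_MAPPED are hypothetical. Where the
real code would blow up, the toy unmap just returns -1.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NR_PAGES    8     /* hypothetical toy size */
#define SLOT_MAPPED 0x80  /* assumption: mirrors the "ret | 0x80" flag in [2] */

/* Toy stand-in for linear_map_hash_slots[]; one byte of state per page. */
static unsigned char toy_slots[NR_PAGES];

/* Mark a page mapped; fail if it is out of range or already mapped. */
static int toy_map_page(size_t pfn)
{
    if (pfn >= NR_PAGES || (toy_slots[pfn] & SLOT_MAPPED))
        return -1;
    toy_slots[pfn] |= SLOT_MAPPED;
    return 0;
}

/* Mark a page unmapped; fail if it was not mapped (the imbalance case). */
static int toy_unmap_page(size_t pfn)
{
    if (pfn >= NR_PAGES || !(toy_slots[pfn] & SLOT_MAPPED))
        return -1;
    toy_slots[pfn] &= (unsigned char)~SLOT_MAPPED;
    return 0;
}

/* Boot: every page is mapped up front, then unmapped as it is freed. */
static int toy_simulate_boot(void)
{
    size_t pfn;

    memset(toy_slots, 0, sizeof(toy_slots));
    for (pfn = 0; pfn < NR_PAGES; pfn++)
        if (toy_map_page(pfn))
            return -1;
    for (pfn = 0; pfn < NR_PAGES; pfn++)
        if (toy_unmap_page(pfn))
            return -1;
    return 0;
}

/*
 * Offline then online. Offlined pages start out free, hence unmapped.
 * If map_on_offline is set, offlining maps them again (what the patch
 * proposes); onlining then frees each page, which unmaps it.
 */
static int toy_simulate_offline_online(bool map_on_offline)
{
    size_t pfn;

    memset(toy_slots, 0, sizeof(toy_slots));
    if (map_on_offline)
        for (pfn = 0; pfn < NR_PAGES; pfn++)
            if (toy_map_page(pfn))
                return -1;
    for (pfn = 0; pfn < NR_PAGES; pfn++)
        if (toy_unmap_page(pfn))
            return -1;
    return 0;
}
```

In this model, toy_simulate_boot() and toy_simulate_offline_online(true)
both succeed, while toy_simulate_offline_online(false) fails on the first
unmap. It says nothing about whether the check itself is the right
behaviour, which is the open question above.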

--
Michal Hocko
SUSE Labs