Re: [PATCH v1] mm: initialize struct pages when adding a section

From: Pavel Tatashin
Date: Mon Jul 30 2018 - 09:30:58 EST


On Mon, Jul 30, 2018 at 8:11 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 30.07.2018 14:05, Michal Hocko wrote:
> > On Mon 30-07-18 13:53:06, David Hildenbrand wrote:
> >> On 30.07.2018 13:30, Michal Hocko wrote:
> >>> On Fri 27-07-18 18:54:54, David Hildenbrand wrote:
> > >>>> Right now, struct pages are initialized when memory is onlined, not
> >>>> when it is added (since commit d0dc12e86b31 ("mm/memory_hotplug: optimize
> >>>> memory hotplug")).
> >>>>
> >>>> remove_memory() will call arch_remove_memory(). Here, we usually access
> >>>> the struct page to get the zone of the pages.
> >>>>
> > >>>> So effectively, we access stale struct pages if we remove memory that
> > >>>> was never onlined. So let's simply initialize them earlier, when the
> >>>> memory is added. We only have to take care of updating the zone once we
> >>>> know it. We can use a dummy zone for that purpose.
> >>>
> >>> I have considered something like this when I was reworking memory
> > >>> hotplug to not associate struct pages with a zone before onlining and I
> >>> considered this to be rather fragile. I would really not like to get
> >>> back to that again if possible.
> >>>
> >>>> So effectively, all pages will already be initialized and set to
> > >>>> reserved after memory was added but before it was onlined (and even
> > >>>> before the memblock is added). We only initialize the pages once, so as
> > >>>> not to degrade performance.
> >>>
> >>> To be honest, I would rather see d0dc12e86b31 reverted. It is late in
> >>> the release cycle and if the patch is buggy then it should be reverted
> >>> rather than worked around. I found the optimization not really
> >>> convincing back then and this is still the case TBH.
> >>>
> >>
> >> If I am not wrong, that's already broken in 4.17, no? What about that?
> >
> > Ohh, I thought this was merged in 4.18.
> > $ git describe --contains d0dc12e86b31 --match="v*"
> > v4.17-rc1~99^2~44
> >
> > proves me wrong. This means that the fix is not as urgent as I thought.
> > If you can figure out a reasonable fix then it should be preferable to
> > the revert.
> >
> > Fake zone sounds too hackish to me though.
> >
>
> If I am not wrong, that's the same as what we had before d0dc12e86b31,
> but now it is explicit and a single value for all kernel configs
> ("ZONE_NORMAL").
>
> Before d0dc12e86b31, struct pages were initialized to 0. So the zone was
> (depending on the config) ZONE_DMA, ZONE_DMA32 or ZONE_NORMAL.
>
> Now the value is random and might not even be a valid zone.

Hi David,

Have you figured out why we access struct pages during hot-unplug for
offlined memory? Also, a panic trace would be useful in the patch.

As I understand it, the bug may occur only when hot-remove is enabled
and default onlining of added memory is disabled. Is this correct? I
suspect the reason we have not heard about this bug is that it is rare
to add memory and not online it.

Thank you,
Pavel