Re: [PATCH 0/1] memory offline issues with hugepage size > memory block size

From: Michal Hocko
Date: Wed Sep 21 2016 - 14:21:02 EST

On Tue 20-09-16 10:37:04, Mike Kravetz wrote:
> On 09/20/2016 08:53 AM, Gerald Schaefer wrote:
> > dissolve_free_huge_pages() will either run into the VM_BUG_ON() or a
> > list corruption and addressing exception when trying to set a memory
> > block offline that is part (but not the first part) of a gigantic
> > hugetlb page with a size > memory block size.
> >
> > When no other smaller hugepage sizes are present, the VM_BUG_ON() will
> > trigger directly. In the other case we will run into an addressing
> > exception later, because dissolve_free_huge_page() will not use the head
> > page of the compound hugetlb page which will result in a NULL hstate
> > from page_hstate(). list_del() would also not work well on a tail page.
> >
> > To fix this, first remove the VM_BUG_ON() because it is wrong, and then
> > use the compound head page in dissolve_free_huge_page().
> >
> > However, this all assumes that it is the desired behaviour to remove
> > a (gigantic) unused hugetlb page from the pool, just because a small
> > (in relation to the hugepage size) memory block is going offline. Not
> > sure if this is the right thing, and it doesn't look very consistent
> > given that in this scenario it is _not_ possible to migrate
> > such a (gigantic) hugepage if it is in use. OTOH, has_unmovable_pages()
> > will return false in both cases, i.e. the memory block will be reported
> > as removable, no matter if the hugepage that it is part of is unused or
> > in use.
> >
> > This patch is assuming that it would be OK to remove the hugepage,
> > i.e. memory offline beats pre-allocated unused (gigantic) hugepages.
> >
> > Any thoughts?
>
> Cc'ed Rui Teng and Dave Hansen as they were discussing the issue in
> this thread:
> Their approach (I believe) would be to fail the offline operation in
> this case. However, I could argue that either failing the operation or
> dissolving the unused huge page containing the area to be offlined is
> the right thing to do.

I am sorry I have noticed this thread only now. I was arguing about this
in the original thread. I would be rather reluctant to free a gigantic
page just because somebody wants to offline a small part of it, because
the setup is really expensive and a lost page would be really hard to get
back.

I would even question the per memory block offlining itself. Why would
anybody want to offline a few blocks rather than the whole node? What is
the use case here?

Michal Hocko
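
For readers following the thread, here is a minimal userspace sketch of
the situation described in the quoted patch text: a memory block that is
part, but not the first part, of a gigantic page starts on a tail page,
so the head page has to be looked up before the page can be dissolved.
This is not kernel code. The sizes, the helper names head_pfn(),
block_start_pfn() and block_starts_on_tail(), and the assumption that
gigantic pages are naturally aligned are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sizes, in base pages, chosen only for illustration. */
#define PAGES_PER_BLOCK  4UL	/* one memory block */
#define PAGES_PER_HUGE  16UL	/* one gigantic page, larger than a block */

/*
 * Userspace stand-in for a compound head lookup: round the pfn down to
 * the start of the (assumed naturally aligned) gigantic page.
 */
static unsigned long head_pfn(unsigned long pfn)
{
	/* The mask below only works for power-of-two sizes. */
	assert((PAGES_PER_HUGE & (PAGES_PER_HUGE - 1)) == 0);
	return pfn & ~(PAGES_PER_HUGE - 1);
}

/* First pfn of memory block number block_nr. */
static unsigned long block_start_pfn(unsigned long block_nr)
{
	return block_nr * PAGES_PER_BLOCK;
}

/* True if the block starting at pfn begins on a tail page, not the head. */
static bool block_starts_on_tail(unsigned long pfn)
{
	return head_pfn(pfn) != pfn;
}
```

With these numbers, blocks 8 to 11 cover the gigantic page that starts
at pfn 32. Only block 8 begins on the head page. Offlining block 9, 10
or 11 starts on a tail page, which is the case the patch description
says must go through the compound head.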