[PATCH 0/1] memory offline issues with hugepage size > memory block size

From: Gerald Schaefer
Date: Tue Sep 20 2016 - 11:54:29 EST

dissolve_free_huge_pages() will either hit its VM_BUG_ON() or cause
list corruption and an addressing exception when trying to set a memory
block offline that is part (but not the first part) of a gigantic
hugetlb page with a size > memory block size.
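
To make the geometry concrete, here is a small userspace sketch of the
pfn arithmetic. All names and sizes are hypothetical (4 KiB base pages,
256 MiB memory blocks, one 2 GiB hugepage), and it assumes that the
VM_BUG_ON() in question is an alignment check of the start pfn against
the smallest hugepage order. It is a model of the scenario, not kernel
code.

```c
#include <stdbool.h>

/* Illustrative geometry only; these sizes are not taken from the patch. */
#define TOY_PAGE_SHIFT      12U                             /* 4 KiB base pages */
#define TOY_BLOCK_PFNS      (1UL << (28U - TOY_PAGE_SHIFT)) /* 256 MiB memory block */
#define TOY_GIGANTIC_ORDER  (31U - TOY_PAGE_SHIFT)          /* 2 GiB hugepage */

/* True if pfn is aligned to a hugepage of the given order. */
static bool toy_pfn_aligned(unsigned long pfn, unsigned int order)
{
	return (pfn & ((1UL << order) - 1)) == 0;
}

/* First pfn of the n-th memory block inside a hugepage starting at base_pfn. */
static unsigned long toy_block_start_pfn(unsigned long base_pfn, unsigned int n)
{
	return base_pfn + n * TOY_BLOCK_PFNS;
}

/* Number of memory blocks covered by one gigantic page. */
static unsigned long toy_blocks_per_hugepage(void)
{
	return (1UL << TOY_GIGANTIC_ORDER) / TOY_BLOCK_PFNS;
}
```

With these numbers one hugepage spans eight memory blocks. Only block 0
starts on a hugepage-aligned pfn; offlining any of blocks 1..7 passes in
a start pfn that fails toy_pfn_aligned() for the gigantic order, which
is the "part but not the first part" case described above.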

When no other smaller hugepage sizes are present, the VM_BUG_ON() will
trigger directly. Otherwise we will run into an addressing exception
later, because dissolve_free_huge_page() will not use the head page of
the compound hugetlb page, which results in a NULL hstate from
page_hstate(). list_del() would also not work on a tail page.

To fix this, first remove the VM_BUG_ON() because it is wrong, and then
use the compound head page in dissolve_free_huge_page().
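
The head page part of the fix can be illustrated with a toy model. The
structures and helpers below are hypothetical stand-ins, not the
kernel's struct page or page_hstate(). They only show why a lookup on a
tail page yields NULL and why resolving to the head page first avoids
that.

```c
#include <stddef.h>

/* Toy stand-ins for kernel structures; all names are hypothetical. */
struct toy_hstate {
	unsigned int order;
};

struct toy_page {
	struct toy_page *head;     /* NULL on a head page, else points to it */
	struct toy_hstate *hstate; /* only meaningful on the head page */
};

/* Resolve any page of a compound page to its head page. */
static struct toy_page *toy_compound_head(struct toy_page *page)
{
	return page->head ? page->head : page;
}

/* Lookup without head resolution: returns NULL for tail pages. */
static struct toy_hstate *toy_hstate_naive(struct toy_page *page)
{
	return page->hstate;
}

/* Lookup that goes through the head page first. */
static struct toy_hstate *toy_hstate_fixed(struct toy_page *page)
{
	return toy_compound_head(page)->hstate;
}

/* Set up nr pages as one compound page belonging to hstate h. */
static void toy_prep_compound(struct toy_page *pages, size_t nr,
			      struct toy_hstate *h)
{
	pages[0].head = NULL;
	pages[0].hstate = h;
	for (size_t i = 1; i < nr; i++) {
		pages[i].head = &pages[0];
		pages[i].hstate = NULL;
	}
}
```

In this model, toy_hstate_naive() on any page other than pages[0]
returns NULL, while toy_hstate_fixed() returns the same hstate for every
page of the compound page. The same reasoning applies to list
operations: only the head page is linked into a free list, so a
list_del() has to be done on the head page as well.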

However, this all assumes that it is the desired behaviour to remove
a (gigantic) unused hugetlb page from the pool, just because a small
(in relation to the hugepage size) memory block is going offline. I am
not sure if this is the right thing, and it doesn't look very consistent
given that in this scenario it is _not_ possible to migrate such a
(gigantic) hugepage if it is in use. OTOH, has_unmovable_pages()
will return false in both cases, i.e. the memory block will be reported
as removable, no matter if the hugepage that it is part of is unused or
in use.

This patch assumes that it is OK to remove the hugepage, i.e. that
memory offline beats pre-allocated unused (gigantic) hugepages.

Any thoughts?

Gerald Schaefer (1):
  mm/hugetlb: fix memory offline with hugepage size > memory block size

 mm/hugetlb.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)