On Mon, Mar 30, 2015 at 09:40:54AM +0000, Naoya Horiguchi wrote:
> hugetlb doesn't support NUMA balancing now, but that doesn't mean we can
> leave the hugetlb code unprepared for PROTNONE entries.
> In the current kernel, when a process accesses a hugetlb range protected
> with PROTNONE, it causes unexpected COWs, which finally put the hugetlb
> subsystem into a broken/uncontrollable state, where for example
> h->resv_huge_pages is subtracted too much and wraps around to a very
> large number, and the free hugepage pool is no longer maintainable.
Ouch!
> This patch simply clears PROTNONE when it's caught out. Real NUMA balancing
> code for hugetlb is not implemented yet (not sure how much it's worth doing.)
It's not worth doing at all. Furthermore, an application that went to the
effort of allocating and using hugetlb pages is not going to appreciate
the minor faults incurred by automatic balancing for no gain.
Why not something like the following untested patch? It simply avoids
doing protection updates on hugetlb VMAs. If it works for you, feel free
to take it and reuse most of the same changelog for it. I'll only be
intermittently online for the next few days and would rather not
unnecessarily delay a fix.
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7ce18f3c097a..74bfde50fd4e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2161,8 +2161,10 @@ void task_numa_work(struct callback_head *work)
 		vma = mm->mmap;
 	}
 	for (; vma; vma = vma->vm_next) {
-		if (!vma_migratable(vma) || !vma_policy_mof(vma))
+		if (!vma_migratable(vma) || !vma_policy_mof(vma) ||
+		    is_vm_hugetlb_page(vma)) {
 			continue;
+		}
 
 		/*
 		 * Shared library pages mapped by multiple processes are not
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxxx For more info on Linux MM,
see: http://www.linux-mm.org/ .