Re: pipe/page fault oddness.

From: Linus Torvalds
Date: Wed Oct 01 2014 - 18:43:00 EST


On Wed, Oct 1, 2014 at 3:08 PM, Sasha Levin <sasha.levin@xxxxxxxxxx> wrote:
>
> I've tried this patch on the same configuration that was triggering
> the VM_BUG_ON that Hugh mentioned previously. Surprisingly enough it
> ran fine for ~20 minutes before exploding with:

Well, that's somewhat encouraging. I didn't expect it to be perfect.

That said, "ran fine" isn't necessarily the same thing as "worked".
Who knows how buggy it was without showing overt symptoms until the
BUG_ON() triggered. But hey, I'll be optimistic.

> [ 2781.566206] kernel BUG at mm/huge_memory.c:1293!

So that's

BUG_ON(is_huge_zero_page(page));

and the reason is trivial: the old code used to have a magical special
case for the zero-page hugepage (see change_huge_pmd()) and I got rid
of that (because now it's just about setting protections, and the
zero-page hugepage is in no way special).

So I think the solution is equally trivial: just accept that the
zero-page can happen, and ignore it (just un-numa it).
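
To make the "accept it and un-numa it" idea concrete, here is a toy
user-space sketch of the pattern. It is not kernel code: the toy_*
names, the flag bits, and the fault counter are all made up for
illustration and do not match the real pmd layout. The only thing it
tries to show is that the zero-page case restores normal protections
and bails out without any NUMA accounting, instead of asserting.

```c
#include <stdint.h>

/* Hypothetical flag bits for a toy PMD entry; NOT the kernel's real layout. */
#define TOY_PMD_PRESENT   (1u << 0)  /* entry is accessible again */
#define TOY_PMD_NUMA_HINT (1u << 1)  /* entry was protected to provoke a hinting fault */
#define TOY_PMD_ZERO_PAGE (1u << 2)  /* entry maps the shared huge zero page */

struct toy_pmd {
	uint32_t flags;
};

enum toy_result {
	TOY_ACCOUNTED,  /* fault was counted as a NUMA hinting fault */
	TOY_SKIPPED     /* zero page: protections restored, nothing counted */
};

/* "Un-numa" an entry: drop the hinting bit and make it accessible again. */
struct toy_pmd toy_pmd_unnuma(struct toy_pmd pmd)
{
	pmd.flags &= ~TOY_PMD_NUMA_HINT;
	pmd.flags |= TOY_PMD_PRESENT;
	return pmd;
}

/*
 * Toy stand-in for a NUMA hinting fault handler. The old behaviour would
 * have been to assert that the zero page never gets here; this version
 * tolerates it and simply restores the protections.
 */
enum toy_result toy_numa_fault(struct toy_pmd *pmdp, unsigned int *hint_faults)
{
	if (pmdp->flags & TOY_PMD_ZERO_PAGE) {
		/* Nothing to migrate or account for: just un-numa the entry. */
		*pmdp = toy_pmd_unnuma(*pmdp);
		return TOY_SKIPPED;
	}

	/* Normal page: count the hinting fault, then restore protections. */
	(*hint_faults)++;
	*pmdp = toy_pmd_unnuma(*pmdp);
	return TOY_ACCOUNTED;
}
```

Calling toy_numa_fault() on an entry with TOY_PMD_ZERO_PAGE set leaves
the counter untouched and clears the hinting bit, while a normal entry
bumps the counter first.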

Appended is an incremental diff on top of the previous one. Even less
tested than the last case, but I think you get the idea if it doesn't
work as-is.

Linus
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 14de54af6c38..fc33952d59c4 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1290,7 +1290,9 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	}

 	page = pmd_page(pmd);
-	BUG_ON(is_huge_zero_page(page));
+	if (is_huge_zero_page(page))
+		goto huge_zero_page;
+
 	page_nid = page_to_nid(page);
 	last_cpupid = page_cpupid_last(page);
 	count_vm_numa_event(NUMA_HINT_FAULTS);
@@ -1381,6 +1383,11 @@ out:
 		task_numa_fault(last_cpupid, page_nid, HPAGE_PMD_NR, flags);

 	return 0;
+huge_zero_page:
+	pmd = pmd_modify(pmd, vma->vm_page_prot);
+	set_pmd_at(mm, haddr, pmdp, pmd);
+	update_mmu_cache_pmd(vma, addr, pmdp);
+	goto out_unlock;
 }

 int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,