Re: [PATCH 2/2] hugepage: Allow parallelization of the hugepagefault path

From: Mel Gorman
Date: Thu Jul 21 2011 - 06:17:16 EST


On Fri, Jul 15, 2011 at 04:06:50PM +1000, Anton Blanchard wrote:
>
> Hi Mel,
>
> > I haven't tested this patch yet but typically how I would test it is
> > multiple parallel instances of make func from libhugetlbfs. In
> > particular I would be looking out for counter corruption. Has
> > something like this been done? I know hugetlb_lock protects the
> > counters but the locking in there has turned into a bit of a mess so
> > it's easy to miss something.
>
> Thanks for the suggestion and sorry for taking so long. Make check has
> the same PASS/FAIL count before and after the patches.
>
> I also ran 16 copies of make func on a large box with 896 HW threads.
> Some of the tests that use shared memory were a bit upset, but that
> seems to be because we use a static key. It seems the tests were also
> fighting over the number of huge pages they wanted the system set to.
>
> It got up to a load average of 13207, and heap-overflow consumed all my
> memory, a pretty good effort considering I have over 1TB of it.
>
> After things settled down things were OK, apart from the fact that we
> have 20 huge pages unaccounted for:
>
> HugePages_Total: 10000
> HugePages_Free: 9980
> HugePages_Rsvd: 0
> HugePages_Surp: 0
>
> I verified there were no shared memory segments, and no files in the
> hugetlbfs filesystem (I double checked by unmounting it).
>
> I can't see how this patch set would cause this. It seems like we can
> leak huge pages, perhaps in an error path. Anyway, I'll repost the
> patch set for comments.
>

I didn't see any problem with the patch either. The locking should
be functionally equivalent for both private and shared mappings.

Out of curiosity, can you trigger the bug without the patches? It
could be a race on faulting the shared regions that is causing the
leakage. Any chance this can be debugged minimally by finding out
whether this is an accounting bug or whether a hugepage is really
leaked? Considering the stress the machine was put under and its size,
I'm guessing it's not trivially reproducible anywhere else.
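
One relatively cheap way of narrowing it down might be to compare the
global pool counters against the per-node sysfs counters after a run;
if they disagree it smells like counter corruption rather than pages
genuinely going missing. A rough userspace sketch (untested; the paths
assume 16MB hugepages, adjust the directory name for other sizes):

/*
 * Compare the global free_hugepages count against the sum of the
 * per-node free_hugepages counters. A mismatch would point at the
 * accounting rather than at pages actually being lost.
 */
#include <glob.h>
#include <stdio.h>

static long read_long(const char *path)
{
        FILE *f = fopen(path, "r");
        long val = -1;

        if (f) {
                if (fscanf(f, "%ld", &val) != 1)
                        val = -1;
                fclose(f);
        }
        return val;
}

int main(void)
{
        glob_t g;
        long per_node_sum = 0;
        size_t i;

        if (glob("/sys/devices/system/node/node*/hugepages/"
                 "hugepages-16384kB/free_hugepages", 0, NULL, &g) == 0) {
                for (i = 0; i < g.gl_pathc; i++)
                        per_node_sum += read_long(g.gl_pathv[i]);
                globfree(&g);
        }

        printf("global free : %ld\n",
               read_long("/sys/kernel/mm/hugepages/hugepages-16384kB/free_hugepages"));
        printf("per-node sum: %ld\n", per_node_sum);
        return 0;
}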

I think one possibility is that the bug is in the hugepage pool
resizing that "make func" does. This is a scenario that does not
normally occur, as hugepage pool resizing is an administrative task
that happens rarely. See this chunk for instance:

        spin_lock(&hugetlb_lock);
        if (h->surplus_huge_pages >= h->nr_overcommit_huge_pages) {
                spin_unlock(&hugetlb_lock);
                return NULL;
        } else {
                h->nr_huge_pages++;
                h->surplus_huge_pages++;
        }
        spin_unlock(&hugetlb_lock);

        page = alloc_pages(htlb_alloc_mask|__GFP_COMP|
                                        __GFP_REPEAT|__GFP_NOWARN,
                           huge_page_order(h));

        if (page && arch_prepare_hugepage(page)) {
                __free_pages(page, huge_page_order(h));
                return NULL;
        }

That chunk does not roll the counters back if arch_prepare_hugepage()
fails, so nr_huge_pages and surplus_huge_pages stay elevated. That
function is normally a no-op on powerpc. It wasn't changed for any
reason, was it?
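
If that is where the pages are going, presumably the failure path
needs to give the counters back, something along these lines (untested
sketch only, just to show what I mean):

        if (page && arch_prepare_hugepage(page)) {
                __free_pages(page, huge_page_order(h));
                /* undo the speculative increments taken above */
                spin_lock(&hugetlb_lock);
                h->nr_huge_pages--;
                h->surplus_huge_pages--;
                spin_unlock(&hugetlb_lock);
                return NULL;
        }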

Another possibility is a regression of
[4f16fc10: mm: hugetlb: fix hugepage memory leak in mincore()], so
maybe try a similar reproduction case with mincore()?
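
Something along these lines might do as a first attempt at a
reproducer (untested; it assumes a hugetlbfs mount at /mnt/huge and
16MB pages):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define HPAGE_SIZE      (16UL * 1024 * 1024)    /* 16MB pages on ppc64 */
#define LENGTH          (4 * HPAGE_SIZE)

int main(void)
{
        unsigned char vec[LENGTH / 4096];
        char *addr;
        int fd;

        fd = open("/mnt/huge/mincore-test", O_CREAT | O_RDWR, 0600);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        addr = mmap(NULL, LENGTH, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (addr == MAP_FAILED) {
                perror("mmap");
                return 1;
        }

        memset(addr, 0, LENGTH);        /* fault the hugepages in */

        /* poke the mapping with mincore(); the call failing is fine */
        if (mincore(addr, LENGTH, vec) < 0)
                perror("mincore");

        munmap(addr, LENGTH);
        close(fd);
        unlink("/mnt/huge/mincore-test");

        /* then check whether HugePages_Free returns to its old value */
        return 0;
}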

Maybe also try putting assert_spin_locked(&hugetlb_lock) at every
point where free_huge_pages or nr_huge_pages is updated and see if one
of them triggers?
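
For example, in enqueue_huge_page() it would look something like this
(illustrative only):

static void enqueue_huge_page(struct hstate *h, struct page *page)
{
        int nid = page_to_nid(page);

        /* every update of the pool counters should hold hugetlb_lock */
        assert_spin_locked(&hugetlb_lock);

        list_add(&page->lru, &h->hugepage_freelists[nid]);
        h->free_huge_pages++;
        h->free_huge_pages_node[nid]++;
}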


--
Mel Gorman
SUSE Labs