Re: Process-shared futexes on hugepages puts the kernel in aninfinite loop in 2.6.32.11; is this fixed now?

From: Mel Gorman
Date: Mon Apr 19 2010 - 11:33:15 EST


On Mon, Apr 19, 2010 at 01:52:36PM +0200, Peter Zijlstra wrote:
> On Mon, 2010-04-19 at 12:43 +0100, Mel Gorman wrote:
>
> > > Right, so I had a quick chat with Mel, and it appears MAP_PRIVATE
> > > hugetlb pages don't have their page->mapping set.
> > >
> > > I guess something like the below might work, but I'd really rather not
> > > add hugetlb knowledge to futex.c. Does anybody else have a better idea?
> > > Maybe create something similar to an anon_vma for hugetlb pages?
> > >
> >
> > anon_vma for hugetlb pages sounds overkill, what would it gain? In this
> > context, futex only appears to distinguish between whether the
> > references are private or shared.
> >
> > Looking at the hugetlbfs code, I can't see a place where it actually cares
> > about the mapping as such. It's used to find shared pages in the page cache
> > (but not in the LRU) that are backed by the hugetlbfs file. For hugetlbfs
> > though, the mapping is mostly kept in page->private for reservation accounting
> > purposes.
> >
> > I can't think of other parts of the VM that touch the mapping if the
> > page is managed by hugetlbfs so the following patch should also work but
> > without futex having hugetlbfs-awareness. What do you think? Maybe for
> > safety, it would be better to make the mapping some obvious poison bytes
> > or'd with PAGE_MAPPING_ANON so an oops will be more obvious?
>
> Yes, this seems perfectly adequate to me, that poison idea might be
> worthwhile too :-)
>
> Acked-by: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
>

Are you ok with an Ack to this slightly-difference patch too? If so,
I'll forward it on to Andrew.

==== CUT HERE ====

Fix infinite loop in get_futex_key when backed by huge pages

If a futex key happens to be located within a huge page mapped MAP_PRIVATE,
get_futex_key() can go into an infinite loop waiting for a page->mapping
that will never exist. This was reported and documented in an external
bugzilla at

https://bugzilla.redhat.com/show_bug.cgi?id=552257

This patch makes page->mapping a poisoned value that includes PAGE_MAPPING_ANON
mapped MAP_PRIVATE. This is enough for futex to continue but because
of PAGE_MAPPING_ANON, the poisoned value is not dereferenced or used by
futex. No other part of the VM should be dereferencing the page->mapping of
a hugetlbfs page as its page cache is not on the LRU.

This patch fixes the problem with the test case described in the bugzilla.

Signed-off-by: Mel Gorman <mel@xxxxxxxxx>
---
include/linux/poison.h | 10 ++++++++++
mm/hugetlb.c | 6 +++++-
2 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/include/linux/poison.h b/include/linux/poison.h
index 2110a81..0f7b5ac 100644
--- a/include/linux/poison.h
+++ b/include/linux/poison.h
@@ -48,6 +48,16 @@
#define POISON_FREE 0x6b /* for use-after-free poisoning */
#define POISON_END 0xa5 /* end-byte of poisoning */

+/********** mm/hugetlb.c **********/
+/*
+ * Private mappings of hugetlb pages use this poisoned value for
+ * page->mapping. The core VM should not be doing anything with this mapping
+ * but futex requires the existance of some page->mapping value even if it
+ * is unused. If the core VM does deference the mapping, it'll look like a
+ * suspiciously high null-pointer offset starting from 0x2e5
+ */
+#define HUGETLB_PRIVATE_MAPPING (0x2e4 | PAGE_MAPPING_ANON)
+
/********** arch/$ARCH/mm/init.c **********/
#define POISON_FREE_INITMEM 0xcc

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6034dc9..487e3c2 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -546,6 +546,7 @@ static void free_huge_page(struct page *page)

mapping = (struct address_space *) page_private(page);
set_page_private(page, 0);
+ page->mapping = NULL;
BUG_ON(page_count(page));
INIT_LIST_HEAD(&page->lru);

@@ -2447,8 +2448,11 @@ retry:
spin_lock(&inode->i_lock);
inode->i_blocks += blocks_per_huge_page(h);
spin_unlock(&inode->i_lock);
- } else
+ } else {
lock_page(page);
+ page->mapping = (struct address_space *)
+ HUGETLB_PRIVATE_MAPPING;
+ }
}

/*
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/