Re: [PATCH 3/3] mm: tail page refcounting optimization for slab and hugetlbfs

From: Andrew Morton
Date: Tue Nov 19 2013 - 18:14:32 EST


On Fri, 15 Nov 2013 18:47:48 +0100 Andrea Arcangeli <aarcange@xxxxxxxxxx> wrote:

> This skips the _mapcount mangling for slab and hugetlbfs pages.
>
> The main trouble in doing this is guaranteeing that PageSlab and
> PageHeadHuge remain constant across every get_page/put_page run on
> the tail of a slab or hugetlbfs compound page. Otherwise, if they're
> set during get_page but clear during put_page, the _mapcount of the
> tail page would underflow.
>
> PageHeadHuge will remain true until the compound page is released and
> enters the buddy allocator, so there is no risk of it changing even
> if the tail pin is the last reference left on the page.
>
> PG_slab, instead, is cleared before the slab frees the head page with
> put_page, so if the tail pin is released after the slab has freed the
> page, we would have a problem. But in the slab case the tail pin
> cannot be the last reference left on the page. This is because the
> slab code is free to reuse the compound page after a
> kfree/kmem_cache_free without having to check whether any tail pin is
> left. In turn, all tail pins must always be released while the head
> is still pinned by the slab code, and so we know PG_slab will still
> be set too.
>
> ...
>
> + if (put_page_testzero(page_head)) {
> + /*
> + * If this is the tail
> + * of a a slab
> + * compound page, the
> + * tail pin must not
> + * be the last
> + * reference held on
> + * the page, because
> + * the PG_slab cannot
> + * be cleared before
> + * all tail pins
> + * (which skips the
> + * _mapcount tail
> + * refcounting) have
> + * been released. For
> + * hugetlbfs the tail
> + * pin may be the last
> + * reference on the
> + * page instead,
> + * because
> + * PageHeadHuge will
> + * not go away until
> + * the compound page
> + * enters the buddy
> + * allocator.
> + */
> + VM_BUG_ON(PageSlab(page_head));

This looks like it was attacked by Lindent. How's this look?


From: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Subject: mm/swap.c: reorganize put_compound_page()

Tweak it to save a tab stop and make the code layout slightly less nutty.

Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Cc: Khalid Aziz <khalid.aziz@xxxxxxxxxx>
Cc: Pravin Shelar <pshelar@xxxxxxxxxx>
Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
Cc: Ben Hutchings <bhutchings@xxxxxxxxxxxxxx>
Cc: Christoph Lameter <cl@xxxxxxxxx>
Cc: Johannes Weiner <jweiner@xxxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxx>
Cc: Andi Kleen <andi@xxxxxxxxxxxxxx>
Cc: Minchan Kim <minchan@xxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

mm/swap.c | 238 +++++++++++++++++++++++-----------------------------
1 file changed, 107 insertions(+), 131 deletions(-)

diff -puN mm/swap.c~mm-swapc-reorganize-put_compound_page mm/swap.c
--- a/mm/swap.c~mm-swapc-reorganize-put_compound_page
+++ a/mm/swap.c
@@ -82,150 +82,126 @@ static void __put_compound_page(struct p

static void put_compound_page(struct page *page)
{
- if (unlikely(PageTail(page))) {
- /* __split_huge_page_refcount can run under us */
- struct page *page_head = compound_trans_head(page);
-
- if (likely(page != page_head &&
- get_page_unless_zero(page_head))) {
- unsigned long flags;
-
- /*
- * THP can not break up slab pages or
- * hugetlbfs pages so avoid taking
- * compound_lock() and skip the tail page
- * refcounting (in _mapcount) too. Slab
- * performs non-atomic bit ops on page->flags
- * for better performance. In particular
- * slab_unlock() in slub used to be a hot
- * path. It is still hot on arches that do not
- * support this_cpu_cmpxchg_double().
- */
- if (PageSlab(page_head) || PageHeadHuge(page_head)) {
- if (likely(PageTail(page))) {
- /*
- * __split_huge_page_refcount
- * cannot race here.
- */
- VM_BUG_ON(!PageHead(page_head));
- VM_BUG_ON(atomic_read(&page->_mapcount)
- != -1);
- if (put_page_testzero(page_head))
- VM_BUG_ON(1);
- if (put_page_testzero(page_head)) {
- /*
- * If this is the tail
- * of a a slab
- * compound page, the
- * tail pin must not
- * be the last
- * reference held on
- * the page, because
- * the PG_slab cannot
- * be cleared before
- * all tail pins
- * (which skips the
- * _mapcount tail
- * refcounting) have
- * been released. For
- * hugetlbfs the tail
- * pin may be the last
- * reference on the
- * page instead,
- * because
- * PageHeadHuge will
- * not go away until
- * the compound page
- * enters the buddy
- * allocator.
- */
- VM_BUG_ON(PageSlab(page_head));
- __put_compound_page(page_head);
- }
- return;
- } else
- /*
- * __split_huge_page_refcount
- * run before us, "page" was a
- * THP tail. The split
- * page_head has been freed
- * and reallocated as slab or
- * hugetlbfs page of smaller
- * order (only possible if
- * reallocated as slab on
- * x86).
- */
- goto skip_lock;
- }
- /*
- * page_head wasn't a dangling pointer but it
- * may not be a head page anymore by the time
- * we obtain the lock. That is ok as long as it
- * can't be freed from under us.
- */
- flags = compound_lock_irqsave(page_head);
- if (unlikely(!PageTail(page))) {
- /* __split_huge_page_refcount run before us */
- compound_unlock_irqrestore(page_head, flags);
-skip_lock:
+ struct page *page_head;
+
+ if (likely(PageTail(page))) {
+ if (put_page_testzero(page)) {
+ if (PageHead(page))
+ __put_compound_page(page);
+ else
+ __put_single_page(page);
+ }
+ return;
+ }
+
+ /* __split_huge_page_refcount can run under us */
+ page_head = compound_trans_head(page);
+
+ if (likely(page != page_head && get_page_unless_zero(page_head))) {
+ unsigned long flags;
+
+ /*
+ * THP can not break up slab pages or hugetlbfs pages so avoid
+ * taking compound_lock() and skip the tail page refcounting
+ * (in _mapcount) too. Slab performs non-atomic bit ops on
+ * page->flags for better performance. In particular
+ * slab_unlock() in slub used to be a hot path. It is still hot
+ * on arches that do not support this_cpu_cmpxchg_double().
+ */
+ if (PageSlab(page_head) || PageHeadHuge(page_head)) {
+ if (likely(PageTail(page))) {
+ /*
+ * __split_huge_page_refcount cannot race here.
+ */
+ VM_BUG_ON(!PageHead(page_head));
+ VM_BUG_ON(atomic_read(&page->_mapcount) != -1);
+ if (put_page_testzero(page_head))
+ VM_BUG_ON(1);
if (put_page_testzero(page_head)) {
/*
- * The head page may have been
- * freed and reallocated as a
- * compound page of smaller
- * order and then freed again.
- * All we know is that it
- * cannot have become: a THP
- * page, a compound page of
- * higher order, a tail page.
- * That is because we still
- * hold the refcount of the
- * split THP tail and
- * page_head was the THP head
- * before the split.
+ * If this is the tail of a slab
+ * compound page, the tail pin must not
+ * be the last reference held on the
+ * page, because the PG_slab cannot be
+ * cleared before all tail pins (which
+ * skips the _mapcount tail refcounting)
+ * have been released. For hugetlbfs the
+ * tail pin may be the last reference on
+ * the page instead, because
+ * PageHeadHuge will not go away until
+ * the compound page enters the buddy
+ * allocator.
*/
- if (PageHead(page_head))
- __put_compound_page(page_head);
- else
- __put_single_page(page_head);
+ VM_BUG_ON(PageSlab(page_head));
+ __put_compound_page(page_head);
}
-out_put_single:
- if (put_page_testzero(page))
- __put_single_page(page);
return;
- }
- VM_BUG_ON(page_head != page->first_page);
- /*
- * We can release the refcount taken by
- * get_page_unless_zero() now that
- * __split_huge_page_refcount() is blocked on
- * the compound_lock.
- */
- if (put_page_testzero(page_head))
- VM_BUG_ON(1);
- /* __split_huge_page_refcount will wait now */
- VM_BUG_ON(page_mapcount(page) <= 0);
- atomic_dec(&page->_mapcount);
- VM_BUG_ON(atomic_read(&page_head->_count) <= 0);
- VM_BUG_ON(atomic_read(&page->_count) != 0);
+ } else
+ /*
+ * __split_huge_page_refcount run before us,
+ * "page" was a THP tail. The split page_head
+ * has been freed and reallocated as slab or
+ * hugetlbfs page of smaller order (only
+ * possible if reallocated as slab on x86).
+ */
+ goto skip_lock;
+ }
+ /*
+ * page_head wasn't a dangling pointer but it may not be a head
+ * page anymore by the time we obtain the lock. That is ok as
+ * long as it can't be freed from under us.
+ */
+ flags = compound_lock_irqsave(page_head);
+ if (unlikely(!PageTail(page))) {
+ /* __split_huge_page_refcount run before us */
compound_unlock_irqrestore(page_head, flags);
-
+skip_lock:
if (put_page_testzero(page_head)) {
+ /*
+ * The head page may have been freed and
+ * reallocated as a compound page of smaller
+ * order and then freed again. All we know is
+ * that it cannot have become: a THP page, a
+ * compound page of higher order, a tail page.
+ * That is because we still hold the refcount of
+ * the split THP tail and page_head was the THP
+ * head before the split.
+ */
if (PageHead(page_head))
__put_compound_page(page_head);
else
__put_single_page(page_head);
}
- } else {
- /* page_head is a dangling pointer */
- VM_BUG_ON(PageTail(page));
- goto out_put_single;
+out_put_single:
+ if (put_page_testzero(page))
+ __put_single_page(page);
+ return;
+ }
+ VM_BUG_ON(page_head != page->first_page);
+ /*
+ * We can release the refcount taken by get_page_unless_zero()
+ * now that __split_huge_page_refcount() is blocked on the
+ * compound_lock.
+ */
+ if (put_page_testzero(page_head))
+ VM_BUG_ON(1);
+ /* __split_huge_page_refcount will wait now */
+ VM_BUG_ON(page_mapcount(page) <= 0);
+ atomic_dec(&page->_mapcount);
+ VM_BUG_ON(atomic_read(&page_head->_count) <= 0);
+ VM_BUG_ON(atomic_read(&page->_count) != 0);
+ compound_unlock_irqrestore(page_head, flags);
+
+ if (put_page_testzero(page_head)) {
+ if (PageHead(page_head))
+ __put_compound_page(page_head);
+ else
+ __put_single_page(page_head);
}
- } else if (put_page_testzero(page)) {
- if (PageHead(page))
- __put_compound_page(page);
- else
- __put_single_page(page);
+ } else {
+ /* page_head is a dangling pointer */
+ VM_BUG_ON(PageTail(page));
+ goto out_put_single;
}
}

_
