Re: [PATCH] mm: compaction: Abort compaction if too many pages areisolated and caller is asynchronous

From: Mel Gorman
Date: Mon Jun 06 2011 - 06:32:26 EST


On Fri, Jun 03, 2011 at 08:07:30PM +0200, Andrea Arcangeli wrote:
> On Fri, Jun 03, 2011 at 07:37:07PM +0200, Andrea Arcangeli wrote:
> > On Fri, Jun 03, 2011 at 08:01:44AM +0900, Minchan Kim wrote:
> > > Do you want this? (it's almost pseudo-code)
> >
> > Yes that's good idea so we at least take into account if we isolated
> > something big, and it's pointless to insist wasting CPU on the tail
> > pages and even trace a fail because of tail pages after it.
> >
> > I introduced a __page_count to increase readability. It's still
> > hackish to work on subpages in vmscan.c but at least I added a comment
> > and until we serialize destroy_compound_page vs compound_head, I guess
> > there's no better way. I didn't attempt to add out of order
> > serialization similar to what exists for split_huge_page vs
> > compound_trans_head yet, as the page can be allocated or go away from
> > under us, in split_huge_page vs compound_trans_head it's simpler
> > because both callers are required to hold a pin on the page so the
> > page can't go be reallocated and destroyed under it.
>
> Sent too fast... had to shuffle a few things around... trying again.
>
> ===
> Subject: mm: no page_count without a page pin
>
> From: Andrea Arcangeli <aarcange@xxxxxxxxxx>
>
> It's unsafe to run page_count during the physical pfn scan because
> compound_head could trip on a dangling pointer when reading page->first_page if
> the compound page is being freed by another CPU. Also properly take into
> account if we isolated a compound page during the scan and break the loop if
> we've isolated enoguh. Introduce __page_count to cleanup some atomic_read from
> &page->_count in common code to cleanup.
>
> Signed-off-by: Andrea Arcangeli <aarcange@xxxxxxxxxx>

This patch is pulling in stuff from Minchan. Minimally his patch should
be kept separate to preserve history or his Signed-off should be
included on this patch.

> ---
> arch/powerpc/mm/gup.c | 2 -
> arch/powerpc/platforms/512x/mpc512x_shared.c | 2 -
> arch/x86/mm/gup.c | 2 -
> fs/nilfs2/page.c | 2 -
> include/linux/mm.h | 13 ++++++----
> mm/huge_memory.c | 4 +--
> mm/internal.h | 2 -
> mm/page_alloc.c | 6 ++--
> mm/swap.c | 4 +--
> mm/vmscan.c | 35 ++++++++++++++++++++-------
> 10 files changed, 47 insertions(+), 25 deletions(-)
>
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1047,7 +1047,7 @@ static unsigned long isolate_lru_pages(u
> for (scan = 0; scan < nr_to_scan && !list_empty(src); scan++) {
> struct page *page;
> unsigned long pfn;
> - unsigned long end_pfn;
> + unsigned long start_pfn, end_pfn;
> unsigned long page_pfn;
> int zone_id;
>
> @@ -1087,9 +1087,9 @@ static unsigned long isolate_lru_pages(u
> */
> zone_id = page_zone_id(page);
> page_pfn = page_to_pfn(page);
> - pfn = page_pfn & ~((1 << order) - 1);
> - end_pfn = pfn + (1 << order);
> - for (; pfn < end_pfn; pfn++) {
> + start_pfn = page_pfn & ~((1 << order) - 1);
> + end_pfn = start_pfn + (1 << order);
> + for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> struct page *cursor_page;
>
> /* The target page is in the block, ignore it. */
> @@ -1116,16 +1116,33 @@ static unsigned long isolate_lru_pages(u
> break;
>
> if (__isolate_lru_page(cursor_page, mode, file) == 0) {
> + unsigned int isolated_pages;
> list_move(&cursor_page->lru, dst);
> mem_cgroup_del_lru(cursor_page);
> - nr_taken += hpage_nr_pages(page);
> - nr_lumpy_taken++;
> + isolated_pages = hpage_nr_pages(page);
> + nr_taken += isolated_pages;
> + nr_lumpy_taken += isolated_pages;
> if (PageDirty(cursor_page))
> - nr_lumpy_dirty++;
> + nr_lumpy_dirty += isolated_pages;
> scan++;
> + pfn += isolated_pages-1;

Ah, here is the isolated_pages - 1 which is necessary. Should have read
the whole thread before responding to anything :).

I still think this optimisation is rare and only applies if we are
encountering huge pages during the linear scan. How often are we doing
that really?

> + VM_BUG_ON(!isolated_pages);

This BUG_ON is overkill. hpage_nr_pages would have to return 0.

> + VM_BUG_ON(isolated_pages > MAX_ORDER_NR_PAGES);

This would require order > MAX_ORDER_NR_PAGES to be passed into
isolate_lru_pages or for a huge page to be unaligned to a power of
two. The former is very unlikely and the latter is not supported by
any CPU.

> } else {
> - /* the page is freed already. */
> - if (!page_count(cursor_page))
> + /*
> + * Check if the page is freed already.
> + *
> + * We can't use page_count() as that
> + * requires compound_head and we don't
> + * have a pin on the page here. If a
> + * page is tail, we may or may not
> + * have isolated the head, so assume
> + * it's not free, it'd be tricky to
> + * track the head status without a
> + * page pin.
> + */
> + if (!PageTail(cursor_page) &&
> + !__page_count(cursor_page))
> continue;
> break;

Ack to this part.

I'm not keen on __page_count() as __ normally means the "unlocked"
version of a function although I realise that rule isn't universal
either. I can't think of a better name though.

> }
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -266,12 +266,17 @@ struct inode;
> * routine so they can be sure the page doesn't go away from under them.
> */
>
> +static inline int __page_count(struct page *page)
> +{
> + return atomic_read(&page->_count);
> +}
> +
> /*
> * Drop a ref, return true if the refcount fell to zero (the page has no users)
> */
> static inline int put_page_testzero(struct page *page)
> {
> - VM_BUG_ON(atomic_read(&page->_count) == 0);
> + VM_BUG_ON(__page_count(page) == 0);
> return atomic_dec_and_test(&page->_count);
> }
>
> @@ -357,7 +362,7 @@ static inline struct page *compound_head
>
> static inline int page_count(struct page *page)
> {
> - return atomic_read(&compound_head(page)->_count);
> + return __page_count(compound_head(page));
> }
>
> static inline void get_page(struct page *page)
> @@ -370,7 +375,7 @@ static inline void get_page(struct page
> * bugcheck only verifies that the page->_count isn't
> * negative.
> */
> - VM_BUG_ON(atomic_read(&page->_count) < !PageTail(page));
> + VM_BUG_ON(__page_count(page) < !PageTail(page));
> atomic_inc(&page->_count);
> /*
> * Getting a tail page will elevate both the head and tail
> @@ -382,7 +387,7 @@ static inline void get_page(struct page
> * __split_huge_page_refcount can't run under
> * get_page().
> */
> - VM_BUG_ON(atomic_read(&page->first_page->_count) <= 0);
> + VM_BUG_ON(__page_count(page->first_page) <= 0);
> atomic_inc(&page->first_page->_count);
> }
> }
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1203,10 +1203,10 @@ static void __split_huge_page_refcount(s
> struct page *page_tail = page + i;
>
> /* tail_page->_count cannot change */
> - atomic_sub(atomic_read(&page_tail->_count), &page->_count);
> + atomic_sub(__page_count(page_tail), &page->_count);
> BUG_ON(page_count(page) <= 0);
> atomic_add(page_mapcount(page) + 1, &page_tail->_count);
> - BUG_ON(atomic_read(&page_tail->_count) <= 0);
> + BUG_ON(__page_count(page_tail) <= 0);
>
> /* after clearing PageTail the gup refcount can be released */
> smp_mb();
> --- a/arch/powerpc/mm/gup.c
> +++ b/arch/powerpc/mm/gup.c
> @@ -22,7 +22,7 @@ static inline void get_huge_page_tail(st
> * __split_huge_page_refcount() cannot run
> * from under us.
> */
> - VM_BUG_ON(atomic_read(&page->_count) < 0);
> + VM_BUG_ON(__page_count(page) < 0);
> atomic_inc(&page->_count);
> }
>
> --- a/arch/powerpc/platforms/512x/mpc512x_shared.c
> +++ b/arch/powerpc/platforms/512x/mpc512x_shared.c
> @@ -200,7 +200,7 @@ static inline void mpc512x_free_bootmem(
> {
> __ClearPageReserved(page);
> BUG_ON(PageTail(page));
> - BUG_ON(atomic_read(&page->_count) > 1);
> + BUG_ON(__page_count(page) > 1);
> atomic_set(&page->_count, 1);
> __free_page(page);
> totalram_pages++;
> --- a/arch/x86/mm/gup.c
> +++ b/arch/x86/mm/gup.c
> @@ -114,7 +114,7 @@ static inline void get_huge_page_tail(st
> * __split_huge_page_refcount() cannot run
> * from under us.
> */
> - VM_BUG_ON(atomic_read(&page->_count) < 0);
> + VM_BUG_ON(__page_count(page) < 0);
> atomic_inc(&page->_count);
> }
>
> --- a/fs/nilfs2/page.c
> +++ b/fs/nilfs2/page.c
> @@ -181,7 +181,7 @@ void nilfs_page_bug(struct page *page)
>
> printk(KERN_CRIT "NILFS_PAGE_BUG(%p): cnt=%d index#=%llu flags=0x%lx "
> "mapping=%p ino=%lu\n",
> - page, atomic_read(&page->_count),
> + page, __page_count(page),
> (unsigned long long)page->index, page->flags, m, ino);
>
> if (page_has_buffers(page)) {
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -28,7 +28,7 @@ static inline void set_page_count(struct
> static inline void set_page_refcounted(struct page *page)
> {
> VM_BUG_ON(PageTail(page));
> - VM_BUG_ON(atomic_read(&page->_count));
> + VM_BUG_ON(__page_count(page));
> set_page_count(page, 1);
> }
>
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -568,7 +568,7 @@ static inline int free_pages_check(struc
> {
> if (unlikely(page_mapcount(page) |
> (page->mapping != NULL) |
> - (atomic_read(&page->_count) != 0) |
> + (__page_count(page) != 0) |
> (page->flags & PAGE_FLAGS_CHECK_AT_FREE) |
> (mem_cgroup_bad_page_check(page)))) {
> bad_page(page);
> @@ -758,7 +758,7 @@ static inline int check_new_page(struct
> {
> if (unlikely(page_mapcount(page) |
> (page->mapping != NULL) |
> - (atomic_read(&page->_count) != 0) |
> + (__page_count(page) != 0) |
> (page->flags & PAGE_FLAGS_CHECK_AT_PREP) |
> (mem_cgroup_bad_page_check(page)))) {
> bad_page(page);
> @@ -5739,7 +5739,7 @@ void dump_page(struct page *page)
> {
> printk(KERN_ALERT
> "page:%p count:%d mapcount:%d mapping:%p index:%#lx\n",
> - page, atomic_read(&page->_count), page_mapcount(page),
> + page, __page_count(page), page_mapcount(page),
> page->mapping, page->index);
> dump_page_flags(page->flags);
> mem_cgroup_print_bad_page(page);
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -128,9 +128,9 @@ static void put_compound_page(struct pag
> if (put_page_testzero(page_head))
> VM_BUG_ON(1);
> /* __split_huge_page_refcount will wait now */
> - VM_BUG_ON(atomic_read(&page->_count) <= 0);
> + VM_BUG_ON(__page_count(page) <= 0);
> atomic_dec(&page->_count);
> - VM_BUG_ON(atomic_read(&page_head->_count) <= 0);
> + VM_BUG_ON(__page_count(page_head) <= 0);
> compound_unlock_irqrestore(page_head, flags);
> if (put_page_testzero(page_head)) {
> if (PageHead(page_head))

--
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/