[PATCH 14/24] huge tmpfs: extend vma_adjust_trans_huge to shmem pmd

From: Hugh Dickins
Date: Fri Feb 20 2015 - 23:13:50 EST

Factor out one small part of the shmem pmd handling: the inline function
vma_adjust_trans_huge() (called when vmas are split or merged) contains
a preliminary !anon_vma || vm_ops check to avoid the overhead of
__vma_adjust_trans_huge() on areas which could not possibly contain an
anonymous THP pmd. But with huge tmpfs, we shall need it to be called
even in those excluded cases.

Before the split pmd ptlocks, there was a nice alternative optimization
to make: avoid the overhead of __vma_adjust_trans_huge() on mms which
could not possibly contain a huge pmd - those with NULL pmd_huge_pte
(using a huge pmd demands the deposit of a spare page table, typically
stored in a list at pmd_huge_pte, withdrawn for use when splitting the
pmd; and huge tmpfs will follow that protocol too).

Still use that optimization when !USE_SPLIT_PMD_PTLOCKS, since then
mm->pmd_huge_pte is updated under mm->page_table_lock (but beware:
unlike other arches, powerpc made no use of pmd_huge_pte before, so
this patch hacks it to update pmd_huge_pte as a count). Common x86
configs use split pmd ptlocks, so they get no equivalent optimization
for now: if that proves a visible problem, we can add an atomic count
or flag to mm for the purpose.
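
As a rough illustration of the gate described above, here is a userspace
sketch, not kernel code. struct mm_model and the deposit(), withdraw(),
slow_adjust() and adjust() helpers are hypothetical stand-ins invented
for this note; only the idea (pmd_huge_pte kept as a count of deposited
page tables, and an early return while it is zero) comes from the patch.
Locking is left out: the real field is updated under
mm->page_table_lock.

```c
#include <assert.h>

/* Hypothetical userspace stand-in for the relevant part of mm_struct. */
struct mm_model {
	unsigned long pmd_huge_pte;	/* count of deposited page tables */
	unsigned long slow_path_calls;	/* how often the slow path ran */
};

/* Depositing a spare page table marks the mm as possibly holding a huge pmd. */
static void deposit(struct mm_model *mm)
{
	mm->pmd_huge_pte++;
}

/* Withdrawing one undoes a deposit; the count must never go negative. */
static void withdraw(struct mm_model *mm)
{
	assert(mm->pmd_huge_pte > 0);
	mm->pmd_huge_pte--;
}

/* Stand-in for the expensive __vma_adjust_trans_huge() work. */
static void slow_adjust(struct mm_model *mm)
{
	mm->slow_path_calls++;
}

/* Stand-in for the inline wrapper: skip the slow path when nothing is deposited. */
static void adjust(struct mm_model *mm)
{
	if (!mm->pmd_huge_pte)
		return;
	slow_adjust(mm);
}
```

In this model, adjust() on a freshly zeroed mm_model does nothing, and
it only reaches slow_adjust() between a deposit() and the matching
withdraw().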

And looking into the overhead of __vma_adjust_trans_huge(): it is
silly for split_huge_page_pmd_mm() to be calling find_vma() followed
by split_huge_page_pmd(), when it can check the pmd directly first,
and usually avoid the find_vma() call.
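
The reordering can be pictured with another userspace sketch, again
with hypothetical names (pmd_model, vma_model, lookup_vma(),
split_if_huge()) that do not exist in the kernel. It only shows the
shape of the change: test the cheap flag first, and pay for the lookup
only when the pmd really is huge. The assert is an assumption of the
model, standing in for the expectation that the address is mapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct pmd_model { bool trans_huge; };		/* hypothetical pmd */
struct vma_model { unsigned long start, end; };	/* hypothetical vma */

static unsigned long lookups;	/* counts the expensive lookups */

/* Like find_vma(): first area whose end lies above addr, in a sorted array. */
static struct vma_model *lookup_vma(struct vma_model *vmas, size_t n,
				    unsigned long addr)
{
	lookups++;
	for (size_t i = 0; i < n; i++)
		if (addr < vmas[i].end)
			return &vmas[i];
	return NULL;
}

/* Check the pmd first; look up the vma only if there is a split to do. */
static bool split_if_huge(struct vma_model *vmas, size_t n,
			  unsigned long addr, struct pmd_model *pmd)
{
	struct vma_model *vma;

	if (!pmd->trans_huge)
		return false;		/* common case: no lookup at all */
	vma = lookup_vma(vmas, n, addr);
	assert(vma != NULL);		/* model assumption: addr is mapped */
	pmd->trans_huge = false;	/* "split" the huge pmd */
	return true;
}
```

With one area covering 0x1000-0x3000, calling split_if_huge() on a
non-huge pmd leaves the lookups counter untouched, while a huge pmd
costs exactly one lookup.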

Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
---
 arch/powerpc/mm/pgtable_64.c |    7 ++++++-
 include/linux/huge_mm.h      |    5 ++++-
 mm/huge_memory.c             |    7 ++-----
 3 files changed, 12 insertions(+), 7 deletions(-)

--- thpfs.orig/arch/powerpc/mm/pgtable_64.c 2015-02-08 18:54:22.000000000 -0800
+++ thpfs/arch/powerpc/mm/pgtable_64.c 2015-02-20 19:34:32.363944978 -0800
@@ -675,9 +675,12 @@ void pgtable_trans_huge_deposit(struct m
 				pgtable_t pgtable)
 	pgtable_t *pgtable_slot;
+	mm->pmd_huge_pte++;
-	 * we store the pgtable in the second half of PMD
+	 * we store the pgtable in the second half of PMD; but must also
+	 * set pmd_huge_pte for the optimization in vma_adjust_trans_huge().
 	pgtable_slot = (pgtable_t *)pmdp + PTRS_PER_PMD;
 	*pgtable_slot = pgtable;
@@ -696,6 +699,8 @@ pgtable_t pgtable_trans_huge_withdraw(st
 	pgtable_t *pgtable_slot;
 
+	mm->pmd_huge_pte--;
 	pgtable_slot = (pgtable_t *)pmdp + PTRS_PER_PMD;
 	pgtable = *pgtable_slot;
--- thpfs.orig/include/linux/huge_mm.h 2014-12-07 14:21:05.000000000 -0800
+++ thpfs/include/linux/huge_mm.h 2015-02-20 19:34:32.363944978 -0800
@@ -143,8 +143,11 @@ static inline void vma_adjust_trans_huge
 					 unsigned long end,
 					 long adjust_next)
 {
-	if (!vma->anon_vma || vma->vm_ops)
+#if !USE_SPLIT_PMD_PTLOCKS
+	/* If no pgtable is deposited, there is no huge pmd to worry about */
+	if (!vma->vm_mm->pmd_huge_pte)
 		return;
+#endif
 	__vma_adjust_trans_huge(vma, start, end, adjust_next);
 }
 static inline int hpage_nr_pages(struct page *page)
--- thpfs.orig/mm/huge_memory.c 2015-02-20 19:33:51.492038431 -0800
+++ thpfs/mm/huge_memory.c 2015-02-20 19:34:32.367944969 -0800
@@ -2905,11 +2905,8 @@ again:
 void split_huge_page_pmd_mm(struct mm_struct *mm, unsigned long address,
 		pmd_t *pmd)
 {
-	struct vm_area_struct *vma;
-
-	vma = find_vma(mm, address);
-	BUG_ON(vma == NULL);
-	split_huge_page_pmd(vma, address, pmd);
+	if (unlikely(pmd_trans_huge(*pmd)))
+		__split_huge_page_pmd(find_vma(mm, address), address, pmd);
 }
 
 static void split_huge_page_address(struct mm_struct *mm,