Re: [PATCH] mm: numa: Recheck for transhuge pages under lock during protection changes

From: Mel Gorman
Date: Wed Mar 19 2014 - 10:38:46 EST


On Fri, Mar 14, 2014 at 11:15:37PM -0400, Sasha Levin wrote:
> On 03/12/2014 06:36 AM, Mel Gorman wrote:
> >Andrew, this should go with the patches
> >mmnuma-reorganize-change_pmd_range.patch
> >mmnuma-reorganize-change_pmd_range-fix.patch
> >move-mmu-notifier-call-from-change_protection-to-change_pmd_range.patch
> >in mmotm please.
> >
> >Thanks.
> >
> >---8<---
> >From: Mel Gorman<mgorman@xxxxxxx>
> >Subject: [PATCH] mm: numa: Recheck for transhuge pages under lock during protection changes
> >
> >Sasha Levin reported the following bug using trinity
>
> I'm seeing a different issue with this patch. A NULL ptr deref occurs in the
> pte_offset_map_lock() macro right before the new recheck code:
>

This on top?

I tried testing it, but trinity threw up all sorts of unrelated carnage
in the mix and ordinary testing does not trigger the race. I've no idea
which of the current mess of trinity-exposed bugs you've encountered
have been fixed already.

---8<---
From: Mel Gorman <mgorman@xxxxxxx>
Subject: [PATCH] mm: numa: Recheck for transhuge pages under lock during protection changes -fix

Signed-off-by: Mel Gorman <mgorman@xxxxxxx>
---
mm/mprotect.c | 40 ++++++++++++++++++++++++++++++----------
1 file changed, 30 insertions(+), 10 deletions(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 66973db..c43d557 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -36,6 +36,34 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
}
#endif

+/*
+ * For a prot_numa update we only hold mmap_sem for read so there is a
+ * potential race with faulting where a pmd was temporarily none. This
+ * function checks for a transhuge pmd under the appropriate lock. It
+ * returns a pte if it was successfully locked or NULL if it raced with
+ * a transhuge insertion.
+ */
+static pte_t *lock_pte_protection(struct vm_area_struct *vma, pmd_t *pmd,
+ unsigned long addr, int prot_numa, spinlock_t **ptl)
+{
+ pte_t *pte;
+ spinlock_t *pmdl;
+
+ /* !prot_numa is protected by mmap_sem held for write */
+ if (!prot_numa)
+ return pte_offset_map_lock(vma->vm_mm, pmd, addr, ptl);
+
+ pmdl = pmd_lock(vma->vm_mm, pmd);
+ if (unlikely(pmd_trans_huge(*pmd) || pmd_none(*pmd))) {
+ spin_unlock(pmdl);
+ return NULL;
+ }
+
+ pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, ptl);
+ spin_unlock(pmdl);
+ return pte;
+}
+
static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
unsigned long addr, unsigned long end, pgprot_t newprot,
int dirty_accountable, int prot_numa)
@@ -45,17 +73,9 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
spinlock_t *ptl;
unsigned long pages = 0;

- pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
-
- /*
- * For a prot_numa update we only hold mmap_sem for read so there is a
- * potential race with faulting where a pmd was temporarily none so
- * recheck it under the lock and bail if we race
- */
- if (prot_numa && unlikely(pmd_trans_huge(*pmd))) {
- pte_unmap_unlock(pte, ptl);
+ pte = lock_pte_protection(vma, pmd, addr, prot_numa, &ptl);
+ if (!pte)
return 0;
- }

arch_enter_lazy_mmu_mode();
do {
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/