Re: bug report: hugetlbfs: use i_mmap_rwsem for more pmd sharing, synchronization

From: Andrew Morton
Date: Thu Dec 27 2018 - 21:45:36 EST


On Thu, 27 Dec 2018 11:24:31 -0800 Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote:

> On 12/27/18 3:44 AM, Colin Ian King wrote:
> > Hi,
> >
> > Static analysis with CoverityScan on linux-next detected a potential
> > null pointer dereference with the following commit:
> >
> > From d8a1051ed4ba55679ef24e838a1942c9c40f0a14 Mon Sep 17 00:00:00 2001
> > From: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> > Date: Sat, 22 Dec 2018 10:55:57 +1100
> > Subject: [PATCH] hugetlbfs: use i_mmap_rwsem for more pmd sharing
> >
> > The earlier check implies that "mapping" may be a null pointer:
> >
> > var_compare_op: Comparing mapping to null implies that mapping might be
> > null.
> >
> > 1008 if (!(flags & MF_MUST_KILL) && !PageDirty(hpage) && mapping &&
> > 1009 mapping_cap_writeback_dirty(mapping)) {
> >
> > ..however later "mapper" is dereferenced when it may be potentially null:
> >
> > 1034 /*
> > 1035 * For hugetlb pages, try_to_unmap could potentially
> > call
> > 1036 * huge_pmd_unshare. Because of this, take semaphore in
> > 1037 * write mode here and set TTU_RMAP_LOCKED to
> > indicate we
> > 1038 * have taken the lock at this higer level.
> > 1039 */
> > CID 1476097 (#1 of 1): Dereference after null check (FORWARD_NULL)
> >
> > var_deref_model: Passing null pointer mapping to
> > i_mmap_lock_write, which dereferences it.
> >
> > 1040 i_mmap_lock_write(mapping);
> > 1041 unmap_success = try_to_unmap(hpage,
> > ttu|TTU_RMAP_LOCKED);
> > 1042 i_mmap_unlock_write(mapping);
> >
>
> Thanks for the report.
>
> The 'good news' is that mapping can not be null in the code path above.
> The reasons are:
> - The page is locked upon entry to the routine
> - Earlier in the routine there is the check:
> if (!page_mapped(hpage))
> return true;
> For huge pages (which are processed in the else clause above), page_mapped
> implies page->mapping != null.
>
> However, the routine hwpoison_user_mappings handles all page types. The
> page_mapped check is actually there to check for pages in the swap cache.
> It is just coincidence that it also implies mapping != null for huge pages.
>
> It would be better to make an explicit check for mapping != null before
> calling i_mmap_lock_write/try_to_unmap. In this way, unrelated changes to
> code above will not potentially lead to the possibility of mapping == null.
>
> I'm not sure what is the best way to handle this. Below is an updated version
> of the patch sent to Andrew. I can also provide a simple patch to the patch
> if that is easier.
>

Below is the delta. Please check it. It seems to do more than the
above implies.

Also, I have notes here that

hugetlbfs-use-i_mmap_rwsem-for-more-pmd-sharing-synchronization.patch
and
hugetlbfs-use-i_mmap_rwsem-to-fix-page-fault-truncate-race.patch

have additional updates pending. Due to emails such as

http://lkml.kernel.org/r/849f5202-2200-265f-7769-8363053e8373@xxxxxxxxxx
http://lkml.kernel.org/r/732c0b7d-5a4e-97a8-9677-30f3520893cb@xxxxxxxxxx
http://lkml.kernel.org/r/6b91dd42-b903-1f6c-729a-bd9f51273986@xxxxxxxxxx

What's the status, please?


From: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
Subject: hugetlbfs-use-i_mmap_rwsem-for-more-pmd-sharing-synchronization-fix

It would be better to make an explicit check for mapping != null before
calling i_mmap_lock_write/try_to_unmap. In this way, unrelated changes to
code above will not potentially lead to the possibility of mapping ==
null.

Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@xxxxxxxxxxxxxxxxxx>
Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Cc: "Kirill A . Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>
Cc: Davidlohr Bueso <dave@xxxxxxxxxxxx>
Cc: Prakash Sangappa <prakash.sangappa@xxxxxxxxxx>
Cc: Colin Ian King <colin.king@xxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---


--- a/mm/hugetlb.c~hugetlbfs-use-i_mmap_rwsem-for-more-pmd-sharing-synchronization-fix
+++ a/mm/hugetlb.c
@@ -3250,6 +3250,14 @@ int copy_hugetlb_page_range(struct mm_st
mmu_notifier_range_init(&range, src, vma->vm_start,
vma->vm_end, MMU_NOTIFY_CLEAR);
mmu_notifier_invalidate_range_start(&range);
+ } else {
+ /*
+ * For shared mappings i_mmap_rwsem must be held to call
+ * huge_pte_alloc, otherwise the returned ptep could go
+ * away if part of a shared pmd and another thread calls
+ * huge_pmd_unshare.
+ */
+ i_mmap_lock_read(mapping);
}

for (addr = vma->vm_start; addr < vma->vm_end; addr += sz) {
@@ -3259,18 +3267,8 @@ int copy_hugetlb_page_range(struct mm_st
if (!src_pte)
continue;

- /*
- * i_mmap_rwsem must be held to call huge_pte_alloc.
- * Continue to hold until finished with dst_pte, otherwise
- * it could go away if part of a shared pmd.
- *
- * Technically, i_mmap_rwsem is only needed in the non-cow
- * case as cow mappings are not shared.
- */
- i_mmap_lock_read(mapping);
dst_pte = huge_pte_alloc(dst, addr, sz);
if (!dst_pte) {
- i_mmap_unlock_read(mapping);
ret = -ENOMEM;
break;
}
@@ -3285,10 +3283,8 @@ int copy_hugetlb_page_range(struct mm_st
* after taking the lock below.
*/
dst_entry = huge_ptep_get(dst_pte);
- if ((dst_pte == src_pte) || !huge_pte_none(dst_entry)) {
- i_mmap_unlock_read(mapping);
+ if ((dst_pte == src_pte) || !huge_pte_none(dst_entry))
continue;
- }

dst_ptl = huge_pte_lock(h, dst, dst_pte);
src_ptl = huge_pte_lockptr(h, src, src_pte);
@@ -3337,12 +3333,12 @@ int copy_hugetlb_page_range(struct mm_st
}
spin_unlock(src_ptl);
spin_unlock(dst_ptl);
-
- i_mmap_unlock_read(mapping);
}

if (cow)
mmu_notifier_invalidate_range_end(&range);
+ else
+ i_mmap_unlock_read(mapping);

return ret;
}
--- a/mm/memory-failure.c~hugetlbfs-use-i_mmap_rwsem-for-more-pmd-sharing-synchronization-fix
+++ a/mm/memory-failure.c
@@ -966,7 +966,7 @@ static bool hwpoison_user_mappings(struc
enum ttu_flags ttu = TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS;
struct address_space *mapping;
LIST_HEAD(tokill);
- bool unmap_success;
+ bool unmap_success = true;
int kill = 1, forcekill;
struct page *hpage = *hpagep;
bool mlocked = PageMlocked(hpage);
@@ -1030,7 +1030,7 @@ static bool hwpoison_user_mappings(struc

if (!PageHuge(hpage)) {
unmap_success = try_to_unmap(hpage, ttu);
- } else {
+ } else if (mapping) {
/*
* For hugetlb pages, try_to_unmap could potentially call
* huge_pmd_unshare. Because of this, take semaphore in
--- a/mm/rmap.c~hugetlbfs-use-i_mmap_rwsem-for-more-pmd-sharing-synchronization-fix
+++ a/mm/rmap.c
@@ -25,6 +25,7 @@
* page->flags PG_locked (lock_page)
* hugetlbfs_i_mmap_rwsem_key (in huge_pmd_share)
* mapping->i_mmap_rwsem
+ * hugetlb_fault_mutex (hugetlbfs specific page fault mutex)
* anon_vma->rwsem
* mm->page_table_lock or pte_lock
* zone_lru_lock (in mark_page_accessed, isolate_lru_page)
_