[RFC PATCH 3/3] hugetlbfs: handle page fault/truncate races

From: Mike Kravetz
Date: Tue Oct 13 2020 - 19:11:38 EST

Next message: Stephen Boyd: "Re: [PATCH v2 1/2] dt-bindings: clock: mediatek: add bindings for MT8167 clocks"
Previous message: Mike Kravetz: "[RFC PATCH 1/3] hugetlbfs: revert use of i_mmap_rwsem for pmd sharing and more sync"
In reply to: Mike Kravetz: "[RFC PATCH 1/3] hugetlbfs: revert use of i_mmap_rwsem for pmd sharing and more sync"
Next in thread: Mike Kravetz: "[RFC PATCH 2/3] hugetlbfs: introduce hinode_rwsem for pmd sharing synchronization"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

A hugetlb page fault can race with page truncation. Make the code
identifying and handling these races more robust.

Page fault handling needs to back out pages added to page cache beyond
file size (i_size). When backing out the page, take care to restore
reserve map entries and counts as necessary.
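
As a reading aid, the backout decisions this patch adds to
hugetlb_no_page() can be modeled in user space. This is only a sketch:
struct backout_actions and model_fault_backout() are hypothetical names
that do not exist in the kernel, and the three result fields stand in
for delete_from_page_cache(), SetPagePrivate() and
restore_reserve_on_error() respectively.

```c
#include <stdbool.h>

/*
 * Hypothetical user-space model of the backout path; not kernel code.
 * Each field records whether the corresponding kernel action would run.
 */
struct backout_actions {
	bool remove_from_page_cache;	/* models delete_from_page_cache() */
	bool set_page_private;		/* models SetPagePrivate() */
	bool restore_reserve_map;	/* models restore_reserve_on_error() */
};

static struct backout_actions
model_fault_backout(bool new_page, bool page_cache,
		    bool reserve_alloc, bool beyond_i_size)
{
	struct backout_actions a = { false, false, false };

	/* Nothing to undo unless this fault allocated the page. */
	if (!new_page)
		return a;

	/* Pages added to the page cache beyond i_size are removed again. */
	a.remove_from_page_cache = page_cache && beyond_i_size;

	/* A consumed reserve is flagged so that freeing the page restores it. */
	a.set_page_private = reserve_alloc;

	/* Reserve map entries are only restored within i_size. */
	a.restore_reserve_map = !beyond_i_size;

	return a;
}
```

In this model, a new shared page that lost the race with truncate
(page_cache and beyond_i_size both true) is removed from the page cache
and its reserve map entry is left alone.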

File truncation (remove_inode_hugepages) needs to handle page mapping
changes that could have happened before the page was locked. This can
happen if the page was added to the page cache and later backed out in
fault processing.
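
The truncate-side check follows the usual "lock, then revalidate"
pattern, sketched below in user space. The types and the
model_truncate_one() helper are hypothetical and only illustrate the
control flow; the real code compares page_mapping(page) against the
inode's mapping while holding the page lock.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct address_space and struct page. */
struct model_mapping { int id; };
struct model_page { struct model_mapping *mapping; };

/*
 * Model of one loop iteration in remove_inode_hugepages(), assuming the
 * page is already locked. Returns true if the page was removed.
 */
static bool model_truncate_one(struct model_page *page,
			       struct model_mapping *mapping,
			       unsigned long *freed)
{
	/*
	 * The fault path may have backed the page out of the page cache
	 * between lookup and lock; in that case there is nothing to remove.
	 */
	if (page->mapping != mapping)
		return false;

	page->mapping = NULL;	/* models remove_huge_page() */
	(*freed)++;
	return true;
}
```

A page whose mapping no longer matches is skipped, so the freed count
is not incremented for it.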

Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
---
fs/hugetlbfs/inode.c | 34 ++++++++++++++++++++--------------
mm/hugetlb.c | 40 ++++++++++++++++++++++++++++++++++++++--
2 files changed, 58 insertions(+), 16 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index d6bb675d4872..f6ca2892e833 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -514,23 +514,29 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,

lock_page(page);
/*
- * We must free the huge page and remove from page
- * cache (remove_huge_page) BEFORE removing the
- * region/reserve map (hugetlb_unreserve_pages). In
- * rare out of memory conditions, removal of the
- * region/reserve map could fail. Correspondingly,
- * the subpool and global reserve usage count can need
- * to be adjusted.
+ * After locking page, make sure mapping is the same.
+ * We could have raced with page fault populate and
+ * backout code.
*/
- VM_BUG_ON(PagePrivate(page));
- remove_huge_page(page);
- freed++;
- if (!truncate_op) {
- if (unlikely(hugetlb_unreserve_pages(inode,
+ if (page_mapping(page) == mapping) {
+ /*
+ * We must free the huge page and remove from
+ * page cache (remove_huge_page) BEFORE
+ * removing the region/reserve map. In rare
+ * out of memory conditions, removal of the
+ * region/reserve map could fail and the
+ * subpool and global reserve usage count
+ * will need to be adjusted.
+ */
+ VM_BUG_ON(PagePrivate(page));
+ remove_huge_page(page);
+ freed++;
+ if (!truncate_op) {
+ if (unlikely(hugetlb_unreserve_pages(inode,
index, index + 1, 1)))
- hugetlb_fix_reserve_counts(inode);
+ hugetlb_fix_reserve_counts(inode);
+ }
}
-
unlock_page(page);
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 940c037418f8..e0ba58385036 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4225,6 +4225,9 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
spinlock_t *ptl;
unsigned long haddr = address & huge_page_mask(h);
bool new_page = false;
+ bool page_cache = false;
+ bool reserve_alloc = false;
+ bool beyond_i_size = false;

/*
* Currently, we are forced to kill the process in the event the
@@ -4309,6 +4312,8 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
clear_huge_page(page, address, pages_per_huge_page(h));
__SetPageUptodate(page);
new_page = true;
+ if (PagePrivate(page))
+ reserve_alloc = true;

if (vma->vm_flags & VM_MAYSHARE) {
int err = huge_add_to_page_cache(page, mapping, idx);
@@ -4318,6 +4323,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
goto retry;
goto out;
}
+ page_cache = true;
} else {
lock_page(page);
if (unlikely(anon_vma_prepare(vma))) {
@@ -4356,8 +4362,10 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,

ptl = huge_pte_lock(h, mm, ptep);
size = i_size_read(mapping->host) >> huge_page_shift(h);
- if (idx >= size)
+ if (idx >= size) {
+ beyond_i_size = true;
goto backout;
+ }

ret = 0;
if (!huge_pte_none(huge_ptep_get(ptep)))
@@ -4395,8 +4403,36 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
backout:
spin_unlock(ptl);
backout_unlocked:
+ if (new_page) {
+ if (page_cache && beyond_i_size) {
+ /*
+ * Back out pages added to page cache beyond i_size.
+ * Otherwise, they will 'sit' there until the file
+ * is removed.
+ */
+ ClearPageDirty(page);
+ ClearPageUptodate(page);
+ delete_from_page_cache(page);
+ }
+
+ if (reserve_alloc) {
+ /*
+ * If reserve was consumed, set PagePrivate so that
+ * it will be restored in free_huge_page().
+ */
+ SetPagePrivate(page);
+ }
+
+ if (!beyond_i_size) {
+ /*
+ * Do not restore reserve map entries beyond i_size.
+ * Otherwise, there will be leaks when the file is removed.
+ */
+ restore_reserve_on_error(h, vma, haddr, page);
+ }
+
+ }
unlock_page(page);
- restore_reserve_on_error(h, vma, haddr, page);
put_page(page);
goto out;
}
--
2.25.4