[PATCH 5.10 14/63] mm/hugetlb: fix deadlock in hugetlb_cow error path

From: Greg Kroah-Hartman
Date: Mon Jan 04 2021 - 11:11:33 EST

Next message: Greg Kroah-Hartman: "[PATCH 5.10 16/63] lib/zlib: fix inflating zlib streams on s390"
Previous message: Greg Kroah-Hartman: "[PATCH 5.10 01/63] net/sched: sch_taprio: reset child qdiscs before freeing them"
In reply to: Jakub Kicinski: "Re: [PATCH 5.10 01/63] net/sched: sch_taprio: reset child qdiscs before freeing them"
Next in thread: Greg Kroah-Hartman: "[PATCH 5.10 16/63] lib/zlib: fix inflating zlib streams on s390"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

From: Mike Kravetz <mike.kravetz@xxxxxxxxxx>

commit e7dd91c456a8cdbcd7066997d15e36d14276a949 upstream.

syzbot reported the deadlock here [1]. The issue is in hugetlb cow
error handling when there are not enough huge pages for the faulting
task which took the original reservation. It is possible that other
(child) tasks could have consumed pages associated with the reservation.
In this case, we want the task which took the original reservation to
succeed. So, we unmap any associated pages in children so that they can
be used by the faulting task that owns the reservation.

The unmapping code needs to hold i_mmap_rwsem in write mode. However,
due to commit c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd
sharing synchronization") we are already holding i_mmap_rwsem in read
mode when hugetlb_cow is called.

Technically, i_mmap_rwsem does not need to be held in read mode for COW
mappings as they can not share pmd's. Modifying the fault code to not
take i_mmap_rwsem in read mode for COW (and other non-sharable) mappings
is too involved for a stable fix.

Instead, we simply drop the hugetlb_fault_mutex and i_mmap_rwsem before
unmapping. This is OK as it is technically not needed. They are
reacquired after unmapping as expected by calling code. Since this is
done in an uncommon error path, the overhead of dropping and reacquiring
mutexes is acceptable.

While making changes, remove redundant BUG_ON after unmap_ref_private.

[1] https://lkml.kernel.org/r/000000000000b73ccc05b5cf8558@xxxxxxxxxx

Link: https://lkml.kernel.org/r/4c5781b8-3b00-761e-c0c7-c5edebb6ec1a@xxxxxxxxxx
Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
Reported-by: syzbot+5eee4145df3c15e96625@xxxxxxxxxxxxxxxxxxxxxxxxx
Cc: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
Cc: Michal Hocko <mhocko@xxxxxxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: "Aneesh Kumar K . V" <aneesh.kumar@xxxxxxxxxxxxxxxxxx>
Cc: Davidlohr Bueso <dave@xxxxxxxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>

---
mm/hugetlb.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)

--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4106,10 +4106,30 @@ retry_avoidcopy:
* may get SIGKILLed if it later faults.
*/
if (outside_reserve) {
+ struct address_space *mapping = vma->vm_file->f_mapping;
+ pgoff_t idx;
+ u32 hash;
+
put_page(old_page);
BUG_ON(huge_pte_none(pte));
+ /*
+ * Drop hugetlb_fault_mutex and i_mmap_rwsem before
+ * unmapping. unmapping needs to hold i_mmap_rwsem
+ * in write mode. Dropping i_mmap_rwsem in read mode
+ * here is OK as COW mappings do not interact with
+ * PMD sharing.
+ *
+ * Reacquire both after unmap operation.
+ */
+ idx = vma_hugecache_offset(h, vma, haddr);
+ hash = hugetlb_fault_mutex_hash(mapping, idx);
+ mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+ i_mmap_unlock_read(mapping);
+
unmap_ref_private(mm, vma, old_page, haddr);
- BUG_ON(huge_pte_none(pte));
+
+ i_mmap_lock_read(mapping);
+ mutex_lock(&hugetlb_fault_mutex_table[hash]);
spin_lock(ptl);
ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
if (likely(ptep &&

Next message: Greg Kroah-Hartman: "[PATCH 5.10 16/63] lib/zlib: fix inflating zlib streams on s390"
Previous message: Greg Kroah-Hartman: "[PATCH 5.10 01/63] net/sched: sch_taprio: reset child qdiscs before freeing them"
In reply to: Jakub Kicinski: "Re: [PATCH 5.10 01/63] net/sched: sch_taprio: reset child qdiscs before freeing them"
Next in thread: Greg Kroah-Hartman: "[PATCH 5.10 16/63] lib/zlib: fix inflating zlib streams on s390"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]