Re: [PATCH] mm: hugetlb: fix UAF in hugetlb_handle_userfault

From: David Hildenbrand
Date: Thu Sep 22 2022 - 03:47:04 EST


On 22.09.22 01:57, Mike Kravetz wrote:
On 09/21/22 10:48, Mike Kravetz wrote:
On 09/21/22 16:34, Liu Shixin wrote:
The vma_lock and hugetlb_fault_mutex are dropped before handling
userfault and reacquired after handle_userfault(), but reacquiring
the vma_lock could lead to a UAF[1] due to the following race:

hugetlb_fault
  hugetlb_no_page
    /* unlock vma_lock */
    hugetlb_handle_userfault
      handle_userfault
        /* unlock mm->mmap_lock */
                                        vm_mmap_pgoff
                                          do_mmap
                                            mmap_region
                                              munmap_vma_range
                                                /* clean old vma */
        /* lock vma_lock again  <--- UAF */
    /* unlock vma_lock */

Since the vma_lock is unlocked again immediately after
hugetlb_handle_userfault() returns, let's drop the unneeded lock and unlock
in hugetlb_handle_userfault() to fix the issue.
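
To make the race above concrete, here is a minimal userspace sketch in C. It is
a toy model, not kernel code: struct model_vma and every model_* function are
hypothetical names, and the racing munmap is simulated with a flag instead of a
second thread. It only counts how often the lock would be touched after the VMA
has been freed, in the old flow and in the flow proposed by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a VMA whose lock lives inside the object itself. */
struct model_vma {
	bool freed;      /* set when the simulated munmap tears the VMA down */
	bool lock_held;
};

/* Returns 1 if the operation touched an already freed VMA, else 0. */
static int model_lock(struct model_vma *vma)
{
	assert(vma != NULL);
	if (vma->freed)
		return 1;        /* in the kernel this would be the UAF */
	vma->lock_held = true;
	return 0;
}

static int model_unlock(struct model_vma *vma)
{
	assert(vma != NULL);
	if (vma->freed)
		return 1;
	vma->lock_held = false;
	return 0;
}

/* Stand-in for handle_userfault(): mmap_lock is dropped while waiting,
 * so a concurrent unmap may free the VMA. */
static void model_handle_userfault(struct model_vma *vma, bool racing_munmap)
{
	if (racing_munmap)
		vma->freed = true;
}

/* Old flow: the helper re-takes the lock, the caller unlocks it again. */
static int model_fault_old(struct model_vma *vma, bool racing_munmap)
{
	int bad = 0;

	bad += model_lock(vma);
	bad += model_unlock(vma);       /* dropped before handling userfault */
	model_handle_userfault(vma, racing_munmap);
	bad += model_lock(vma);         /* re-lock: may hit a freed VMA */
	bad += model_unlock(vma);       /* caller's unlock right afterwards */
	return bad;
}

/* Proposed flow: return without touching the lock again. */
static int model_fault_fixed(struct model_vma *vma, bool racing_munmap)
{
	int bad = 0;

	bad += model_lock(vma);
	bad += model_unlock(vma);
	model_handle_userfault(vma, racing_munmap);
	return bad;
}
```

With the simulated race, model_fault_old() reports two accesses to the freed
object (the re-lock and the unlock that follows it), while model_fault_fixed()
reports none.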

Thank you very much!

When I saw this report, the obvious fix was to do something like what you have
done below. That looks fine with a few minor comments.

One question I have not yet answered is, "Does this same issue apply to
follow_hugetlb_page()?". I believe it does. follow_hugetlb_page calls
hugetlb_fault which could result in the fault being processed by userfaultfd.
If we experience the race above, then the associated vma could no longer be
valid when returning from hugetlb_fault. follow_hugetlb_page and callers
have a flag (locked) to deal with dropping mmap lock. However, I am not sure
if it is handled correctly WRT userfaultfd. I think this needs to be answered
before fixing. And, if the follow_hugetlb_page code needs to be fixed it
should be done at the same time.


To at least verify this code path, I added userfaultfd handling to the gup_test
program in kernel selftests. When doing basic gup test on a hugetlb page in
a userfaultfd registered range, I hit this warning:

[ 6939.867796] FAULT_FLAG_ALLOW_RETRY missing 1
[ 6939.871503] CPU: 2 PID: 5720 Comm: gup_test Not tainted 6.0.0-rc6-next-20220921+ #72
[ 6939.874562] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014
[ 6939.877707] Call Trace:
[ 6939.878745] <TASK>
[ 6939.879779] dump_stack_lvl+0x6c/0x9f
[ 6939.881199] handle_userfault.cold+0x14/0x1e
[ 6939.882830] ? find_held_lock+0x2b/0x80
[ 6939.884370] ? __mutex_unlock_slowpath+0x45/0x280
[ 6939.886145] hugetlb_handle_userfault+0x90/0xf0
[ 6939.887936] hugetlb_fault+0xb7e/0xda0
[ 6939.889409] ? vprintk_emit+0x118/0x3a0
[ 6939.890903] ? _printk+0x58/0x73
[ 6939.892279] follow_hugetlb_page.cold+0x59/0x145
[ 6939.894116] __get_user_pages+0x146/0x750
[ 6939.895580] __gup_longterm_locked+0x3e9/0x680
[ 6939.897023] ? seqcount_lockdep_reader_access.constprop.0+0xa5/0xb0
[ 6939.898939] ? lockdep_hardirqs_on+0x7d/0x100
[ 6939.901243] gup_test_ioctl+0x320/0x6e0
[ 6939.902202] __x64_sys_ioctl+0x87/0xc0
[ 6939.903220] do_syscall_64+0x38/0x90
[ 6939.904233] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 6939.905423] RIP: 0033:0x7fbb53830f7b

This is because userfaultfd is expecting FAULT_FLAG_ALLOW_RETRY which is not
set in this path.

Right. Without being able to drop the mmap lock, we cannot continue. And we don't know if we can drop it without FAULT_FLAG_ALLOW_RETRY.

FAULT_FLAG_ALLOW_RETRY is only set when we can communicate to the caller that we dropped the mmap lock [e.g., int *locked parameter].
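
As a rough illustration of that rule, a simplified userspace model in C follows.
The MODEL_* constants and model_* functions are made up for this sketch and
their values are arbitrary; the real GUP and userfaultfd code has more layers
and more flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Arbitrary values for the model; not the kernel's definitions. */
#define MODEL_FAULT_FLAG_ALLOW_RETRY (1u << 0)
#define MODEL_FAULT_FLAG_KILLABLE    (1u << 1)

/* GUP side: only allow a retry when the caller gave us a place to
 * report that the mmap lock was dropped. */
static unsigned int model_gup_fault_flags(const int *locked)
{
	unsigned int fault_flags = 0;

	if (locked)
		fault_flags |= MODEL_FAULT_FLAG_ALLOW_RETRY |
			       MODEL_FAULT_FLAG_KILLABLE;
	return fault_flags;
}

/* userfaultfd side: waiting means dropping the mmap lock, which is
 * only permitted when the retry flag is present. */
static bool model_userfault_may_wait(unsigned int fault_flags)
{
	return (fault_flags & MODEL_FAULT_FLAG_ALLOW_RETRY) != 0;
}

/* Small self-check contrasting the two kinds of caller. */
static void model_flags_demo(void)
{
	int locked = 1;

	/* Caller passes &locked: userfaultfd may drop the lock and wait. */
	assert(model_userfault_may_wait(model_gup_fault_flags(&locked)));
	/* Caller passes NULL: no retry flag, so we end up with the warning. */
	assert(!model_userfault_may_wait(model_gup_fault_flags(NULL)));
}
```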

All code paths that pass NULL -- surprisingly, this also includes pin_user_pages_fast() -- cannot handle a dropped mmap lock, so they cannot trigger userfaultfd and will result in this warning.


A "sane" example is access via /proc/self/mem or via ptrace: we don't want to trigger userfaultfd, but instead simply fail the GUP get/pin.


Now, this is just a printed *warning* (not a WARN/BUG/taint) that tells us that there is a GUP user that isn't prepared for userfaultfd. So it rather points out a missing GUP adaptation -- incomplete userfaultfd support. And we seem to have plenty of that, judging by pin_user_pages_fast_only().

Maybe the printed stack trace is a bit too much and makes this look very scary.


Adding John, Peter and David on Cc: as they are much more fluent in all the
fault and FOLL combinations and might have immediate suggestions. It is going
to take me a little while to figure out:
1) How to make sure we get the right flags passed to handle_userfault

This is a GUP caller problem -- or rather, how GUP has to deal with userfaultfd.

2) How to modify follow_hugetlb_page as userfaultfd can certainly drop
mmap_lock. So we can not assume vma still exists upon return.

Again, we have to communicate to the GUP caller that we dropped the mmap lock. And that requires GUP caller changes.
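
Roughly, the caller-side contract could be modelled like this in userspace C.
All names here (struct model_mm, model_gup(), model_caller()) are hypothetical,
and returning -EFAULT for the NULL case is an assumption of the sketch, not a
statement about the exact kernel return path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy mm that only tracks whether the mmap lock is currently held. */
struct model_mm {
	bool mmap_lock_held;
};

/*
 * Stand-in for a GUP call. Returns the number of pages obtained or a
 * negative errno. If the lock had to be dropped, *locked is set to 0
 * and the lock is no longer held on return.
 */
static long model_gup(struct model_mm *mm, bool hits_userfault, int *locked)
{
	assert(mm != NULL && mm->mmap_lock_held);

	if (!hits_userfault)
		return 1;               /* one page, lock untouched */
	if (!locked)
		return -EFAULT;         /* cannot report a dropped lock: fail */

	mm->mmap_lock_held = false;     /* dropped while waiting for userfaultfd */
	*locked = 0;                    /* tell the caller about it */
	return 1;
}

/* Caller pattern: unlock only if GUP says the lock is still held, and
 * do not rely on any vma looked up before the call once it was dropped. */
static long model_caller(struct model_mm *mm, bool hits_userfault)
{
	int locked = 1;
	long ret;

	mm->mmap_lock_held = true;      /* take the mmap lock */
	ret = model_gup(mm, hits_userfault, &locked);
	if (locked)
		mm->mmap_lock_held = false; /* still ours to release */
	return ret;
}
```

A caller that passes NULL instead of &locked gets the failure path in this
model, which corresponds to the warning seen above rather than a wait on
userfaultfd.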

--
Thanks,

David / dhildenb