On 09/21/22 10:48, Mike Kravetz wrote:
On 09/21/22 16:34, Liu Shixin wrote:
The vma_lock and hugetlb_fault_mutex are dropped before handling
userfault and reacquired after handle_userfault(), but
reacquiring the vma_lock could lead to a UAF[1] due to the following
race:
hugetlb_fault
  hugetlb_no_page
    /* unlock vma_lock */
    hugetlb_handle_userfault
      handle_userfault
        /* unlock mm->mmap_lock */
                                           vm_mmap_pgoff
                                             do_mmap
                                               mmap_region
                                                 munmap_vma_range
                                                   /* clean old vma */
        /* lock vma_lock again  <--- UAF */
    /* unlock vma_lock */
Since the vma_lock will be unlocked immediately after
hugetlb_handle_userfault() returns, let's drop the unneeded lock and unlock
in hugetlb_handle_userfault() to fix the issue.
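
To make the ordering concrete, here is a toy userspace model of the race.
It is only a sketch and not kernel code: struct toy_vma and all toy_*
helpers are made up for illustration, and the "free" is just a flag so the
program stays well defined. It counts how often the lock is touched after
the racing mmap() has torn the vma down.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a vma and its hugetlb vma_lock; NOT kernel types. */
struct toy_vma {
	bool freed;      /* set once a racing mmap() has replaced the vma */
	int  lock_depth; /* 1 while the toy "vma_lock" is held */
	int  uaf_hits;   /* lock operations performed after the free */
};

static void toy_lock(struct toy_vma *v)
{
	if (v->freed)
		v->uaf_hits++;	/* in the kernel this would be the UAF */
	v->lock_depth++;
}

static void toy_unlock(struct toy_vma *v)
{
	if (v->freed)
		v->uaf_hits++;
	v->lock_depth--;
}

/*
 * Models handle_userfault(): mmap_lock is dropped while waiting, so a
 * racing mmap() over the same range may clean up the old vma.
 */
static void toy_handle_userfault(struct toy_vma *v, bool racing_mmap)
{
	if (racing_mmap)
		v->freed = true;
}

/* Old flow: unlock, handle userfault, lock again, caller unlocks. */
static int fault_old(struct toy_vma *v, bool racing_mmap)
{
	toy_lock(v);
	toy_unlock(v);
	toy_handle_userfault(v, racing_mmap);
	toy_lock(v);	/* <--- touches a possibly freed vma */
	toy_unlock(v);	/* the unlock right after returning */
	return v->uaf_hits;
}

/* Proposed flow: unlock once, handle userfault, never lock again. */
static int fault_new(struct toy_vma *v, bool racing_mmap)
{
	toy_lock(v);
	toy_unlock(v);
	toy_handle_userfault(v, racing_mmap);
	return v->uaf_hits;
}
```

With a racing mmap(), fault_old() touches the lock twice after the free
(the re-lock and the final unlock), while fault_new() never does. Without
the race both flows are equivalent, which is why the extra lock/unlock
pair can simply be dropped.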
Thank you very much!
When I saw this report, the obvious fix was to do something like what you have
done below. That looks fine with a few minor comments.
One question I have not yet answered is, "Does this same issue apply to
follow_hugetlb_page()?". I believe it does. follow_hugetlb_page calls
hugetlb_fault which could result in the fault being processed by userfaultfd.
If we experience the race above, then the associated vma may no longer be
valid when returning from hugetlb_fault. follow_hugetlb_page and callers
have a flag (locked) to deal with dropping mmap lock. However, I am not sure
if it is handled correctly WRT userfaultfd. I think this needs to be answered
before fixing. And, if the follow_hugetlb_page code needs to be fixed it
should be done at the same time.
To at least verify this code path, I added userfaultfd handling to the gup_test
program in kernel selftests. When doing a basic gup test on a hugetlb page in
a userfaultfd registered range, I hit this warning:
[ 6939.867796] FAULT_FLAG_ALLOW_RETRY missing 1
[ 6939.871503] CPU: 2 PID: 5720 Comm: gup_test Not tainted 6.0.0-rc6-next-20220921+ #72
[ 6939.874562] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014
[ 6939.877707] Call Trace:
[ 6939.878745] <TASK>
[ 6939.879779] dump_stack_lvl+0x6c/0x9f
[ 6939.881199] handle_userfault.cold+0x14/0x1e
[ 6939.882830] ? find_held_lock+0x2b/0x80
[ 6939.884370] ? __mutex_unlock_slowpath+0x45/0x280
[ 6939.886145] hugetlb_handle_userfault+0x90/0xf0
[ 6939.887936] hugetlb_fault+0xb7e/0xda0
[ 6939.889409] ? vprintk_emit+0x118/0x3a0
[ 6939.890903] ? _printk+0x58/0x73
[ 6939.892279] follow_hugetlb_page.cold+0x59/0x145
[ 6939.894116] __get_user_pages+0x146/0x750
[ 6939.895580] __gup_longterm_locked+0x3e9/0x680
[ 6939.897023] ? seqcount_lockdep_reader_access.constprop.0+0xa5/0xb0
[ 6939.898939] ? lockdep_hardirqs_on+0x7d/0x100
[ 6939.901243] gup_test_ioctl+0x320/0x6e0
[ 6939.902202] __x64_sys_ioctl+0x87/0xc0
[ 6939.903220] do_syscall_64+0x38/0x90
[ 6939.904233] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 6939.905423] RIP: 0033:0x7fbb53830f7b
This is because userfaultfd is expecting FAULT_FLAG_ALLOW_RETRY which is not
set in this path.
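
A rough userspace sketch of what I believe is happening, assuming the gup
path only allows retry when the caller hands in a 'locked' pointer (and
that this path hands in none). The TOY_* constants and toy_* functions are
illustrative stand-ins; the values are not copied from kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins, NOT the kernel's flag values. */
#define TOY_FAULT_FLAG_ALLOW_RETRY	(1u << 0)
#define TOY_FAULT_FLAG_KILLABLE		(1u << 1)

/*
 * Models how the gup path builds the fault flags: retry is only
 * allowed when the caller supplied a 'locked' pointer, i.e. it is
 * prepared to find mmap_lock dropped on return.
 */
static unsigned int toy_gup_fault_flags(const int *locked)
{
	unsigned int flags = 0;

	if (locked)
		flags |= TOY_FAULT_FLAG_ALLOW_RETRY | TOY_FAULT_FLAG_KILLABLE;
	return flags;
}

/*
 * Models the check in the userfault path. Returns false in the case
 * where the kernel printed "FAULT_FLAG_ALLOW_RETRY missing".
 */
static bool toy_userfault_can_wait(unsigned int flags)
{
	return (flags & TOY_FAULT_FLAG_ALLOW_RETRY) != 0;
}
```

In this model toy_userfault_can_wait(toy_gup_fault_flags(NULL)) is false,
which corresponds to the warning above, while passing a valid 'locked'
pointer makes the check pass.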
Adding John, Peter and David on Cc: as they are much more fluent in all the
fault and FOLL combinations and might have immediate suggestions. It is going
to take me a little while to figure out:
1) How to make sure we get the right flags passed to handle_userfault
2) How to modify follow_hugetlb_page as userfaultfd can certainly drop
   mmap_lock. So we cannot assume the vma still exists upon return.