Very slow unlockall()

From: Milan Broz
Date: Wed Jan 06 2021 - 10:21:00 EST


Hi,

we use mlockall(MCL_CURRENT | MCL_FUTURE) / munlockall() in cryptsetup code
and someone tried to use it with hardened memory allocator library.

Execution time was increased to extreme (minutes) and as we found, the problem
is in munlockall().

Here is a plain reproducer for the core without any external code - it takes
unlocking on Fedora rawhide kernel more than 30 seconds!
I can reproduce it on 5.10 kernels and Linus' git.

The reproducer below tries to mmap large amount memory with PROT_NONE (later never used).
The real code of course does something more useful but the problem is the same.

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/mman.h>

int main (int argc, char *argv[])
{
void *p = mmap(NULL, 1UL << 41, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

if (p == MAP_FAILED) return 1;

if (mlockall(MCL_CURRENT | MCL_FUTURE)) return 1;
printf("locked\n");

if (munlockall()) return 1;
printf("unlocked\n");

return 0;
}

In traceback I see that time is spent in munlock_vma_pages_range.

[ 2962.006813] Call Trace:
[ 2962.006814] ? munlock_vma_pages_range+0xe7/0x4b0
[ 2962.006814] ? vma_merge+0xf3/0x3c0
[ 2962.006815] ? mlock_fixup+0x111/0x190
[ 2962.006815] ? apply_mlockall_flags+0xa7/0x110
[ 2962.006816] ? __do_sys_munlockall+0x2e/0x60
[ 2962.006816] ? do_syscall_64+0x33/0x40
...

Or with perf, I see

# Overhead Command Shared Object Symbol
# ........ ....... ................. .....................................
#
48.18% lock [kernel.kallsyms] [k] lock_is_held_type
11.67% lock [kernel.kallsyms] [k] ___might_sleep
10.65% lock [kernel.kallsyms] [k] follow_page_mask
9.17% lock [kernel.kallsyms] [k] debug_lockdep_rcu_enabled
6.73% lock [kernel.kallsyms] [k] munlock_vma_pages_range
...


Could please anyone check what's wrong here with the memory locking code?
Running it on my notebook I can effectively DoS the system :)

Original report is https://gitlab.com/cryptsetup/cryptsetup/-/issues/617
but this is apparently a kernel issue, just amplified by usage of munlockall().

Thanks,
Milan