Re: [PATCH] mm: init_mlocked_on_free_v3

From: David Hildenbrand
Date: Mon Jun 03 2024 - 04:38:01 EST


On 01.06.24 16:09, Lance Yang wrote:
Completely agree with David's point[1]. I'm also not convinced that this is the
right approach :)

It seems like this patch won't handle all cases, as David mentioned[1] before.
folio_remove_rmap_ptes() will immediately munlock a large folio (as large folios
are not allowed to be batch-added to the LRU list) via munlock_vma_folio() when
it is fully unmapped, so this patch does not work in this case. Even worse, if
we encounter a COW mlocked folio, we would run into trouble (data corruption).

Adding to what Lance said regarding COW, here is a simple reproducer:


[root@localhost ~]# cat mlock.c
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>

int main(void)
{
size_t size = getpagesize();
volatile char *mem;
int rc;

mem = mmap(0, size, PROT_READ | PROT_WRITE, MAP_ANON | MAP_PRIVATE, -1, 0);
if (mem == MAP_FAILED) {
fprintf(stderr, "mmap() failed\n");
return -1;
}

rc = mlock((char *) mem, size);
if (rc) {
fprintf(stderr, "mlock() failed\n");
return -1;
}

memset((char *) mem, 1, size);

rc = fork();
if (rc < 0) {
fprintf(stderr, "fork() failed\n");
return -1;
} else if (!rc) {
return 0;
}

waitpid(rc, NULL, 0);
sleep(2);

if (mem[0] != 1) {
fprintf(stderr, "[FAIL] Memory content changed!\n");
return -1;
}
fprintf(stderr, "[PASS] Memory content did not change!\n");

return 0;
}

[root@localhost ~]# ./mlock
[FAIL] Memory content changed!

In contrast, when it's not enabled on the cmdline:

[root@localhost ~]# ./mlock
[PASS] Memory content did not change!


Further, rebooting with this enabled gives me (retried once to make sure enabling/disabling
this feature makes a difference):

[ 105.057230] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000008b
[ 105.059491] CPU: 29 PID: 1 Comm: systemd-shutdow Not tainted 6.10.0-rc1+ #16
[ 105.061545] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
[ 105.063884] Call Trace:
[ 105.064548] <TASK>
[ 105.065088] dump_stack_lvl+0x5d/0x80
[ 105.066036] panic+0x118/0x2c8
[ 105.066837] do_exit.cold+0x18/0x18
[ 105.067759] do_group_exit+0x36/0xa0
[ 105.069177] get_signal+0x9bc/0x9e0
[ 105.070191] arch_do_signal_or_restart+0x3b/0x240
[ 105.071538] irqentry_exit_to_user_mode+0x1c2/0x230
[ 105.072913] asm_exc_page_fault+0x26/0x30
[ 105.074057] RIP: 0033:0x0
[ 105.074803] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[ 105.076647] RSP: 002b:00007fff0d9a41a8 EFLAGS: 00010206
[ 105.078090] RAX: 0000000000000011 RBX: 0000000006d8cef9 RCX: 0000000000000005
[ 105.080046] RDX: 00007fff0d9a4250 RSI: 00000000000008c7 RDI: 0000000000000001
[ 105.082139] RBP: 00007fff0d9a4310 R08: 0019fa9fcd800000 R09: 00000029e0b9a6f0
[ 105.084038] R10: 0000000000000008 R11: 0000000000000202 R12: 00007fff0d9a4250
[ 105.085852] R13: 00007fff0d9a41d0 R14: 00000000000008c7 R15: 0000000000000000
[ 105.087653] </TASK>
[ 105.090686] Kernel Offset: 0x1e000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 105.093359] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000008b ]---


This needs to be reverted.

@Andrew, to you want us to send an official revert patch?

--
Cheers,

David / dhildenb