Re: [syzbot] [mm?] WARNING in page_counter_cancel (5)

From: Rustam Kovhaev
Date: Sat Oct 05 2024 - 13:13:00 EST


On Fri, Oct 04, 2024 at 08:31:28AM -0700, syzbot wrote:
> ------------[ cut here ]------------
> page_counter underflow: -512 nr_pages=512
> WARNING: CPU: 1 PID: 5225 at mm/page_counter.c:60 page_counter_cancel+0x110/0x170 mm/page_counter.c:60
> Modules linked in:
> CPU: 1 UID: 0 PID: 5225 Comm: syz-executor334 Not tainted 6.12.0-rc1-syzkaller #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
> RIP: 0010:page_counter_cancel+0x110/0x170 mm/page_counter.c:60
> Code: e8 55 23 98 ff 45 84 ed 75 24 e8 6b 21 98 ff c6 05 1a ef 10 0e 01 90 48 c7 c7 c0 9d 5c 8b 4c 89 e2 48 89 ee e8 91 9a 59 ff 90 <0f> 0b 90 90 e8 47 21 98 ff be 08 00 00 00 48 89 df e8 9a 71 f9 ff
> RSP: 0018:ffffc900032dfae8 EFLAGS: 00010282
> RAX: 0000000000000000 RBX: ffff8881404a9440 RCX: ffffffff814e2a49
> RDX: ffff88801df38000 RSI: ffffffff814e2a56 RDI: 0000000000000001
> RBP: fffffffffffffe00 R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000200
> R13: 0000000000000000 R14: 0000000000000001 R15: ffff888077bbdc18
> FS: 0000000000000000(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f6b788f5243 CR3: 000000007ec10000 CR4: 00000000003526f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <TASK>
> page_counter_uncharge+0x32/0x70 mm/page_counter.c:182
> hugetlb_cgroup_uncharge_counter+0xd6/0x410 mm/hugetlb_cgroup.c:431
> hugetlb_vm_op_close+0x3fe/0x5b0 mm/hugetlb.c:5065
> remove_vma+0xa8/0x1a0 mm/vma.c:330
> exit_mmap+0x4e0/0xb30 mm/mmap.c:1888
> __mmput+0x12a/0x480 kernel/fork.c:1347
> mmput+0x62/0x70 kernel/fork.c:1369
> exit_mm kernel/exit.c:571 [inline]
> do_exit+0x9bf/0x2d70 kernel/exit.c:926
> do_group_exit+0xd3/0x2a0 kernel/exit.c:1088
> __do_sys_exit_group kernel/exit.c:1099 [inline]
> __se_sys_exit_group kernel/exit.c:1097 [inline]
> __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1097
> x64_sys_call+0x14a9/0x16a0 arch/x86/include/generated/asm/syscalls_64.h:232
> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f6b7889d879
> Code: Unable to access opcode bytes at 0x7f6b7889d84f.
> RSP: 002b:00007ffcea637828 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f6b7889d879
> RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
> RBP: 00007f6b78911350 R08: ffffffffffffffb8 R09: 0000000000000000
> R10: 0000000000000003 R11: 0000000000000246 R12: 00007f6b78911350
> R13: 0000000000000000 R14: 00007f6b78911da0 R15: 00007f6b78866f40
> </TASK>
>
Hello,
I reproduced the same issue in my lab. I'll try to fix this one, unless
someone is already working on it.
In copy_vma() we jump to out_vma_link and call hugetlb_vm_op_close(),
which uncharges the counter down to 0.
Then, when the process terminates, hugetlb_vm_op_close() runs again
against the same vma and the counter goes negative.
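
To illustrate the arithmetic only, here is a minimal userspace sketch of the
sequence described above. It is not kernel code: struct toy_counter,
toy_uncharge() and toy_double_close() are hypothetical names made up for this
example, and the only values taken from the report are nr_pages=512 and the
resulting -512.

```c
#include <assert.h>
#include <stdio.h>

/* Toy stand-in for the reservation counter; NOT the kernel's struct page_counter. */
struct toy_counter {
	long usage;
};

/* Hypothetical helper: subtract nr_pages and return the new, possibly negative, value. */
static long toy_uncharge(struct toy_counter *c, long nr_pages)
{
	long new_usage = c->usage - nr_pages;

	if (new_usage < 0)
		fprintf(stderr, "toy underflow: %ld nr_pages=%ld\n",
			new_usage, nr_pages);
	c->usage = new_usage;
	return new_usage;
}

/* Replays the suspected sequence: one charge, then two close calls on the same vma. */
static long toy_double_close(void)
{
	struct toy_counter c = { .usage = 0 };
	const long nr_pages = 512;
	long first;

	c.usage += nr_pages;                /* reservation is charged once */
	first = toy_uncharge(&c, nr_pages); /* close from the copy_vma() error path */
	assert(first == 0);                 /* counter is back at 0 here */
	return toy_uncharge(&c, nr_pages);  /* close again at process exit */
}
```

Calling toy_double_close() returns the value left after the second uncharge,
which has the same sign and magnitude as the underflow in the warning. A fix
would presumably need to make sure the second close has nothing left to
uncharge, but that is for the actual patch to sort out.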