Re: [PATCH v8 14/21] mm/mmap: Avoid zeroing vma tree in mmap_region()
From: Lorenzo Stoakes
Date: Tue Oct 01 2024 - 14:02:13 EST
On Tue, Oct 01, 2024 at 06:43:35PM GMT, Bert Karwatzki wrote:
[snip]
> I applied this patch to linux-next-20240110 (it applied cleany) and got the same
> error again (Andrew Morton asked on bugzilla me to put the logs into mails):
OK, just thought the chunky old mega logs might be a problem for lore but
obviously good to have it all stored for everyone to see also :) If
Andrew's asking then definitely the way to go.
And ugh, sigh, yeah, that patch was maybe a long shot, but it was a good
thing to fix anyway!
Right, time for me to roll up my sleeves and dig into the maple tree state
and figure out what the hell is going on here.
Will dig in further and get back to you.
Thanks again for all your help!
Some basic first notes on the report:
[snip]
> [ T4555] node00000000cba76266: data_end 9 != the last slot offset 8
Same exact bug, same exact data_end and slot indexes which is actually
pretty handy...
Anyway we're off-by-one here clearly.
> [ T4555] BUG at mas_validate_limits:7509 (1)
> [ T4555] maple_tree(00000000cda835e1) flags 313, height 4 root 000000001ff0b07a
[snip]
> [ T4555] 1740000-67ffffff: node 00000000cba76266 depth 3 type 1 parent
Same exact mapping range too. Again handy for debugging.
Type 1 so maple_leaf_64. The parent pointer (00000000c9eae6e1) is wrapped
onto the next quoted line.
> 00000000c9eae6e1 contents: 000000006be89277 17BFFFF 00000000bb01c9f7 1B3FFFF
> 00000000fd36058b 1B4FFFF 00000000891e81bb 1B55FFF 000000007f0c8f3f 1B5FFFF
> 0000000043f46074 1B6FFFF 00000000bf6f5946 1B7FFFF 0000000084faee8c 1B85FFF
> 0000000087868a7c 67FFFFFF 00000000af00822b 67FFFFFF 0000000000000000 0
Hm duplicate entries at slots 8 and 9, which aligns with the other errors...
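For anyone following along, both observations can be reproduced in
userspace from the dumped pivots alone. This is only a minimal sketch, not
the kernel's validation code: last_slot_offset() and first_repeated_pivot()
are hypothetical helpers, and the pivot array is copied by hand from the
leaf dump quoted here (the slots after 9 show pivot 0).

```c
#include <assert.h>
#include <stddef.h>

/* Pivots of node 00000000cba76266, copied from the dump (15 pivots). */
static const unsigned long dumped_pivots[] = {
	0x17BFFFF, 0x1B3FFFF, 0x1B4FFFF, 0x1B55FFF, 0x1B5FFFF,
	0x1B6FFFF, 0x1B7FFFF, 0x1B85FFF, 0x67FFFFFF, 0x67FFFFFF,
	0, 0, 0, 0, 0,
};
#define NR_DUMPED_PIVOTS (sizeof(dumped_pivots) / sizeof(dumped_pivots[0]))

/*
 * Hypothetical helper: index of the first slot whose pivot reaches the
 * node's upper limit, or -1 if no pivot does.
 */
static int last_slot_offset(const unsigned long *pivots, size_t n,
			    unsigned long node_max)
{
	assert(pivots != NULL);
	for (size_t i = 0; i < n; i++) {
		if (pivots[i] == node_max)
			return (int)i;
	}
	return -1;
}

/*
 * Hypothetical helper: index of the first non-zero pivot that repeats
 * its predecessor, or -1 if the pivots are free of such repeats.
 */
static int first_repeated_pivot(const unsigned long *pivots, size_t n)
{
	assert(pivots != NULL);
	for (size_t i = 1; i < n; i++) {
		if (pivots[i] != 0 && pivots[i] == pivots[i - 1])
			return (int)i;
	}
	return -1;
}
```

Called with dumped_pivots and a node limit of 0x67FFFFFF, the first helper
returns 8 and the second returns 9, which lines up with "the last slot
offset 8" versus "data_end 9" in the report.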
> 0000000000000000 0 0000000000000000 0 0000000000000000 0 0000000000000000 0
> 00000000686521f0
> [ T4555] Pass: 786885051 Run:786885052
> [ T4555] CPU: 7 UID: 1000 PID: 4555 Comm: rundll32.exe Not tainted 6.12.0-rc1-
> next-20241001-mapletreedebug-00001-g7e3bb072761a #542
> [ T4555] Hardware name: Micro-Star International Co., Ltd. Alpha 15 B5EEK/MS-
> 158L, BIOS E158LAMS.107 11/10/2021
> [ T4555] Call Trace:
> [ T4555] <TASK>
> [ T4555] dump_stack_lvl+0x58/0x90
> [ T4555] mt_validate+0xc64/0xc80
> [ T4555] validate_mm+0x49/0x150
> [ T4555] vms_complete_munmap_vmas+0x143/0x200
...but needed to unmap some stuff first, so it's MAP_FIXED over some
existing VMA.
Clearly something goes wrong (insightful I know). It's possible something
went wrong before, but unlikely as we should trigger a validate_mm() at any
point at which we fiddle with the maple tree.
> [ T4555] mmap_region+0x2ec/0xc30
Started to mmap()...
> [ T4555] ? sched_balance_newidle.isra.0+0x251/0x3f0
> [ T4555] do_mmap+0x463/0x640
> [ T4555] vm_mmap_pgoff+0xd4/0x150
> [ T4555] do_int80_emulation+0x88/0x140
> [ T4555] asm_int80_emulation+0x1a/0x20
32-bit :)))
> [ T4555] RIP: 0023:0xf7fb9bc2
> [ T4555] Code: 90 66 90 66 90 66 90 66 90 66 90 66 90 66 90 66 90 66 90 66 90 66
> 90 66 90 66 90 66 90 66 90 66 90 66 90 66 90 66 90 90 cd 80 <c3> 2e 8d b4 26 00
> 00 00 00 2e 8d 74 26 00 8b 1c 24 c3 2e 8d b4 26
> [ T4555] RSP: 002b:000000000050fa9c EFLAGS: 00000256 ORIG_RAX: 00000000000000c0
> [ T4555] RAX: ffffffffffffffda RBX: 0000000001b90000 RCX: 000000000001e000
> [ T4555] RDX: 0000000000000000 RSI: 0000000000004032 RDI: 00000000ffffffff
> [ T4555] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> [ T4555] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> [ T4555] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> [ T4555] </TASK>
> [ T4555] 00000000cba76266[9] should not have entry 00000000af00822b
> [ T4555] BUG at mas_validate_limits:7518 (1)
Off-by-one, so we're probably looking at something that we really
shouldn't be here.
It should still be within the slots range though: the dump above shows 16
slots for this node (MAPLE_RANGE64_SLOTS on a 64-bit kernel, and this is a
64-bit kernel running a 32-bit task), so we're not buffer overflowing
anywhere, just got a bad index.
[snip]
> [ T4555] 00000000cba76266[9] should not have piv 1744830463
> [ T4555] WARN at mas_validate_limits:7529 (1)
Same deal here I think.
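Note the pivot in that message is printed in decimal, while the node dump
prints pivots in hex. A minimal sketch to check they are the same value
(format_pivot() is a hypothetical helper, not a kernel function):

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a pivot as upper-case hex, the way the
 * node dump shows it. Returns the number of characters written.
 */
static int format_pivot(unsigned long piv, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%lX", piv);
}
```

Feeding it 1744830463 gives 67FFFFFF, i.e. the duplicated pivot at slots 8
and 9.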
[snip]
> [ T4555] MAS: tree=00000000cda835e1 enode=000000002cb71521
> [ T4555] (ma_active)
> [ T4555] Store Type:
> [ T4555] invalid store type
> [ T4555] [5/9] index=0 last=0
Again more indications of off-by-one...
[snip]
Investigation continues...