Re: [PATCH v8 14/21] mm/mmap: Avoid zeroing vma tree in mmap_region()

From: Lorenzo Stoakes
Date: Wed Oct 02 2024 - 14:29:27 EST


On Wed, Oct 02, 2024 at 06:19:18PM GMT, Lorenzo Stoakes wrote:

[snip]

>
> Current status - I literally cannot repro this even doing exactly what you're
> doing, so I wonder if your exact GPU or a file system you're using or something
> is a factor here and there's something which implements a custom .mmap callback
> or vm_ops->close() that is somehow interfacing with this, or if this being a
> file thing is somehow a factor.
>
> Recreating the scenario as best I can with anon mappings, it seems that
> immediately before it triggers we are in the state shown on the left for the
> range with the problematic node, and immediately after we are in the state on
> the right (plus an invalid entry/pivot for 0x68000000).
>
> The final action that triggers the problem is mapping [0x1b90000, 0x1bae000)
> PROT_NONE, MAP_RESERVE which merges with A and D, and we unmap B and C:
>
> 01740000-017c0000 ---p 00000000 00:00 0        01740000-017c0000 ---p 00000000 00:00 0
> 017c0000-01b40000 rw-p 00000000 00:00 0        017c0000-01b40000 rw-p 00000000 00:00 0
> 01b40000-01b50000 ---p 00000000 00:00 0        01b40000-01b50000 ---p 00000000 00:00 0
> 01b50000-01b56000 rw-p 00000000 00:00 0        01b50000-01b56000 rw-p 00000000 00:00 0
> 01b56000-01b60000 ---p 00000000 00:00 0        01b56000-01b60000 ---p 00000000 00:00 0
> 01b60000-01b70000 ---p 00000000 00:00 0        01b60000-01b70000 ---p 00000000 00:00 0
> 01b70000-01b80000 ---p 00000000 00:00 0        01b70000-01b80000 ---p 00000000 00:00 0
> 01b80000-01b86000 rw-p 00000000 00:00 0        01b80000-01b86000 rw-p 00000000 00:00 0
> 01b86000-01b90000 ---p 00000000 00:00 0 * A    01b86000-68000000 ---p 00000000 00:00 0
> 01b90000-01b91000 rwxp 00000000 00:00 0 * B    < invalid 0x68000000 entry/pivot >
> 01b91000-01bae000 rw-p 00000000 00:00 0 * C
> 01bae000-68000000 ---p 00000000 00:00 0 * D
>
> It seems based on some of the VMA flags that we _must_ be mapping files here,
> e.g. some have VM_EXEC and others are missing VM_MAYREAD which indicates a
> read-only file mapping. Given the low addresses, we are probably setting up a
> set of mappings for a binary or such? That would align with the PROT_NONE
> mappings also.
>
> This really makes me think, combined with the fact I really _cannot_ repro this
> (on intel GPU hardware and ext4 file system) that there are some 'special'
> mappings going on here.
>
> The fact we're unmapping 2 VMAs and then removing a final one in a merge does
> suggest something is going wrong in the interaction between these two events.
>
> I wonder if the merge logic is possibly struggling with the (detached but
> present) VMAs still being there as we try to expand an existing VMA?
>
> Though my money's on a call_mmap() or .close() call doing something weird here.
>
> Investigation carries on...
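
For anyone wanting to poke at the sequence quoted above, a minimal userspace
sketch with anonymous mappings might look like the following. It is an
illustration only, not the reproducer used in this thread, and it rests on
several assumptions: the layout is built relative to a kernel-chosen base
rather than at the fixed low addresses in the dump, D is shrunk to 64 pages,
B is mapped read-only rather than rwx so W^X policies do not reject it, and
MAP_NORESERVE is used on the guess that this is what "MAP_RESERVE" refers to.
The names recreate_merge() and covered_by_single_vma() are made up for the
example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Return 1 if one line of /proc/self/maps spans all of [start, end). */
static int covered_by_single_vma(uintptr_t start, uintptr_t end)
{
	FILE *f = fopen("/proc/self/maps", "r");
	char line[4352];
	int found = 0;

	if (!f)
		return 0;
	while (fgets(line, sizeof(line), f)) {
		unsigned long lo, hi;

		if (sscanf(line, "%lx-%lx", &lo, &hi) == 2 &&
		    lo <= start && hi >= end) {
			found = 1;
			break;
		}
	}
	fclose(f);
	return found;
}

/*
 * Build A | B | C | D inside one reservation, then map PROT_NONE over
 * B and C. Returns 1 if the range was split before the final mmap() and
 * is a single VMA after it, 0 otherwise, -1 if a syscall failed.
 */
static int recreate_merge(void)
{
	long page = sysconf(_SC_PAGESIZE);
	assert(page > 0);

	/* Page counts for A, B and C match the dump on 4K pages (assumed). */
	size_t a = 10 * page, b = 1 * page, c = 29 * page, d = 64 * page;
	size_t total = a + b + c + d;
	int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE;
	int split, merged;
	char *base;

	/* One PROT_NONE reservation; A and D are what remains of it. */
	base = mmap(NULL, total, PROT_NONE, flags, -1, 0);
	if (base == MAP_FAILED)
		return -1;

	/* B and C get different protections so they stay separate VMAs. */
	if (mmap(base + a, b, PROT_READ, flags | MAP_FIXED, -1, 0) == MAP_FAILED ||
	    mmap(base + a + b, c, PROT_READ | PROT_WRITE, flags | MAP_FIXED,
		 -1, 0) == MAP_FAILED) {
		munmap(base, total);
		return -1;
	}
	split = !covered_by_single_vma((uintptr_t)base, (uintptr_t)base + total);

	/* Final step: PROT_NONE over B and C, which unmaps both of them. */
	if (mmap(base + a, b + c, PROT_NONE, flags | MAP_FIXED, -1, 0) == MAP_FAILED) {
		munmap(base, total);
		return -1;
	}
	merged = covered_by_single_vma((uintptr_t)base, (uintptr_t)base + total);

	munmap(base, total);
	return split && merged;
}
```

Calling recreate_merge() from a small main() and printing the result is
enough to try it. A return of 1 only shows that the new mapping merged with
its neighbours; it says nothing about the state of the maple tree, for which
the debug config and dmesg are still needed.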

Hey Bert - sorry to be a pain, but try as I might I cannot repro this.

I've attached a fairly thorough, hacky printk patch here. It's going to
generate a ton of noise, so I really think the output has to go in a link to
an off-list dmesg or we're going to break lei again, sorry Andrew.

If you could repro with this patch applied + the usual debug config
settings and send it back I'd appreciate it!

This should hopefully eke out a little more information to help figure
things out.

Also if you could share your .config, ulimit -a and
/proc/sys/vm/max_map_count that'd be great too, thanks!

Again, much much appreciated.

Cheers, Lorenzo

----8<----