Re: [PATCH v8 14/21] mm/mmap: Avoid zeroing vma tree in mmap_region()
From: Bert Karwatzki
Date: Wed Oct 02 2024 - 16:07:03 EST
On Wednesday, 2 October 2024 at 19:28 +0100, Lorenzo Stoakes wrote:
> On Wed, Oct 02, 2024 at 06:19:18PM GMT, Lorenzo Stoakes wrote:
>
> [snip]
>
> >
> > Current status - I literally cannot repro this even doing exactly what you're
> > doing, so I wonder if your exact GPU or a file system you're using or something
> > is a factor here and there's something which implements a custom .mmap callback
> > or vm_ops->close() that is somehow interfacing with this, or if this being a
> > file thing is somehow a factor.
> >
> > Recreating the scenario as best I can with anon mappings, it seems immediately
> > before it triggers we are in the position on the left in the range with the
> > problematic node, and then immediately after we are in the right (plus an
> > invalid entry/pivot for 0x68000000).
> >
> > The final action that triggers the problem is mapping [0x1b90000, 0x1bae000)
> > PROT_NONE, MAP_NORESERVE, which merges with A and D, and we unmap B and C:
> >
> > 01740000-017c0000 ---p 00000000 00:00 0 01740000-017c0000 ---p 00000000 00:00 0
> > 017c0000-01b40000 rw-p 00000000 00:00 0 017c0000-01b40000 rw-p 00000000 00:00 0
> > 01b40000-01b50000 ---p 00000000 00:00 0 01b40000-01b50000 ---p 00000000 00:00 0
> > 01b50000-01b56000 rw-p 00000000 00:00 0 01b50000-01b56000 rw-p 00000000 00:00 0
> > 01b56000-01b60000 ---p 00000000 00:00 0 01b56000-01b60000 ---p 00000000 00:00 0
> > 01b60000-01b70000 ---p 00000000 00:00 0 01b60000-01b70000 ---p 00000000 00:00 0
> > 01b70000-01b80000 ---p 00000000 00:00 0 01b70000-01b80000 ---p 00000000 00:00 0
> > 01b80000-01b86000 rw-p 00000000 00:00 0 01b80000-01b86000 rw-p 00000000 00:00 0
> > 01b86000-01b90000 ---p 00000000 00:00 0 * A 01b86000-68000000 ---p 00000000 00:00 0
> > 01b90000-01b91000 rwxp 00000000 00:00 0 * B < invalid 0x68000000 entry/pivot >
> > 01b91000-01bae000 rw-p 00000000 00:00 0 * C
> > 01bae000-68000000 ---p 00000000 00:00 0 * D
> >
> > It seems based on some of the VMA flags that we _must_ be mapping files here,
> > e.g. some have VM_EXEC and others are missing VM_MAYREAD, which indicates a
> > read-only file mapping. Probably, given the low addresses, we are setting up a
> > binary's set of mappings or such? That would align with the PROT_NONE mappings too.
> >
> > This really makes me think, combined with the fact I really _cannot_ repro this
> > (on intel GPU hardware and ext4 file system) that there are some 'special'
> > mappings going on here.
> >
> > The fact we're unmapping 2 VMAs and then removing a final one in a merge does
> > suggest something is going wrong in the interaction between these two events.
> >
> > I wonder if the merge logic is possibly struggling with the (detached but
> > present) VMAs still being there as we try to expand an existing VMA?
> >
> > Though my money's on a call_mmap() or .close() call doing something weird here.
> >
> > Investigation carries on...
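In case it is useful to anyone else trying to recreate this, a minimal userspace
sketch of the anon-mapping recreation described above might look like the
following (the addresses are taken from the maps excerpt; MAP_FIXED and the exact
flag combination are assumptions on my part, and whether the final mapping really
merges with A and D depends on the resulting vm_flags matching):

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

/* Map [start, end) as a private anonymous mapping at a fixed address. */
static void *map_range(unsigned long start, unsigned long end, int prot, int extra)
{
        void *p = mmap((void *)start, end - start, prot,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED | extra, -1, 0);
        if (p == MAP_FAILED)
                perror("mmap");
        return p;
}

int main(void)
{
        /* Recreate the A/B/C/D layout from the maps excerpt above. */
        map_range(0x1b86000, 0x1b90000, PROT_NONE, MAP_NORESERVE);              /* A */
        map_range(0x1b90000, 0x1b91000, PROT_READ | PROT_WRITE | PROT_EXEC, 0); /* B */
        map_range(0x1b91000, 0x1bae000, PROT_READ | PROT_WRITE, 0);             /* C */
        map_range(0x1bae000, 0x68000000, PROT_NONE, MAP_NORESERVE);             /* D */

        /* The final action: a PROT_NONE mapping over B and C, which should
         * unmap them and merge with A and D. */
        map_range(0x1b90000, 0x1bae000, PROT_NONE, MAP_NORESERVE);

        return 0;
}
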
>
> Hey Bert - sorry to be a pain, but try as I might I cannot repro this.
>
> I've attached a quite thorough hacky printk patch here; it's going to
> generate a ton of noise, so I really think this one has to be a link to an
> off-list dmesg or we're going to break lei again, sorry Andrew.
>
> If you could repro with this patch applied + the usual debug config
> settings and send it back I'd appreciate it!
>
> This should hopefully eke out a little more information to help figure
> things out.
>
> Also if you could share your .config, ulimit -a and
> /proc/sys/vm/max_map_count that'd be great too, thanks!
>
> Again, much much appreciated.
>
> Cheers, Lorenzo
>
> ----8<----
> From d85fb5d2fd096e84681bdb6da8b5d37f0464ff84 Mon Sep 17 00:00:00 2001
> From: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx>
> Date: Wed, 2 Oct 2024 09:19:28 +0100
> Subject: [PATCH] hack: mm: see if we can get some more information
>
> Add some dreadful printk() hacks so we can try to get some more information
> on what's going on.
> ---
> mm/internal.h | 15 ++++++++++++
> mm/mmap.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++-
> mm/vma.c | 34 ++++++++++++++++++++++++--
> 3 files changed, 112 insertions(+), 3 deletions(-)
>
> diff --git a/mm/internal.h b/mm/internal.h
> index 93083bbeeefa..cd9414b4651d 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -1443,4 +1443,19 @@ static inline void accept_page(struct page *page)
> }
> #endif /* CONFIG_UNACCEPTED_MEMORY */
>
> +static inline bool check_interesting(unsigned long start, unsigned long end)
> +{
> + const unsigned long interesting_start = 0x1740000;
> + /* Include off-by-one on purpose.*/
> + const unsigned long interesting_end = 0x68000000 + 1;
In an earlier patch you requested this be changed to 0x798b1000; is this
correct?
> +
> + /* interesting_start interesting_end
> + * |--------------------------|
> + * ============================> end
> + * <============================= start
> + */
> + return end > interesting_start && /* after or overlaps... */
> + start < interesting_end; /* ...overlaps. */
> +}
> +
> #endif /* __MM_INTERNAL_H */
> diff --git a/mm/mmap.c b/mm/mmap.c
> index dd4b35a25aeb..8a1d5c0da86f 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1341,6 +1341,18 @@ struct vm_area_struct *expand_stack(struct mm_struct *mm, unsigned long addr)
> return vma;
> }
>
> +static void ljs_dump(struct mm_struct *mm,
> + unsigned long addr, unsigned long len,
> + vm_flags_t vm_flags, bool is_unmap)
> +{
> + if (!check_interesting(addr, addr + len))
> + return;
> +
> + pr_err("LJS: %s mm=%p [0x%lx, 0x%lx) [vm_flags=%lu]\n",
> + is_unmap ? "munmap" : "mmap", mm, addr, addr + len,
> + vm_flags);
> +}
> +
> /* do_munmap() - Wrapper function for non-maple tree aware do_munmap() calls.
> * @mm: The mm_struct
> * @start: The start address to munmap
> @@ -1354,6 +1366,8 @@ int do_munmap(struct mm_struct *mm, unsigned long start, size_t len,
> {
> VMA_ITERATOR(vmi, mm, start);
>
> + ljs_dump(mm, start, len, 0, true);
> +
> return do_vmi_munmap(&vmi, mm, start, len, uf, false);
> }
>
> @@ -1375,6 +1389,8 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> VMA_ITERATOR(vmi, mm, addr);
> VMG_STATE(vmg, mm, &vmi, addr, end, vm_flags, pgoff);
>
> + ljs_dump(mm, addr, len, vm_flags, false);
> +
> vmg.file = file;
> /* Find the first overlapping VMA */
> vma = vma_find(&vmi, end);
> @@ -1390,6 +1406,12 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
>
> vmg.next = vms.next;
> vmg.prev = vms.prev;
> +
> + if (check_interesting(addr, addr + len))
> + pr_err("LJS: prev=[%lx, %lx), next=[%lx, %lx)\n",
> + vmg.prev ? vmg.prev->vm_start : 0, vmg.prev ? vmg.prev->vm_end : 0,
> + vmg.next ? vmg.next->vm_start : 0, vmg.next ? vmg.next->vm_end : 0);
> +
> vma = NULL;
> } else {
> vmg.next = vma_iter_next_rewind(&vmi, &vmg.prev);
> @@ -1413,9 +1435,29 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> vmg.flags = vm_flags;
> }
>
> + if (check_interesting(addr, addr + len)) {
> + char *special = vm_flags & VM_SPECIAL ? "special" : "";
> + char *has_file = file ? "file-backed" : "";
> +
> + pr_err("LJS: Interesting [%lx, %lx) flags=%lu, special=[%s] file=[%s]\n",
> + addr, addr+len, vm_flags, special, has_file);
> + }
> +
> vma = vma_merge_new_range(&vmg);
> - if (vma)
> + if (vma) {
> + if (check_interesting(addr, addr + len)) {
> + pr_err("LJS: Merged to [%lx, %lx), addr=%lx, end=%lx\n",
> + vma->vm_start, vma->vm_end, vma_iter_addr(&vmi),
> + vma_iter_end(&vmi));
> +
> + mt_validate(&mm->mm_mt);
> + }
> +
> goto expanded;
> + } else if (check_interesting(addr, addr + len)) {
> + pr_err("LJS: Failed to merge [%lx, %lx), reset...\n",
> + addr, addr + len);
> + }
> /*
> * Determine the object being mapped and call the appropriate
> * specific mapper. the address has already been validated, but
> @@ -1441,6 +1483,11 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> if (error)
> goto unmap_and_free_vma;
>
> + if (check_interesting(addr, addr + len)) {
> + pr_err("LJS: call_mmap() on [%lx, %lx) old_flags=%lu new_flags=%lu new range=[%lx, %lx)\n",
> > + addr, addr + len, vm_flags, vma->vm_flags, vma->vm_start, vma->vm_end);
> + }
> +
> if (vma_is_shared_maywrite(vma)) {
> error = mapping_map_writable(file->f_mapping);
> if (error)
> @@ -1467,6 +1514,11 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> /* If this fails, state is reset ready for a reattempt. */
> merge = vma_merge_new_range(&vmg);
>
> + if (check_interesting(addr, addr + len))
> + pr_err("LJS: flags changed for [%lx, %lx) from %lu to %lu %s",
> + vma->vm_start, vma->vm_end, vm_flags, vma->vm_flags,
> + merge ? "merged" : "");
> +
> if (merge) {
> /*
> * ->mmap() can change vma->vm_file and fput
> @@ -1510,10 +1562,18 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
>
> /* Lock the VMA since it is modified after insertion into VMA tree */
> vma_start_write(vma);
> +
> + if (check_interesting(addr, addr + len))
> + pr_err("LJS: mm=%p: iter store addr=%lx, end=%lx, vma=[%lx, %lx)\n",
> + mm, vma_iter_addr(&vmi), vma_iter_end(&vmi), vma->vm_start, vma->vm_end);
> +
> vma_iter_store(&vmi, vma);
> mm->map_count++;
> vma_link_file(vma);
>
> + if (check_interesting(addr, addr + len))
> + mt_validate(&mm->mm_mt);
> +
> /*
> * vma_merge_new_range() calls khugepaged_enter_vma() too, the below
> * call covers the non-merge case.
> @@ -1530,6 +1590,10 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
> perf_event_mmap(vma);
>
> /* Unmap any existing mapping in the area */
> +
> + if (check_interesting(addr, addr + len))
> + mt_validate(&mm->mm_mt);
> +
> vms_complete_munmap_vmas(&vms, &mas_detach);
>
> vm_stat_account(mm, vm_flags, pglen);
> diff --git a/mm/vma.c b/mm/vma.c
> index 4737afcb064c..33f80e82704b 100644
> --- a/mm/vma.c
> +++ b/mm/vma.c
> @@ -1108,8 +1108,13 @@ void vms_clean_up_area(struct vma_munmap_struct *vms,
> vms_clear_ptes(vms, mas_detach, true);
> mas_set(mas_detach, 0);
> mas_for_each(mas_detach, vma, ULONG_MAX)
> - if (vma->vm_ops && vma->vm_ops->close)
> + if (vma->vm_ops && vma->vm_ops->close) {
> + if (check_interesting(vma->vm_start, vma->vm_end))
> + pr_err("LJS: mm=%p Closing [%lx, %lx)\n",
> + vma->vm_mm, vma->vm_start, vma->vm_end);
> +
> vma->vm_ops->close(vma);
> + }
> vms->closed_vm_ops = true;
> }
>
> @@ -1179,6 +1184,10 @@ int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
> struct vm_area_struct *next = NULL;
> int error;
>
> + if (check_interesting(vms->vma->vm_start, vms->vma->vm_end))
> + pr_err("LJS2 vms->start=%lx, vms->vma->vm_start=%lx\n",
> + vms->start, vms->vma->vm_start);
> +
> /*
> * If we need to split any vma, do it now to save pain later.
> * Does it split the first one?
> @@ -1202,6 +1211,11 @@ int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
> goto start_split_failed;
> }
>
> + if (check_interesting(vms->vma->vm_start, vms->vma->vm_end))
> + pr_err("LJS: mm=%p vms=[%lx, %lx) split START of [%lx, %lx)\n",
> + vms->vma->vm_mm, vms->start, vms->end,
> + vms->vma->vm_start, vms->vma->vm_end);
> +
> error = __split_vma(vms->vmi, vms->vma, vms->start, 1);
> if (error)
> goto start_split_failed;
> @@ -1217,12 +1231,23 @@ int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
> for_each_vma_range(*(vms->vmi), next, vms->end) {
> long nrpages;
>
> + if (check_interesting(vms->vma->vm_start, vms->vma->vm_end))
> + pr_err("LJS: mm=%p vms=[%lx, %lx) UNMAP [%lx, %lx)\n",
> + vms->vma->vm_mm, vms->start, vms->end,
> + next->vm_start, next->vm_end);
> +
> if (!can_modify_vma(next)) {
> error = -EPERM;
> goto modify_vma_failed;
> }
> /* Does it split the end? */
> if (next->vm_end > vms->end) {
> +
> + if (check_interesting(next->vm_start, next->vm_end))
> + pr_err("LJS: mm=%p vms=[%lx, %lx) split END of [%lx, %lx)\n",
> + next->vm_mm, vms->start, vms->end,
> + next->vm_start, next->vm_end);
> +
> error = __split_vma(vms->vmi, next, vms->end, 0);
> if (error)
> goto end_split_failed;
> @@ -1295,9 +1320,14 @@ int vms_gather_munmap_vmas(struct vma_munmap_struct *vms,
> }
> #endif
>
> - while (vma_iter_addr(vms->vmi) > vms->start)
> + while (vma_iter_addr(vms->vmi) > vms->start) {
> vma_iter_prev_range(vms->vmi);
>
> + if (check_interesting(vms->vma->vm_start, vms->vma->vm_end))
> + pr_err("LJS3: addr=%lx, vms->start=%lx\n",
> + vma_iter_addr(vms->vmi), vms->start);
> + }
> +
> vms->clear_ptes = true;
> return 0;
>
> --
> 2.46.2
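As an aside, the check_interesting() filter in the patch is just a half-open
interval overlap test against the watched range; a standalone userspace sketch of
the same logic, using the bounds from the patch:

#include <stdbool.h>
#include <stdio.h>

/* Same logic as the check_interesting() hack above: the half-open ranges
 * [start, end) and [lo, hi) overlap iff each one starts before the other ends. */
static bool check_interesting(unsigned long start, unsigned long end)
{
        const unsigned long lo = 0x1740000;
        const unsigned long hi = 0x68000000 + 1; /* off-by-one on purpose */

        return end > lo && start < hi;
}

int main(void)
{
        /* The range of the problematic final mapping overlaps the watched range. */
        printf("%d\n", check_interesting(0x1b90000, 0x1bae000)); /* prints 1 */
        /* A mapping entirely above the watched range does not. */
        printf("%d\n", check_interesting(0x70000000, 0x70001000)); /* prints 0 */
        return 0;
}
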
I just tested the "hunch" commit (without this patch) and it crashed in the same
way. Here are more detailed instructions on how I reproduce the crash:
The game used is Rogue Heroes: Ruins of Tasos (which is basically Zelda). The
game itself does not work anymore (even on unaffected kernel versions); for some
time now (probably a year) it has been crashing with the following error:
Unhandled Exception:
Microsoft.Xna.Framework.Graphics.NoSuitableGraphicsDeviceException: Failed to
create graphics device! ---> System.TypeInitializationException: The type
initializer for 'Microsoft.Xna.Framework.Graphics.GraphicsAdapter' threw an
exception. ---> SharpDX.SharpDXException: HRESULT: [0x80004005], Module:
[General], ApiCode: [E_FAIL/Unspecified error], Message: Call failed.
1. Go to Properties->Compatibility, select "Force the use of specific Steam
Play compatibility tool" and then select Proton 7.0-6.
2. Start the game; it should then crash with the Xna error above.
3. Go to Properties->Compatibility and unselect "Force the use of specific Steam
Play compatibility tool".
4. Start the game again; this will usually give the vma error. (On two occasions
so far the whole procedure (1-4) had to be done twice to get the error.)
Bert Karwatzki