Re: a crash when running strace from persistent memory

From: Mikulas Patocka
Date: Fri Sep 04 2020 - 04:08:50 EST




On Thu, 3 Sep 2020, Linus Torvalds wrote:

> On Thu, Sep 3, 2020 at 12:24 PM Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:
> >
> > There's a bug when you run strace from dax-based filesystem.
> >
> > -- create real or emulated persistent memory device (/dev/pmem0)
> > mkfs.ext2 /dev/pmem0
> > -- mount it
> > mount -t ext2 -o dax /dev/pmem0 /mnt/test
> > -- copy the system to it (well, you can copy just a few files that are
> > needed for running strace and ls)
> > cp -ax / /mnt/test
> > -- bind the system directories
> > mount --bind /dev /mnt/test/dev
> > mount --bind /proc /mnt/test/proc
> > mount --bind /sys /mnt/test/sys
> > -- run strace on the ls command
> > chroot /mnt/test/ strace /bin/ls
> >
> > You get this warning and ls is killed with SIGSEGV.
> >
> > I bisected the problem and it is caused by the commit
> > 17839856fd588f4ab6b789f482ed3ffd7c403e1f (gup: document and work around
> > "COW can break either way" issue). When I revert the patch (on the kernel
> > 5.9-rc3), the bug goes away.
>
> Funky. I really don't see how it could cause that, but we have the
> UDDF issue too, so I'm guessing I will have to fix it the radical way
> with Peter Xu's series based on my "rip out COW special cases" patch.
>
> Or maybe I'm just using that as an excuse for really wanting to apply
> that series.. Because we can't just revert that GUP commit due to
> security concerns.
>
> > [ 84.191504] WARNING: CPU: 6 PID: 1350 at mm/memory.c:2486 wp_page_copy.cold+0xdb/0xf6
>
> I'm assuming this is the WARN_ON_ONCE(1) on line 2482, and you have
> some extra debug patch that causes that line to be off by 4? Because
> at least for me, line 2486 is actually an empty line in v5.9-rc3.

Yes, that's it. I added a few printk to look at the control flow.

> That said, I really think this is a pre-existing race, and all the
> "COW can break either way" patch does is change the timing (presumably
> due to the actual pattern of actually doing the COW changing).
>
> See commit c3e5ea6ee574 ("mm: avoid data corruption on CoW fault into
> PFN-mapped VMA") for background.
>
> Mikulas, can you check that everything works ok for that case if you
> apply Peter's series? See
>
> https://lore.kernel.org/lkml/20200821234958.7896-1-peterx@xxxxxxxxxx/

I applied these four patches and strace works well. There is no longer any
warning or crash.

Mikulas

> or if you have 'b4' installed, use
>
> b4 am 20200821234958.7896-1-peterx@xxxxxxxxxx
>
> to get the series..
>
> Linus
>