Re: 3.15-rc8 oops in copy_page_rep after page fault.
From: Hugh Dickins
Date: Fri Jun 06 2014 - 14:42:17 EST
On Fri, 6 Jun 2014, Linus Torvalds wrote:
> On Fri, Jun 6, 2014 at 10:43 AM, Dave Jones <davej@xxxxxxxxxx> wrote:
> >
> > RIP: 0010:[<ffffffff8b3287b5>] [<ffffffff8b3287b5>] copy_page_rep+0x5/0x10
>
> Ok, it's the first iteration of "rep movsq" (%rcx is still 0x200) for
> copying a page, and the pages are
>
> RSI: ffff880052766000
> RDI: ffff880014efe000
>
> which both look like reasonable kernel addresses. So I'm assuming it's
> DEBUG_PAGEALLOC that makes this trigger, and since the error code is
> 0, and the CR2 value matches RSI, it's the source page that seems to
> have been freed.
>
> And I see absolutely _zero_ reason for why your 64k mmap_min_addr
> should make any difference what-so-ever. That's just odd.
>
> Anyway, can you try to figure out _which_ copy_user_highpage() it is
> (by looking at what is around the call-site at
> "handle_mm_fault+0x1e0")? The fact that we have a stale
> do_huge_pmd_wp_page() on the stack makes me suspect that we have hit
> that VM_FAULT_FALLBACK case and this is related to splitting. Adding a
> few more people explicitly to the cc in case anybody sees anything
> (original email on lkml and linux-mm for context, guys).
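
The register reading in the quoted analysis can be checked with a little
arithmetic. What follows is only a minimal sketch, assuming 4 KiB base pages;
the helper names are hypothetical and are not kernel functions. The
error-code bit positions are the architectural x86 page-fault bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages. */
#define PAGE_BYTES 4096u

/* x86 page-fault error-code bits (architectural). */
#define PF_PROT  (1u << 0)	/* 0: page not present, 1: protection fault */
#define PF_WRITE (1u << 1)	/* 0: read access,      1: write access */
#define PF_USER  (1u << 2)	/* 0: kernel mode,      1: user mode */

/* "rep movsq" moves 8 bytes per iteration and counts %rcx down to 0. */
uint64_t movsq_bytes_left(uint64_t rcx)
{
	return rcx * 8;
}

/* True while a whole page is still left to copy, i.e. nothing copied yet. */
bool copy_not_started(uint64_t rcx)
{
	return movsq_bytes_left(rcx) == PAGE_BYTES;
}

/* Error code 0 means a kernel-mode read of a page that is not mapped. */
bool is_kernel_read_of_unmapped_page(uint32_t error_code)
{
	return (error_code & (PF_PROT | PF_WRITE | PF_USER)) == 0;
}

/* movsq reads through %rsi; CR2 holds the faulting address. */
bool fault_hit_source(uint64_t cr2, uint64_t rsi)
{
	return cr2 == rsi;
}

/* Hypothetical self-check using the values from the quoted oops. */
void check_quoted_values(void)
{
	const uint64_t rsi = 0xffff880052766000ULL;
	const uint64_t cr2 = rsi;	/* per the quoted analysis: CR2 matches RSI */

	assert(copy_not_started(0x200));		/* 0x200 * 8 == 4096 */
	assert(is_kernel_read_of_unmapped_page(0));
	assert(fault_hit_source(cr2, rsi));
}
```

With %rcx at 0x200 a full 4096 bytes remain, so the fault is on the very
first quadword read from the source page.
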

It's a familiar one, that Sasha first reported over a year ago:
see https://lkml.org/lkml/2013/3/29/103

Somewhere in that thread I suggested that it's due to the source THPage
being split, and a tail page freed, while the copy is in progress; and
not a problem without DEBUG_PAGEALLOC, since the pmd_same check
will prevent a miscopy from being made visible.

It's not a v3.15 regression, and it's no worry without DEBUG_PAGEALLOC.
If it's becoming easier to trigger and thus interfering with trinity,
then I guess we shall have to do something about it. Kirill tried one
approach that didn't work out, and we have so far both felt reluctant
to make the code uglier just to satisfy DEBUG_PAGEALLOC.

Hugh
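
The "copy first, then re-check" pattern described in the reply can be
sketched in user space. This is only an illustration under stated
assumptions, not the kernel's code: struct entry, entry_same(),
copy_then_revalidate() and simulate_split() are hypothetical stand-ins
(entry_same() plays the role of pmd_same), and the concurrent split is
simulated by a callback rather than a second thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 4 KiB base pages. */
#define PAGE_BYTES 4096u

/* Hypothetical stand-in for the page-table entry covering the source. */
struct entry {
	uint64_t val;
};

/* Analogue of pmd_same(): has the entry changed since we sampled it? */
bool entry_same(struct entry a, struct entry b)
{
	return a.val == b.val;
}

/*
 * Copy src to dst without a lock, then re-validate the entry.
 * 'racer', if non-NULL, runs after the copy to simulate a concurrent
 * change (such as a split) to *live.
 * Returns true if the copy may be made visible, false if it must be
 * thrown away and the fault retried.
 */
bool copy_then_revalidate(struct entry *live, const uint8_t *src,
			  uint8_t *dst, void (*racer)(struct entry *))
{
	struct entry snapshot;

	assert(live && src && dst);
	snapshot = *live;		/* 1. remember what we saw */

	memcpy(dst, src, PAGE_BYTES);	/* 2. unlocked copy, may be stale */

	if (racer)
		racer(live);		/* simulated concurrent split */

	/* 3. in the kernel this comparison is done under the page-table lock */
	return entry_same(*live, snapshot);
}

/* Hypothetical "split": any change to the entry value will do. */
void simulate_split(struct entry *live)
{
	live->val ^= 1;
}
```

Calling copy_then_revalidate() with a NULL racer returns true; passing
simulate_split returns false, so the possibly stale copy is discarded. The
sketch cannot show the DEBUG_PAGEALLOC case: there the freed page is
unmapped from the kernel's address space, so step 2 itself faults before
the re-check is ever reached.
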