Re: kernel BUG at mm/swap.c:134! - page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
From: Borislav Petkov
Date: Sat Apr 18 2015 - 18:16:44 EST
I forgot to CC lkml for archiving purposes, so here's the whole thread in
one:
---
Hi guys,
so I'm running some intermediate state of linus/master + tip/master from
Thursday and probably I shouldn't be even taking such splat seriously
and wait until 4.1-rc1 has been done but let me report it just in case
so that it is out there, in case someone else sees it too.
I don't have a reproducer yet except the fact that it happened twice
already, the second time while watching the new Star Wars teaser on
youtube (current->comm is "AudioThread" probably from chrome, as shown
in the splat below).
And srsly, to VM_BUG_ON_PAGE() while I'm watching the new Star Wars
teaser - you must be kidding me people!
Anyway, just FYI, someone might have an idea...
So here's the state of what I was running:
---
commit 3963e69e59fa4e36ac164e8cd520811135d868d3
Merge: 34c9a0ffc75a 11664e41b11e
Author: Borislav Petkov <bp@xxxxxxx>
Date: Thu Apr 16 13:39:44 2015 +0200
Merge remote-tracking branch 'tip/master' into rc0+
commit 11664e41b11ed447f598424dd83ecf65400be5a1 (refs/remotes/tip/master)
Merge: 61a7fd4deb61 2df8406a439b
Author: Ingo Molnar <mingo@xxxxxxxxxx>
Date: Thu Apr 16 09:20:52 2015 +0200
Merge branch 'sched/urgent'
commit eea3a00264cf243a28e4331566ce67b86059339d
Merge: e7c82412433a e693d73c20ff
Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Date: Wed Apr 15 16:39:15 2015 -0700
Merge branch 'akpm' (patches from Andrew)
Merge second patchbomb from Andrew Morton:
---
and here's the splat:
---
[115258.861335] page:ffffea0010a15040 count:0 mapcount:1 mapping: (null) index:0x0
[115258.869511] flags: 0x8000000000008014(referenced|dirty|tail)
[115258.874159] page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
[115258.874177] ------------[ cut here ]------------
[115258.874179] kernel BUG at mm/swap.c:134!
[115258.874182] invalid opcode: 0000 [#1]
[115258.874183] PREEMPT
[115258.874184] SMP
[115258.874187] Modules linked in:
[115258.874189] nls_iso8859_15
[115258.874190] nls_cp437
[115258.874192] ipt_MASQUERADE
[115258.874193] nf_nat_masquerade_ipv4
[115258.874194] iptable_mangle
[115258.874195] iptable_nat
[115258.874196] nf_conntrack_ipv4
[115258.874198] nf_defrag_ipv4
[115258.874199] nf_nat_ipv4
[115258.874200] nf_nat
[115258.874201] nf_conntrack
[115258.874202] iptable_filter
[115258.874204] ip_tables
[115258.874205] x_tables
[115258.874206] tun
[115258.874207] sha256_ssse3
[115258.874209] sha256_generic
[115258.874211] binfmt_misc
[115258.874212] ipv6
[115258.874213] vfat
[115258.874214] fat
[115258.874215] fuse
[115258.874216] dm_crypt
[115258.874217] dm_mod
[115258.874219] kvm_amd
[115258.874243] kvm crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd amd64_edac_mod edac_core fam15h_power k10temp amdkfd amd_iommu_v2 radeon drm_kms_helper ttm cfbfillrect cfbimgblt cfbcopyarea acpi_cpufreq
[115258.874248] CPU: 0 PID: 2904 Comm: AudioThread Not tainted 4.0.0+ #1
[115258.874250] Hardware name: To be filled by O.E.M. To be filled by O.E.M./M5A97 EVO R2.0, BIOS 1503 01/16/2013
[115258.874252] task: ffff8803e8278000 ti: ffff8803f8a04000 task.ti: ffff8803f8a04000
[115258.874262] RIP: 0010:[<ffffffff8113fcb9>] [<ffffffff8113fcb9>] put_compound_page+0x3b9/0x480
[115258.874264] RSP: 0018:ffff8803f8a07b98 EFLAGS: 00010246
[115258.874266] RAX: 000000000000003d RBX: ffffea0010a15040 RCX: 0000000000000000
[115258.874268] RDX: ffffffff8109f016 RSI: ffffffff810bb33f RDI: ffffffff810bae60
[115258.874270] RBP: ffff8803f8a07bb8 R08: 0000000000000001 R09: 0000000000000001
[115258.874271] R10: 0000000000000001 R11: 0000000000000001 R12: ffffea0010a15000
[115258.874273] R13: ffff8803f8a07e28 R14: ffffea0010a15040 R15: 0000000000000000
[115258.874276] FS: 00007f206f2af700(0000) GS:ffff88042c600000(0000) knlGS:0000000000000000
[115258.874278] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[115258.874280] CR2: 00007f2095443310 CR3: 000000041e866000 CR4: 00000000000406f0
[115258.874281] Stack:
[115258.874287] ffff8803f8a07e60 ffffea0010a151c0 ffff8803f8a07e28 ffffea0010a15040
[115258.874292] ffff8803f8a07c28 ffffffff8113ffd0 0000000100000000 00000000ffffffff
[115258.874296] ffff8803f8a07de8 ffff8803f8a07e60 ffff8803f8a07be8 ffff8803f8a07be8
[115258.874298] Call Trace:
[115258.874304] [<ffffffff8113ffd0>] release_pages+0x250/0x270
[115258.874311] [<ffffffff811736c5>] free_pages_and_swap_cache+0x95/0xb0
[115258.874317] [<ffffffff8115ddc0>] tlb_flush_mmu_free+0x40/0x60
[115258.874323] [<ffffffff8115fcac>] unmap_single_vma+0x69c/0x730
[115258.874331] [<ffffffff81160594>] unmap_vmas+0x54/0xb0
[115258.874335] [<ffffffff81165a38>] unmap_region+0xa8/0x110
[115258.874342] [<ffffffff811679ea>] do_munmap+0x1ea/0x3f0
[115258.874346] [<ffffffff81167c33>] ? vm_munmap+0x43/0x80
[115258.874350] [<ffffffff81167c41>] vm_munmap+0x51/0x80
[115258.874354] [<ffffffff81168bee>] SyS_munmap+0xe/0x20
[115258.874359] [<ffffffff816918db>] system_call_fastpath+0x16/0x73
[115258.874424] Code: 81 48 89 df e8 29 c9 01 00 0f 0b 48 c7 c6 00 81 8a 81 4c 89 e7 e8 18 c9 01 00 0f 0b 48 c7 c6 d8 96 8b 81 48 89 df e8 07 c9 01 00 <0f> 0b 48 c7 c6 30 97 8b 81 48 89 df e8 f6 c8 01 00 0f 0b 48 c7
[115258.874428] RIP [<ffffffff8113fcb9>] put_compound_page+0x3b9/0x480
[115258.874429] RSP <ffff8803f8a07b98>
[115258.898487] ---[ end trace 6ec080e8a6ee9fb1 ]---
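For anyone reading the page dump above, the low bits of the flags word can be decoded by hand. The sketch below does that in plain userspace C. It assumes the 4.0-era enum pageflags layout with CONFIG_PAGEFLAGS_EXTENDED (separate head and tail bits); the table and the helper name decode_page_flags() are made up for illustration and are not kernel API. The upper bits of the word (node/zone/section encoding) are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * ASSUMPTION: bit positions follow the 4.0-era enum pageflags with
 * CONFIG_PAGEFLAGS_EXTENDED. Check include/linux/page-flags.h in the
 * tree you are actually debugging before trusting this table.
 */
struct flag_name {
	unsigned int bit;
	const char *name;
};

static const struct flag_name low_flags[] = {
	{ 0, "locked" },     { 1, "error" },  { 2, "referenced" },
	{ 3, "uptodate" },   { 4, "dirty" },  { 5, "lru" },
	{ 6, "active" },     { 7, "slab" },   { 14, "head" },
	{ 15, "tail" },
};

/*
 * Hypothetical helper: write the '|'-separated names of the known bits
 * set in 'flags' into 'buf' (at most 'len' bytes including the NUL).
 * Bits that are not in the table are silently skipped.
 */
static void decode_page_flags(uint64_t flags, char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (i = 0; i < sizeof(low_flags) / sizeof(low_flags[0]); i++) {
		if (!(flags & ((uint64_t)1 << low_flags[i].bit)))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, low_flags[i].name, len - strlen(buf) - 1);
	}
}
```

Feeding it 0x8000000000008014 with a 64-byte buffer gives back the same referenced|dirty|tail string the kernel printed, so the page that tripped the check is a tail page of a compound page.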
Thanks!
--
On Sat, Apr 18, 2015 at 4:56 PM, Borislav Petkov <bp@xxxxxxxxx> wrote:
>
> so I'm running some intermediate state of linus/master + tip/master from
> Thursday and probably I shouldn't be even taking such splat seriously
> and wait until 4.1-rc1 has been done but let me report it just in case
> so that it is out there, in case someone else sees it too.
>
> I don't have a reproducer yet except the fact that it happened twice
> already, the second time while watching the new Star Wars teaser on
> youtube (current->comm is "AudioThread" probably from chrome, as shown
> in the splat below).
Hmm. The only recent commit in this area seems to be 822fc61367f0
("mm: don't call __page_cache_release for hugetlb") although I don't
see why it would cause anything like that. But it changes code that
has been stable for many years, which makes me wonder how valid it is
(__put_compound_page() has been unchanged since 2011, and now suddenly
it grew that "!PageHuge()" test).
So quite frankly, I'd almost suggest changing that
if (!PageHuge(page))
__page_cache_release(page);
back to the old unconditional __page_cache_release(page), and maybe add a single
WARN_ON_ONCE(PageHuge(page));
just to see if that condition actually happens. The new comment says
it shouldn't happen and that the change shouldn't matter, but...
Of course, your recent BUG_ON may well be entirely unrelated to this
change in mm/swap.c, but it *is* in kind of the same area, and the
timing would match too...
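A rough userspace model of that debugging change, for illustration only. This is not the real mm/swap.c code: struct mock_page, mock_warn_on_once() and the other mock_* names are hypothetical stand-ins, and the real WARN_ON_ONCE() keeps one flag per call site, whereas this sketch has a single call site and so gets away with one static flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* HYPOTHETICAL stand-in for struct page; only what the sketch needs. */
struct mock_page {
	bool is_hugetlb;     /* what PageHuge() would report */
	int cache_releases;  /* how often the mock release helper ran */
};

static int warn_count;   /* number of warnings actually printed */

/*
 * Userspace imitation of WARN_ON_ONCE(): print on the first hit only,
 * always return the condition so callers could still branch on it.
 */
static bool mock_warn_on_once(bool cond, const char *what)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Stand-in for __page_cache_release(); just counts invocations. */
static void mock_page_cache_release(struct mock_page *page)
{
	page->cache_releases++;
}

/*
 * Debug variant described in the mail: go back to the unconditional
 * release, and warn once if a hugetlb page ever reaches this path.
 */
static void mock_put_compound_page_debug(struct mock_page *page)
{
	mock_warn_on_once(page->is_hugetlb, "PageHuge(page)");
	mock_page_cache_release(page);
}
```

In this model every page is released regardless of type, and a hugetlb page showing up more than once still produces only a single warning, which is the point of using the _ONCE variant for a "does this ever happen" probe.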
Linus
---
[115258.861335] page:ffffea0010a15040 count:0 mapcount:1 mapping: (null) index:0x0
[115258.869511] flags: 0x8000000000008014(referenced|dirty|tail)
[115258.874159] page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
[115258.874179] kernel BUG at mm/swap.c:134!
[115258.874262] RIP: put_compound_page+0x3b9/0x480
---
On Sat, Apr 18, 2015 at 05:27:49PM -0400, Linus Torvalds wrote:
> On Sat, Apr 18, 2015 at 4:56 PM, Borislav Petkov <bp@xxxxxxxxx> wrote:
> >
> > so I'm running some intermediate state of linus/master + tip/master from
> > Thursday and probably I shouldn't be even taking such splat seriously
> > and wait until 4.1-rc1 has been done but let me report it just in case
> > so that it is out there, in case someone else sees it too.
> >
> > I don't have a reproducer yet except the fact that it happened twice
> > already, the second time while watching the new Star Wars teaser on
> > youtube (current->comm is "AudioThread" probably from chrome, as shown
> > in the splat below).
I would guess it's related to sound: the most common source of compound
pages PTE-mapped into userspace.
> Hmm. The only recent commit in this area seems to be 822fc61367f0
> ("mm: don't call __page_cache_release for hugetlb") although I don't
> see why it would cause anything like that. But it changes code that
> has been stable for many years, which makes me wonder how valid it is
> (__put_compound_page() has been unchanged since 2011, and now suddenly
> it grew that "!PageHuge()" test).
>
> So quite frankly, I'd almost suggest changing that
>
> if (!PageHuge(page))
> __page_cache_release(page);
>
> back to the old unconditional __page_cache_release(page), and maybe add a single
>
> WARN_ON_ONCE(PageHuge(page));
>
> just to see if that condition actually happens. The new comment says
> it shouldn't happen and that the change shouldn't matter, but...
>
> Of course, your recent BUG_ON may well be entirely unrelated to this
> change in mm/swap.c, but it *is* in kind of the same area, and the
> timing would match too...
Andrea has already seen the bug and pointed to 8d63d99a5dfb as a possible
cause. I don't see why the commit would break anything, but it's worth
trying to revert it and test.
Borislav, could you try?
> ---
> [115258.861335] page:ffffea0010a15040 count:0 mapcount:1 mapping: (null) index:0x0
> [115258.869511] flags: 0x8000000000008014(referenced|dirty|tail)
> [115258.874159] page dumped because: VM_BUG_ON_PAGE(page_mapcount(page) != 0)
> [115258.874179] kernel BUG at mm/swap.c:134!
> [115258.874262] RIP: put_compound_page+0x3b9/0x480
--
Kirill A. Shutemov
---
On Sat, Apr 18, 2015 at 5:56 PM, Kirill A. Shutemov
<kirill@xxxxxxxxxxxxx> wrote:
>
> Andrea has already seen the bug and pointed to 8d63d99a5dfb as a possible
> cause. I don't see why the commit would break anything, but it's worth
> trying to revert it and test.
Ahh, yes, that does look like a more likely culprit.
Linus
---
On Sat, Apr 18, 2015 at 05:59:53PM -0400, Linus Torvalds wrote:
> On Sat, Apr 18, 2015 at 5:56 PM, Kirill A. Shutemov
> <kirill@xxxxxxxxxxxxx> wrote:
> >
> > Andrea has already seen the bug and pointed to 8d63d99a5dfb as a possible
> > cause. I don't see why the commit would break anything, but it's worth
> > trying to revert it and test.
>
> Ahh, yes, that does look like a more likely culprit.
Reverted and building... I'll report back in the next few days.
Thanks guys.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.