Re: 3.15-rc8 mm/filemap.c:202 BUG
From: Hugh Dickins
Date: Fri Jun 06 2014 - 19:08:10 EST
On Wed, 4 Jun 2014, Sasha Levin wrote:
> On 06/03/2014 07:11 PM, Hugh Dickins wrote:
> > On Tue, 3 Jun 2014, Konstantin Khlebnikov wrote:
> >> > On Tue, Jun 3, 2014 at 8:21 AM, Dave Jones <davej@xxxxxxxxxx> wrote:
> >>> > > I'm still seeing this one from time to time, though it takes me quite a while to hit it,
> >>> > > despite my attempts at trying to narrow down the set of syscalls that cause it.
> >>> > >
> >>> > > kernel BUG at mm/filemap.c:202!
> >>> > > invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> >>> > > CPU: 3 PID: 3013 Comm: trinity-c361 Not tainted 3.15.0-rc8+ #225
> >>> > > task: ffff88006c610000 ti: ffff880055960000 task.ti: ffff880055960000
> >>> > > RIP: 0010:[<ffffffffac158e28>] [<ffffffffac158e28>] __delete_from_page_cache+0x318/0x360
> >>> > > RSP: 0018:ffff880055963b90 EFLAGS: 00010046
> >>> > > RAX: 0000000000000000 RBX: 0000000000000003 RCX: ffff880146f68388
> >>> > > RDX: 000000000000022a RSI: ffffffffaca8db38 RDI: ffffffffaca62b17
> >>> > > RBP: ffff880055963be0 R08: 0000000000000002 R09: ffff88000613d530
> >>> > > R10: ffff880055963ba8 R11: ffff880007f49a40 R12: ffffea0006795880
> >>> > > R13: ffff880143232ad0 R14: 0000000000000000 R15: ffff880143232ad8
> >>> > > FS: 00007f1e40673700(0000) GS:ffff88024d180000(0000) knlGS:0000000000000000
> >>> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> > > CR2: 00007f1e404e6000 CR3: 00000000603eb000 CR4: 00000000001407e0
> >>> > > DR0: 0000000001bb1000 DR1: 0000000002537000 DR2: 00000000016a5000
> >>> > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> >>> > > Stack:
> >>> > > ffff880143232ae8 0000000000000000 ffff88000613d530 ffff88000613d568
> >>> > > 0000000008828259 ffffea0006795880 ffff880143232ae8 0000000000000000
> >>> > > 0000000000000002 0000000000000002 ffff880055963c08 ffffffffac158eae
> >>> > > Call Trace:
> >>> > > [<ffffffffac158eae>] delete_from_page_cache+0x3e/0x70
> >>> > > [<ffffffffac16921b>] truncate_inode_page+0x5b/0x90
> >>> > > [<ffffffffac174493>] shmem_undo_range+0x363/0x790
> >>> > > [<ffffffffac1748d4>] shmem_truncate_range+0x14/0x30
> >>> > > [<ffffffffac174bcf>] shmem_fallocate+0x9f/0x340
> >>> > > [<ffffffffac324d40>] ? timerqueue_add+0x60/0xb0
> >>> > > [<ffffffffac1c5ff6>] do_fallocate+0x116/0x1a0
> >>> > > [<ffffffffac182260>] SyS_madvise+0x3c0/0x870
> >>> > > [<ffffffffac346b33>] ? __this_cpu_preempt_check+0x13/0x20
> >>> > > [<ffffffffac74c41f>] tracesys+0xdd/0xe2
> >>> > > Code: ff ff 01 41 f6 c6 01 48 8b 45 c8 75 16 4c 89 30 e9 70 fe ff ff 66 0f 1f 44 00 00 0f 0b 66 0f 1f 44 00 00 0f 0b 66 0f 1f 44 00 00 <0f> 0b 66 0f 1f 44 00 00 41 54 9d e8 78 9e fd ff e9 8c fe ff ff
> >>> > > RIP [<ffffffffac158e28>] __delete_from_page_cache+0x318/0x360
> >>> > >
> >>> > > There was also another variant of the same BUG with a slighty different stack trace.
> >>> > >
> >>> > > kernel BUG at mm/filemap.c:202!
> >>> > > invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
> >>> > > CPU: 2 PID: 6928 Comm: trinity-c45 Not tainted 3.15.0-rc5+ #208
> >>> > > task: ffff88023669d0a0 ti: ffff880186146000 task.ti: ffff880186146000
> >>> > > RIP: 0010:[<ffffffff8415ba05>] [<ffffffff8415ba05>] __delete_from_page_cache+0x315/0x320
> >>> > > RSP: 0018:ffff880186147b18 EFLAGS: 00010046
> >>> > > RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000002
> >>> > > RDX: 000000000000012a RSI: ffffffff84a9a83c RDI: ffffffff84a6e0c0
> >>> > > RBP: ffff880186147b68 R08: 0000000000000002 R09: ffff88002669e668
> >>> > > R10: ffff880186147b30 R11: 0000000000000000 R12: ffffea0008b067c0
> >>> > > R13: ffff880025355670 R14: 0000000000000000 R15: ffff880025355678
> >>> > > FS: 00007fc10026f740(0000) GS:ffff880244400000(0000) knlGS:0000000000000000
> >>> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>> > > CR2: 00002ab350f5c004 CR3: 000000018566c000 CR4: 00000000001407e0
> >>> > > DR0: 0000000001989000 DR1: 0000000000944000 DR2: 0000000002494000
> >>> > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
> >>> > > Stack:
> >>> > > ffff880025355688 ffff8800253556a0 ffff88002669e668 ffff88002669e6a0
> >>> > > 000000008ea099ef ffffea0008b067c0 ffff880025355688 0000000000000000
> >>> > > 0000000000000000 0000000000000002 ffff880186147b90 ffffffff8415ba4d
> >>> > > Call Trace:
> >>> > > [<ffffffff8415ba4d>] delete_from_page_cache+0x3d/0x70
> >>> > > [<ffffffff8416b0ab>] truncate_inode_page+0x5b/0x90
> >>> > > [<ffffffff84175f0b>] shmem_undo_range+0x30b/0x780
> >>> > > [<ffffffff84176394>] shmem_truncate_range+0x14/0x30
> >>> > > [<ffffffff8417647d>] shmem_evict_inode+0xcd/0x150
> >>> > > [<ffffffff841e4b17>] evict+0xa7/0x170
> >>> > > [<ffffffff841e5435>] iput+0xf5/0x180
> >>> > > [<ffffffff841df8a0>] dentry_kill+0x260/0x2d0
> >>> > > [<ffffffff841df97c>] dput+0x6c/0x110
> >>> > > [<ffffffff841c92a9>] __fput+0x189/0x200
> >>> > > [<ffffffff841c936e>] ____fput+0xe/0x10
> >>> > > [<ffffffff84090484>] task_work_run+0xb4/0xe0
> >>> > > [<ffffffff8406ee42>] do_exit+0x302/0xb80
> >>> > > [<ffffffff84349e13>] ? __this_cpu_preempt_check+0x13/0x20
> >>> > > [<ffffffff8407073c>] do_group_exit+0x4c/0xc0
> >>> > > [<ffffffff840707c4>] SyS_exit_group+0x14/0x20
> >>> > > [<ffffffff8475bf64>] tracesys+0xdd/0xe2
> >>> > > Code: 4c 89 30 e9 80 fe ff ff 48 8b 75 c0 4c 89 ff e8 82 8f 1c 00 84 c0 0f 85 6c fe ff ff e9 4f fe ff ff 0f 1f 44 00 00 e8 ae 95 5e 00 <0f> 0b e8 04 1c f1 ff 0f 0b 66 90 0f 1f 44 00 00 55 48 89 e5 41
> >>> > >
> >>> > >
> >>> > > --
> >>> > > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> >>> > > the body to majordomo@xxxxxxxxxx For more info on Linux MM,
> >>> > > see: http://www.linux-mm.org/ .
> >>> > > Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>
> >> >
> >> > This might shine some light, CONFIG_DEBUG_VM should be =y.
> >> >
> >> > --- a/mm/filemap.c
> >> > +++ b/mm/filemap.c
> >> > @@ -199,7 +199,7 @@ void __delete_from_page_cache(struct page *page,
> >> > void *shadow)
> >> > __dec_zone_page_state(page, NR_FILE_PAGES);
> >> > if (PageSwapBacked(page))
> >> > __dec_zone_page_state(page, NR_SHMEM);
> >> > - BUG_ON(page_mapped(page));
> >> > + VM_BUG_ON_PAGE(page_mapped(page), page);
> >> >
> >> > /*
> >> > * Some filesystems seem to re-dirty the page even after
> > Yes, there's a chance that will tell us more (but I don't have high
> > hopes of it). I'm still stumped by this issue, just as before.
> >
> > Sasha (or Dave), any update on whether you see this without THP?
> > and whether you see the remove_migration_pte oops without THP?
>
> I'm pretty sure at this point that I only see both with THP enabled.
Thanks for the hint, though I've made nothing of it. Of course, some
problems come from THP itself, and some from its pressure for migration.
Though I'd wanted to see the remove_migration_pte oops as a key to the
page_mapped bug, my guess is that they're actually independent.
I might have a potential fix for the remove_migration_pte one, I just
want to go back and look at my logic again, will reply in that thread
if I'm still convinced.
But a couple of hours ago had a thought on the page_mapped() bug,
and would like to propose a patch which could be the answer to that -
though frankly I remain pessimistic. See below.
>
> I've started seeing much less of them during fuzzing. Timing changes?
Strange; I've no idea. Anyway, here's today's thought...
[PATCH] mm: entry = ACCESS_ONCE(*pte) in handle_pte_fault
Use ACCESS_ONCE() in handle_pte_fault() when getting the entry or orig_pte
upon which all subsequent decisions and pte_same() tests will be made.
I have no evidence that its lack is responsible for the mm/filemap.c:202
BUG_ON(page_mapped(page)) in __delete_from_page_cache() found by trinity,
and I am not optimistic that it will fix it. But I have found no other
explanation, and ACCESS_ONCE() here will surely not hurt.
If gcc does re-access the pte before passing it down, then that would be
disastrous for correct page fault handling, and certainly could explain
the page_mapped() BUGs seen (concurrent fault causing page to be mapped
in a second time on top of itself: mapcount 2 for a single pte).
Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx>
---
mm/memory.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- 3.15-rc8/mm/memory.c 2014-04-27 23:55:53.608801152 -0700
+++ linux/mm/memory.c 2014-06-06 15:02:35.044320183 -0700
@@ -3810,7 +3810,7 @@ static int handle_pte_fault(struct mm_st
pte_t entry;
spinlock_t *ptl;
- entry = *pte;
+ entry = ACCESS_ONCE(*pte);
if (!pte_present(entry)) {
if (pte_none(entry)) {
if (vma->vm_ops) {
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/