Re: 3.1-rc9: BUG: soft lockup in find_get_pages+0x4e/0x140

From: Hugh Dickins
Date: Tue Oct 18 2011 - 00:10:13 EST


On Sat, 15 Oct 2011, Justin Piszcz wrote:
>
> With 3.1-rc9 during a filesystem dump, this occurred, I thought a previous
> patch fixed this but it did not, it occurs MUCH less (first time in a couple
> of weeks) but the problem still occurs.

Shaohua's nr_skip patch fixed, or at least worked around, the error I
introduced there in 3.1-rc1. But now you're finding a similar problem
still in 3.1-rc9: that confirms what you and others already reported,
that something like it occurs even in 3.0 without my bug.

>
> Thoughts?

No more than before: that it seems as if a page with refcount 0 has
somehow got stuck in the radix_tree. We do know of a refcounting bug
in THP, but I don't see how that one would manifest in this way.

Or maybe the radix_tree is corrupt and it isn't even a struct page *
there; but you're not the only one to see this, so random corruption
is not at all likely.

>
> [675100.763357] BUG: soft lockup - CPU#0 stuck for 22s! [dump:11066]
> [675100.763361] CPU 0 [675100.763364] Pid: 11066, comm: dump Not tainted
> 3.1.0-rc9 #1 Supermicro X8DTH-i/6/iF/6F/X8DTH
> [675100.763368] RIP: 0010:[<ffffffff81078eee>] [<ffffffff81078eee>]
> find_get_pages+0x4e/0x140
> [675100.763375] RSP: 0018:ffff8806e6603d28 EFLAGS: 00000246
> [675100.763377] RAX: ffff880019a0a210 RBX: ffffea0028a65820 RCX:
> 000000000000000e
> [675100.763379] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> ffffea0029e0faf0
> [675100.763381] RBP: 0000000001200ae9 R08: 0000000000000000 R09:
> ffff8806e6603ce8
> [675100.763383] R10: 0000000001200b03 R11: 0000000000000002 R12:
> ffffffff810805a4
> [675100.763386] R13: ffff880c3fffae00 R14: ffff880c3fffae00 R15:
> ffff8806e6603db8
> [675100.763392] FS: 00007fe9b56cc760(0000) GS:ffff88063fc00000(0000)
> knlGS:0000000000000000
> [675100.763395] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [675100.763397] CR2: 00007f2da3718fa8 CR3: 0000000c22da4000 CR4:
> 00000000000006f0
> [675100.763400] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [675100.763402] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7:
> 0000000000000400
> [675100.763405] Process dump (pid: 11066, threadinfo ffff8806e6602000, task
> ffff880c21c931a0)
> [675100.763407] Stack:
> [675100.763408] ffff8800398aaa88 0000000000000000 ffff88063fc0d560
> 0000000000000005
> [675100.763415] ffff880c270652d8 ffff8806e6603da8 00000000000067b5
> ffffffffffffffff
> [675100.763421] ffff880c270652d8 ffffea0028a65820 000000000000000d
> ffffffff81083d15
> [675100.763427] Call Trace:
> [675100.763433] [<ffffffff81083d15>] ? pagevec_lookup+0x15/0x20
> [675100.763437] [<ffffffff810843c6>] ? invalidate_mapping_pages+0x66/0x170
> [675100.763442] [<ffffffff812bc841>] ? blkdev_ioctl+0x6b1/0x700
> [675100.763451] [<ffffffff810e405b>] ? block_ioctl+0x3b/0x50
> [675100.763455] [<ffffffff810c692e>] ? do_vfs_ioctl+0x8e/0x4f0
> [675100.763459] [<ffffffff810c6dd9>] ? sys_ioctl+0x49/0x80
> [675100.763464] [<ffffffff8168b17b>] ? system_call_fastpath+0x16/0x1b
> [675100.763466] Code: 48 89 de 4c 89 ef e8 72 37 25 00 85 c0 89 c1 0f 84 a8
> 00 00 00 49 89 df 31 d2 31 f6 45 31 f6 66 0f 1f 44 00 00 49 8b 07 48 8b 38
> [675100.763484] 85 ff 74 3e 40 f6 c7 03 75 5e 44 8b 47 1c 45 85 c0 74 ec 45

Since I've no idea what's causing this, all we can do is attempt to
gather more info. The patch below, which applies to 3.0.N as well as
3.1-rc9, may help with that; but it makes no attempt to recover the
situation, since I just don't know what this situation is.

When you next hit the problem, please let us know the messages say,
from the "find_get_pages(" line to the end of the trace, but the
Page lines the most interesting - thanks, and good luck.

The memcontrol.c part of it is to cut out the noise of an interfering
bug and stacktrace, coming from an irrelevant error in some configs.

Signed-off-for-debug-only-by: Hugh Dickins <hughd@xxxxxxxxxx>
---

mm/filemap.c | 38 ++++++++++++++++++++++++++++++++++++++
mm/memcontrol.c | 17 +----------------
2 files changed, 39 insertions(+), 16 deletions(-)

--- 3.1-rc9/mm/filemap.c 2011-09-21 17:44:52.246176260 -0700
+++ fgpdebug/mm/filemap.c 2011-10-17 20:47:20.777176328 -0700
@@ -806,6 +806,29 @@ repeat:
}
EXPORT_SYMBOL(find_or_create_page);

+static void safe_dump_page(struct page *page, char *which)
+{
+ pg_data_t *pgdat;
+ bool good = false;
+ unsigned long pfn = 0;
+
+ for_each_online_pgdat(pgdat) {
+ struct page *start_page;
+
+ start_page = pfn_to_page(pgdat->node_start_pfn);
+ pfn = pgdat->node_start_pfn + (page - start_page);
+ if (page >= start_page &&
+ page < start_page + pgdat->node_spanned_pages &&
+ pfn_valid(pfn) && page == pfn_to_page(pfn)) {
+ good = true;
+ break;
+ }
+ }
+ printk(KERN_ALERT "Page %s is %p (pfn %lx)\n", which, page, pfn);
+ if (good)
+ dump_page(page);
+}
+
/**
* find_get_pages - gang pagecache lookup
* @mapping: The address_space to search
@@ -825,6 +848,7 @@ EXPORT_SYMBOL(find_or_create_page);
unsigned find_get_pages(struct address_space *mapping, pgoff_t start,
unsigned int nr_pages, struct page **pages)
{
+ int spinning = 1000;
unsigned int i;
unsigned int ret;
unsigned int nr_found, nr_skip;
@@ -839,6 +863,20 @@ restart:
struct page *page;
repeat:
page = radix_tree_deref_slot((void **)pages[i]);
+ if (spinning > 0 && unlikely(--spinning == 0)) {
+ printk(KERN_ALERT
+ "find_get_pages(%p, %lu, %u, %p)",
+ mapping, start, nr_pages, pages);
+ print_symbol(KERN_CONT " of %s:\n",
+ (unsigned long)mapping->a_ops);
+ if (ret)
+ safe_dump_page(pages[ret-1], "before");
+ safe_dump_page(page, "stuck ");
+ if (i + 1 < nr_found)
+ safe_dump_page(radix_tree_deref_slot(
+ (void **)pages[i+1]), "after ");
+ dump_stack();
+ }
if (unlikely(!page))
continue;

--- 3.1-rc9/mm/memcontrol.c 2011-09-21 17:44:52.250176292 -0700
+++ fgpdebug/mm/memcontrol.c 2011-10-17 19:28:47.427572665 -0700
@@ -3380,23 +3380,8 @@ void mem_cgroup_print_bad_page(struct pa

pc = lookup_page_cgroup_used(page);
if (pc) {
- int ret = -1;
- char *path;
-
- printk(KERN_ALERT "pc:%p pc->flags:%lx pc->mem_cgroup:%p",
+ printk(KERN_ALERT "pc:%p pc->flags:%lx pc->mem_cgroup:%p\n",
pc, pc->flags, pc->mem_cgroup);
-
- path = kmalloc(PATH_MAX, GFP_KERNEL);
- if (path) {
- rcu_read_lock();
- ret = cgroup_path(pc->mem_cgroup->css.cgroup,
- path, PATH_MAX);
- rcu_read_unlock();
- }
-
- printk(KERN_CONT "(%s)\n",
- (ret < 0) ? "cannot get the path" : path);
- kfree(path);
}
}
#endif
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/