Re: [BUG][s390x] mm: system crashed

From: Simon Jeons
Date: Tue Apr 16 2013 - 03:57:18 EST


Hi Heiko,
On 04/16/2013 03:50 PM, Heiko Carstens wrote:
On Mon, Apr 15, 2013 at 02:16:55PM +0800, Zhouping Liu wrote:
On 04/15/2013 01:56 PM, Heiko Carstens wrote:
On Sun, Apr 14, 2013 at 11:28:40PM -0400, Zhouping Liu wrote:
[16109.346170] Call Trace:
[16109.346179] ([<0000000000100920>] show_trace+0x128/0x12c)
[16109.346195] [<00000000001cd320>] rcu_check_callbacks+0x458/0xccc
[16109.346209] [<0000000000140f2e>] update_process_times+0x4a/0x74
[16109.346222] [<0000000000199452>] tick_sched_handle.isra.12+0x5e/0x70
[16109.346235] [<00000000001995aa>] tick_sched_timer+0x6a/0x98
[16109.346247] [<000000000015c1ea>] __run_hrtimer+0x8e/0x200
[16109.346381] [<000000000015d1b2>] hrtimer_interrupt+0x212/0x2b0
[16109.346385] [<00000000001040f6>] clock_comparator_work+0x4a/0x54
[16109.346390] [<000000000010d658>] do_extint+0x158/0x15c
[16109.346396] [<000000000062aa24>] ext_skip+0x38/0x3c
[16109.346404] [<00000000001153c8>] smp_yield_cpu+0x44/0x48
[16109.346412] ([<000003d10051aec0>] 0x3d10051aec0)
[16109.346457] [<000000000024206a>] __page_check_address+0x16a/0x170
[16109.346466] [<00000000002423a2>] page_referenced_one+0x3e/0xa0
[16109.346501] [<000000000024427c>] page_referenced+0x32c/0x41c
[16109.346510] [<000000000021b1dc>] shrink_page_list+0x380/0xb9c
[16109.346521] [<000000000021c0a6>] shrink_inactive_list+0x1c6/0x56c
[16109.346532] [<000000000021c69e>] shrink_lruvec+0x252/0x56c
[16109.346542] [<000000000021ca44>] shrink_zone+0x8c/0x1bc
[16109.346553] [<000000000021d080>] balance_pgdat+0x50c/0x658
[16109.346564] [<000000000021d318>] kswapd+0x14c/0x470
[16109.346576] [<0000000000158292>] kthread+0xda/0xe4
[16109.346656] [<000000000062a5de>] kernel_thread_starter+0x6/0xc
[16109.346682] [<000000000062a5d8>] kernel_thread_starter+0x0/0xc
[-- MARK -- Fri Apr 12 06:15:00 2013]
[16289.386061] INFO: rcu_sched self-detected stall on CPU { 0} (t=42010 jiffies
g=89766 c=89765 q=10627)
Did the system really crash or did you just see the rcu related warning(s)?
I just checked again. At first the system didn't actually crash,
but it was very slow to respond, and the reproducer process couldn't
be killed. After I ran some common commands such as 'ls' and 'vim',
the system seemed to crash for real, with no response at all.

Also, in earlier testing, I remember the system being unresponsive
for a long time and only repeatedly printing the above 'Call Trace'
to the console.
Ok, thanks.
Just a couple more questions: did you also see this on other archs, or just
on s390 (if you tried other platforms at all)?

If you have some time, could you please repeat your test with the kernel
command line option "user_mode=home"?

What does this command line option mean? I can't find it in Documentation/kernel-parameters.txt.


As far as I can tell there was only one s390 patch merged that was
mmap related: 486c0a0bc80d370471b21662bf03f04fbb37cdc6 "s390/mm: Fix crst
upgrade of mmap with MAP_FIXED".
Even though I don't think it explains the bug you've seen, it might be worth
trying to revert it.

And lastly, can you share your kernel config?

Thanks,
Heiko

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxxx For more info on Linux MM,
see: http://www.linux-mm.org/ .
