Re: [BUG] [ tip/sched/core ] System unresponsive after booting
From: Daniel Lezcano
Date: Thu Jan 16 2014 - 08:49:02 EST
On 01/15/2014 01:04 PM, Peter Zijlstra wrote:
On Wed, Jan 15, 2014 at 09:27:34AM +0100, Daniel Lezcano wrote:
Hi all,
I use the tip/sched/core branch.
After git pulling yesterday, my host is unresponsive after booting the OS.
* It boots normally
* It sends info to the console
* The graphics does not work
* The terminals show the prompt, I can enter the username but after
pressing enter, it does not give the password prompt
* sysrq works more or less, I can't get the process stack but it receives
the command
It is like no new process can be created.
I have a dual Xeon processor E5325 (2 x 4 cores).
After git bisecting, the following patch seems to introduce the bug.
commit d50dde5a10f305253cbc3855307f608f8a3c5f73
OK, so my headless WSM-EP boots just fine. Obviously it cannot confirm
if graphics works, but I can ssh in and work on it without bother.
I can even log in on the serial console without problems.
I tried both tip/master and tip/sched/core.
Would you happen to have a .config for me to try?
I was able to reduce the scope and reproduce the issue.
AFAICT, that happens with rsyslogd. When login in a tty, the login
command sends a message through /dev/log. But rsyslogd is never woken up
and blocked in poll_schedule_timeout. The login process is blocked in
unix_wait_for_peer.
I can strace rsyslogd at startup. The two last sched_setscheduler calls
fail.
> grep sched trace.out
3570 sched_getparam(3570, { 0 }) = 0
3570 sched_getscheduler(3570) = 0 (SCHED_OTHER)
3570 sched_get_priority_min(SCHED_OTHER) = 0
3570 sched_get_priority_max(SCHED_OTHER) = 0
3571 sched_get_priority_min(SCHED_OTHER) = 0
3571 sched_get_priority_max(SCHED_OTHER) = 0
3571 sched_get_priority_min(SCHED_OTHER) = 0
3571 sched_get_priority_max(SCHED_OTHER) = 0
3571 sched_setscheduler(3572, SCHED_OTHER, { 0 } <unfinished ...>
3571 <... sched_setscheduler resumed> ) = 0
3571 sched_get_priority_min(SCHED_OTHER <unfinished ...>
3571 <... sched_get_priority_min resumed> ) = 0
3571 sched_get_priority_max(SCHED_OTHER <unfinished ...>
3571 <... sched_get_priority_max resumed> ) = 0
3571 sched_setscheduler(3573, SCHED_OTHER, { 0 } <unfinished ...>
3571 <... sched_setscheduler resumed> ) = -1 EPERM (Operation not
permitted)
3571 sched_get_priority_min(SCHED_OTHER <unfinished ...>
3571 <... sched_get_priority_min resumed> ) = 0
3571 sched_get_priority_max(SCHED_OTHER <unfinished ...>
3571 <... sched_get_priority_max resumed> ) = 0
3571 sched_setscheduler(3574, SCHED_OTHER, { 0 } <unfinished ...>
3571 <... sched_setscheduler resumed> ) = -1 EPERM (Operation not
permitted)
The same strace but on a kernel which does not hang. The calls to
sched_setscheduler do not fail.
3292 sched_getparam(3292, { 0 }) = 0
3292 sched_getscheduler(3292) = 0 (SCHED_OTHER)
3292 sched_get_priority_min(SCHED_OTHER) = 0
3292 sched_get_priority_max(SCHED_OTHER) = 0
3293 sched_get_priority_min(SCHED_OTHER) = 0
3293 sched_get_priority_max(SCHED_OTHER) = 0
3293 sched_get_priority_min(SCHED_OTHER) = 0
3293 sched_get_priority_max(SCHED_OTHER) = 0
3293 sched_setscheduler(3294, SCHED_OTHER, { 0 } <unfinished ...>
3293 <... sched_setscheduler resumed> ) = 0
3293 sched_get_priority_min(SCHED_OTHER <unfinished ...>
3293 <... sched_get_priority_min resumed> ) = 0
3293 sched_get_priority_max(SCHED_OTHER <unfinished ...>
3293 <... sched_get_priority_max resumed> ) = 0
3293 sched_setscheduler(3295, SCHED_OTHER, { 0 } <unfinished ...>
3293 <... sched_setscheduler resumed> ) = 0
3293 sched_get_priority_min(SCHED_OTHER <unfinished ...>
3293 <... sched_get_priority_min resumed> ) = 0
3293 sched_get_priority_max(SCHED_OTHER <unfinished ...>
3293 <... sched_get_priority_max resumed> ) = 0
3293 sched_setscheduler(3296, SCHED_OTHER, { 0 } <unfinished ...>
3293 <... sched_setscheduler resumed> ) = 0
The EPERM error comes from kernel/sched/core.c:3303
...
if (fair_policy(policy)) {
if (!can_nice(p, attr->sched_nice))
return -EPERM;
}
...
But I don't know why this is leading to block a process or making
rsyslogd being not woken up by a packet coming in the af_unix socket.
I hope that helps
-- Daniel
--
<http://www.linaro.org/> Linaro.org â Open source software for ARM SoCs
Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/