2.6.17 x86_64 regression - reboot fails due to deadlock
From: Mr. Berkley Shands
Date: Thu Jul 06 2006 - 12:16:04 EST
With a SuperMicro H8DC8 (nvidia chipset), Dual Opteron 285's, 16GB,
Centos 4.3 -
Under 2.6.16 both the tyan 2895 and the supermicro H8DC8 both will
reboot corectly,
in kernel/sys.c machine_restart() gets called. But with the changes to
sys.c under 2.6.17,
a new path is introduced, calling void kernel_restart_prepare(char *cmd)
which calls blocking_notifier_call_chain(&reboot_notifier_list,
SYS_RESTART, cmd); (line 588)
Which looks at the first element of the notifier list, and blocks
forever. But ONLY on the supermicro.
The tyan, a very similar motherboard does not deadlock. It returns and
still calls machine_restart().
So neither reboot nor "shutdown -fh now" actually get to the bios calls.
on the supermicro, (linux-2.6.17/kernel/sys.c)
static int __kprobes notifier_call_chain(struct notifier_block **nl,
unsigned long val, void *v)
{
int ret = NOTIFY_DONE;
struct notifier_block *nb;
nb = rcu_dereference(*nl);
while (nb) {
ret = nb->notifier_call(nb, val, v); /* this is
the deadlock for the first entry */
if ((ret & NOTIFY_STOP_MASK) == NOTIFY_STOP_MASK)
break;
nb = rcu_dereference(nb->next);
}
return ret;
}
I see that 2.6.18 reworks this code further.
If I want to hurt myself really, really badly, disabling the call to
blocking_notifier_call_chain(&reboot_notifier_list,...
restores the reboot/power off functions.
In kdb, the system sits idle awaiting something to schedule, but nothing
will schedule since there is
a deadlock on the supermicro. Any clues as to how to find which notifier
is deadlocked?
berkley
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/