> The meantime between crashes with 2.2.13 on the SS-10 configured as above
> running Apache 1.3.3p3 was about four hours, with 2.2.14 it's about a day and a
> half. Here is the crash data for the 2.2.14 crash, on the console we get:
>
> spin_lock_irqsave(f808e66c) CPU #2 stuck at f00372c, owner PC(f003672c):CPU(2)
> spin_lock(f014b9ac) CPU #0 stuck at f003bde0, owner PC(f0036538):CPU(2)
> spin_lock(f014b9ac) CPU #1 stuck at f003bde0, owner PC(f0036538):CPU(2)
> spin_lock(f014b9ac) CPU #3 stuck at f003bde0, owner PC(f0036538):CPU(2)
CPU #0, #1 and #3 are stuck because CPU #2 grabbed the kernel_flag in
do_exit so we can ignore them.
The interesting line is:
spin_lock_irqsave(f808e66c) CPU #2 stuck at f00372c, owner PC(f003672c):CPU(2)
(I assume f00372c should be f003672c and is just written down incorrectly).
>From a quick guess, f808e66c points to a sigmask_lock and CPU #2 is stuck
in __exit_sighand (called from do_exit):
static inline void __exit_sighand(struct task_struct *tsk)
{
struct signal_struct * sig = tsk->sig;
unsigned long flags;
spin_lock_irqsave(&tsk->sigmask_lock, flags);
if (sig) {
tsk->sig = NULL;
if (atomic_dec_and_test(&sig->count))
kfree(sig);
}
flush_signals(tsk);
spin_unlock_irqrestore(&tsk->sigmask_lock, flags);
}
The debug line says that CPU #2 has already obtained the sigmask_lock and is
trying to again and in the same spot! Sounds suspicious, I'll give the
sparc spinlock code a once over.
Anton
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Sat Jan 15 2000 - 21:00:13 EST