Re: 2.2.13 Sparc SMP Spinlocks...

Anton Blanchard (anton@progsoc.uts.edu.au)
Thu, 4 Nov 1999 00:51:55 +1100



> The web server crashed, this is also a 4-CPU SS-10, equipped with Ross
> RTK-625 Hypersparc CPU's, 384MB of RAM, CGSIX video, no other additional
> hardware.
>
> This machine is running completely stock Linux 2.2.13, which is to say I
> have not applied ANY patches to this kernel at all.
>
> It was compiled with:
>
> gcc -v
> Reading specs from /usr/lib/gcc-lib/sparc-redhat-linux/egcs-2.90.29/specs
> gcc version egcs-2.90.29 980515 (egcs-1.0.3 release)

You probably should use a 1.1.2 egcs with the sparc fixes (I use the version
in the redhat 6.0 erratas).

> After running approximately 9 hours, it crashed with the following on the
> console:
>
> spin_lock_irqsave(eeb2066c) CPU#0 stuck at f00cb048, owner PC(00000000):CPU(1)
> spin_lock(f0153994) CPU#1 stuck at f002dea4, owner PC(f004a44c):CPU(0)
>
> Relevant addresses:
> First line:
>
> eeb2066c Nothing close in System.map:
...
> Perhaps this was something on the stack?
>
> f00cb0048 Between the following two:
>
> f00cb018 T rpc_clnt_sigunmask
> f00cb0ac T rpc_do_call
>
> ... would this mean something in function rpc_clnt_sigunmask()?

void rpc_clnt_sigunmask(struct rpc_clnt *clnt, sigset_t *oldset)
{
unsigned long irqflags;

spin_lock_irqsave(&current->sigmask_lock, irqflags);
current->blocked = *oldset;
recalc_sigpending(current);
spin_unlock_irqrestore(&current->sigmask_lock, irqflags);
}

If there is enough RAM then some of it gets mapped below 0xf0000000.
current->sigmask_lock ended up in this area of RAM. (0xeeb2066c)

> And then why is PC(00000000) all zeros?

This worries me. I haven't been able to test hypersparcs or large RAM
configurations while working on sparc32 SMP. Perhaps there is a problem
in one of them.

> Second line:
>
> f0153994 Between the following two:
>
> f0153908 D smp_proc_in_lock
> f0153988 D boot_cpu_id

I would think that f0153994 is kernel_flag

> f002dea4 Between the following two:
>
> f002de08 T patchme_store_new_current
> f002e150 T __wake_up
>
> f004a44c Between these two:
>
> f004a428 T sys_writev
> f004a560 T sys_pread

No problem here. It is the other lock that is causing the trouble.

Anton

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/