Re: spinlock on Alpha ES40

From: Peter Rival (frival@zk3.dec.com)
Date: Thu Jun 22 2000 - 06:31:43 EST


Andrew Pochinsky wrote:

> Hi,
>
> I'm having a strange problem running SMP-enabled kernels on
> Compaq's ES40. The problem appears on all kernels from 2.2.12 to
> 2.2.16 and seems to manifest itself when the system is reasonably
> loaded. Here is how the problem manifests:
>
> From time to time, there is a burst of messages both on the console
> and in /var/log/messages that look like this:
>
> Jun 21 15:58:56 es40-001 kernel: fault.c:43 spinlock stuck in main2.x at fffffc000032a360(3) owner main4.x at fffffc000032a360(1) fault.c:43
> Jun 21 15:58:56 es40-001 kernel: fault.c:43 spinlock grabbed in main2.x at fffffc000032a360(3) 1202 ticks

These messages are normal (not necessarily _good_, but normal). Chances are that you're running into the kernel_lock. We had a decent
discussion about this on the axp-list. The "stuck" message just means that the spin_lock() routine (really debug_spin_lock()) thinks that it
has waited too long to get the particular lock. The "grabbed" message tells you that it finally got it. In this line, 1202 ticks means that
the lock was held for just over 1 second (at 1024 ticks/sec on Alpha, 1202 ticks is about 1.17 s) - not a massive amount of time in Linux,
but considered sinful in other operating systems.
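
To make the "stuck"/"grabbed" pairing and the tick arithmetic concrete, here is a minimal user-space sketch in C11. It is not the kernel's
debug_spin_lock(); the names dbg_lock, dbg_lock_acquire(), dbg_lock_release(), ticks_to_ms() and the spin-count threshold are all
hypothetical, and it counts failed attempts rather than timer ticks. The only figure taken from above is 1024 ticks/sec on Alpha.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Assumption: 1024 timer ticks per second, the Alpha figure quoted above. */
#define ALPHA_HZ 1024UL

/* Hypothetical user-space lock; NOT the kernel's spinlock_t. */
struct dbg_lock {
    atomic_flag held;
};

/* Convert a tick count from a "grabbed ... N ticks" line to milliseconds. */
static unsigned long ticks_to_ms(unsigned long ticks, unsigned long hz)
{
    assert(hz > 0);
    return ticks * 1000UL / hz;
}

/*
 * Spin until the lock is taken. Once the number of failed attempts reaches
 * stuck_after, print a "stuck" line and keep spinning; if that happened,
 * print a matching "grabbed" line when the lock is finally obtained.
 * Returns the number of failed attempts.
 */
static unsigned long dbg_lock_acquire(struct dbg_lock *l,
                                      unsigned long stuck_after,
                                      int *was_stuck)
{
    unsigned long spins = 0;

    *was_stuck = 0;
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire)) {
        if (++spins == stuck_after) {
            fprintf(stderr, "spinlock stuck after %lu spins\n", spins);
            *was_stuck = 1;
        }
    }
    if (*was_stuck)
        fprintf(stderr, "spinlock grabbed after %lu spins\n", spins);
    return spins;
}

/* Release the lock so another waiter can take it. */
static void dbg_lock_release(struct dbg_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

With these definitions, ticks_to_ms(1202, ALPHA_HZ) works out to 1173 ms, which matches the "just over 1 second" reading of the log line.
A "stuck" line with no later "grabbed" line would correspond to the while loop never exiting.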

Without getting into a flame-fest about just how much locking is enough versus how much is too much (way too much intersection on those
points...) let's just say that this particular situation is _much_ better in the 2.3/2.4 series. I've got an ES40 with 60+ disks and 4 GB of
memory, and with the latest kernels I don't see messages like this much until I _really_ load the system.

<snip>

> Sometimes, the machine goes completely catatonic and has to be
> reset. Less often the system really crashes. My estimate is that
> this lockup happens once in about a fortnight; out of 10 machines we
> are running there is somewhat less than one failure per day.
>

_This_ is not good. Do these hangs have a "stuck" line with no "grabbed" line afterwards? Other than some serious problems with the QLogic
driver (and apparently only when attached to a RAID array...hrmmm....) I haven't been able to take my system down with the latest 2.3/2.4
kernels - haven't really tried 2.2 since some of the 2.2.14pre series.
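
One rough way to answer the "stuck with no grabbed" question from a saved log is sketched below in C. The function name
unmatched_stuck() is hypothetical, and the matching is a deliberately crude assumption: it only looks for the substrings
"spinlock stuck" and "spinlock grabbed" and does not pair messages by task name, address, or CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count "spinlock stuck" lines that are not followed by a later
 * "spinlock grabbed" line. Heuristic only: each "grabbed" line cancels
 * one pending "stuck" line, whichever lock it actually refers to.
 */
static long unmatched_stuck(const char *const *lines, size_t n)
{
    long pending = 0;

    assert(lines != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (strstr(lines[i], "spinlock stuck"))
            pending++;
        else if (strstr(lines[i], "spinlock grabbed") && pending > 0)
            pending--;
    }
    return pending;
}
```

A nonzero result for the messages logged just before a hang would suggest that some waiter never got its lock; zero would point
elsewhere.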

>
> Details about machines: 4 cpus 21264 at 666MHz, 1GB memory, 9GB scsi
> disk on sym53c895, 2GB swap space. The problem seems only to exist
> when SMP is enabled in the kernel. The spinlock gets stuck in various
> places, including fault.c, sched.c, open.c, read_write.c etc.
>
> Has anyone had the same problem, or does anyone have any suggestions?
>

If you can, try the latest 2.3/2.4 kernels. They're much more scalable and at least for me very stable. BTW, what are you running on these
systems? I'm just curious if there's a load I can put on my system to replicate what you're doing... :)

 - Pete
