Re: Stability (2.2.14/15/16/17pre1)

From: Zdenek Kabelac (kabi@fi.muni.cz)
Date: Fri Jun 16 2000 - 04:57:43 EST


George Sexton wrote:
>
> The strange thing is that most of the time, the systems crash when there is
> little or no load. They run very well under a heavy load all day long, and
> then crash at night or on weekends.
>
> Do you have some patches that haven't made it into the mainstream kernel
> that might help things out? If everything has been incorporated, can you
> give me some ideas on where to look in the kernel source?
>
> I am about ready to pitch SMP boxes for Linux. It just isn't working right.
> There have been 4 posts to the list this week about SMP instability with
> 2.2.16 but they have essentially gone un-answered.
>
> Any help would be really appreciated.

Could you check a few things:

First, put a heavy power load on the machine to see whether the power
supply is good.
This is the suggested way:

while : ; do hdparm -t /dev/hdXXX ; done

Run this for every hard disk in the machine. Also keep the CD-ROM busy:

while : ; do cat /dev/cdrom >/dev/null ; done

Then run "ping -f localhost" twice for every CPU, to use up spare CPU cycles.

Now leave the machine running for a while. If it survives 15 minutes,
the power supply is probably OK.
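
The steps above can be wrapped in one small script. This is only a rough
sketch, not something I have tuned: the function names (cpu_count,
ping_count, burn_in) are made up for illustration, it assumes a Linux
/proc/cpuinfo with one "processor" line per CPU, and "ping -f" normally
needs root.

```shell
#!/bin/sh
# Sketch of the power-load test: disk reads, CD-ROM reads and ping floods
# running side by side for a fixed number of minutes.

# Count CPUs (assumption: /proc/cpuinfo has one "processor" line per CPU).
cpu_count() {
    n=$(grep -c '^processor' /proc/cpuinfo 2>/dev/null)
    [ "${n:-0}" -ge 1 ] || n=1
    echo "$n"
}

# Two ping floods per CPU, as suggested above.
ping_count() {
    echo $(( $1 * 2 ))
}

# burn_in MINUTES DISK...   (hypothetical helper, run as root)
burn_in() {
    minutes=$1; shift
    pids=""
    # One hdparm read loop per disk given on the command line.
    for disk in "$@"; do
        ( while : ; do hdparm -t "$disk" ; done ) >/dev/null 2>&1 &
        pids="$pids $!"
    done
    # Keep the CD-ROM busy too, if the device node is readable.
    if [ -r /dev/cdrom ]; then
        ( while : ; do cat /dev/cdrom ; done ) >/dev/null 2>&1 &
        pids="$pids $!"
    fi
    # Ping floods to eat the spare CPU cycles.
    total=$(ping_count "$(cpu_count)")
    i=0
    while [ "$i" -lt "$total" ]; do
        ping -f localhost >/dev/null 2>&1 &
        pids="$pids $!"
        i=$((i + 1))
    done
    sleep $((minutes * 60))
    # Stops the loops; an hdparm or cat already running finishes on its own.
    kill $pids 2>/dev/null
}
```

Something like "burn_in 15 /dev/hda /dev/hdb" would then run the whole
test for 15 minutes (the device names here are only placeholders, use
your own).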

Next, could you try building a kernel from this URL:
http://decibel.fi.muni.cz/kernel/linux-2.2.16com.tar.bz2
(Basically pure 2.2.16 with a few other patches: AGP, IDE, RTL, and my small
deadlock preventer.) All the patches are in patches.tar.bz2.

The kernel itself is kernel.tar.bz2. However, it is built with my config
options, so you will probably want to build your own kernel anyway.
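
For building your own, the usual 2.2-series sequence looks roughly like
the sketch below. The helper names (run, build_kernel) and the DRY_RUN
switch are just illustration, and the source directory is an assumption:
pass whatever directory the tarball actually unpacks to. Installing the
image and updating the boot loader are left out.

```shell
#!/bin/sh
# Sketch of a 2.2-series kernel build. Set DRY_RUN=1 to only print the steps.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# build_kernel SRC_DIR   (hypothetical helper)
# The body runs in a subshell so the caller's working directory is untouched.
build_kernel() (
    run cd "$1" &&
    run make menuconfig &&   # pick your own config options
    run make dep &&          # dependency step needed on 2.2 kernels
    run make bzImage &&
    run make modules &&
    run make modules_install # needs root
)
```

Running it once with DRY_RUN=1 first shows the steps without touching
anything.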

My machine is pretty stable (only one weird deadlock in the last month), but
I'm using RTLinux, so it is possible I have made some strange mistake
somewhere else.
(I occasionally get an irq_enter message while using NFS.)
Also, a side note: when the computer freezes, wait a few minutes (3-4 or so)
to see whether it is just stuck in a TLB wait. I think this happened to me once.

-- 
             There are three types of people in the world:
               those who can count, and those who can't.
  Zdenek Kabelac  http://i.am/kabi/ kabi@i.am {debian.org; fi.muni.cz}



