Re: Ok, I give up...

Wakko Warner (wakko@animx.eu.org)
Sun, 19 Dec 1999 18:38:40 -0500


I'm going to throw in another possibility: it might be domain validations
that fail. I had one that didn't succeed, and it took down my entire system
(it is SMP with an aha-2940u). It seems like every time I insert a CD in the
drive and try to use it, another domain validation occurs (2.2.13). I don't
believe this ever happened with 2.2.7.

> Just some near random thoughts that might help you here...
>
> Are you logging the syslog on remote machines (@hostname entries in the
> syslogd.conf)? I've found in the past that that helps against the loss of
> log file tails.
>
> Are the boxes SMP, and are the disks maybe in (IDE) PIO mode sometimes? If
> that is possible, then you should try the 14pre15, because Alan has
> recently fixed a lockup bug for that case.
>
> Did you use kernels with minimal hardware support compiled in (just the
> minimum you need for the hardware that is present, and without using
> kernel modules and kerneld)?
>
> Did you try shuffling the hardware around (different network cards,
> different mainboard/CPU combinations, other/new DIMMs, different power
> supplies hooked up to different power groups, different/newer UPSes)
> to see when the problem gets worse or better? That might help find it.
> Even though it used to work perfectly, the PC hardware could have become
> flaky since you ran the 2.0.x kernels (CPU fans are not the only things
> that can break down), and the mains supply and/or UPS (110V/220V) may
> have become worse, which can also influence system stability.
>
> Also, are you running X11 on the servers? If so, I suggest you stop doing
> that: the XFree86 binaries have direct access to all hardware, so a bug in
> that binary can wreak havoc, and in addition VGA cards can lock up the PC
> hardware in some cases. I'd even suggest running the PCs without video
> cards (don't forget to compile a new kernel without console support, or you
> will be left with a significant slowdown of your server).

> > About 2 months ago, I moved all of my servers ( 15 of them) to the 2.2.x
> > kernels. Some were clean installs of RH 6.0, some were upgrades to RH
> > 5.2. But all of these servers ran 2.0.35 and 2.0.36 with 100+ days of
> > uptime and were rock solid. But since moving to the 2.2 kernels on the
> > same hardware, reliability and uptime suck. Seems like I can rarely get a
> > month of uptime with the 2.2 kernels, and I've tried everything from 2.2.5
> > to 2.2.14pre13. The few oopses I've had have been traced back to buggy
> > hardware that has since been replaced. But in most every case with the 2.2
> > kernels, the servers (mainly serving web pages) run for a few days to a
> > week and then lock up completely. Then it requires a power cycle to bring
> > it back to life.
> >
> > So I'm stumped. When a system locks up I don't get any console messages or
> > anything in syslog. Since all of these servers ran 2.0.x kernels and were
> > very stable for months, I'm inclined to think the hardware is fine. So any
> > clues as to how I might track this down? I'd hate to say it (flame jacket
> > on) but my NT and FreeBSD boxes are putting Linux to shame in terms of
> > stability....
> >
> > David
> >
> > Please read the FAQ at http://www.tux.org/lkml/
> >
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.rutgers.edu
> Please read the FAQ at http://www.tux.org/lkml/

-- 
 Lab tests show that use of micro$oft causes cancer in lab animals