Re: Back in Production mode Again...

James G. Stallings II (zap@onyx.tarpon.net)
Sat, 15 Nov 1997 14:00:19 -0600 (CST)


OK, Doug, I'll buy that- I've just come back up from that worst-case
scenario. However, I'm at a loss for where to get the RAM. I mean, whip me
for a dunce, but what's the diff? Is there a special place to get
Gold-plated military-grade RAM? ;)

Seriously, I'd replace it if I knew a better place to get it than where I
got it.

Thanks in advance-
James

...
To iterate is human, to recurse, divine.
-- Robert Heller

On Sat, 15 Nov 1997, Doug Ledford wrote:

>
> On 15-Nov-97 Larry McVoy wrote:
> >: My question is this: what -is- a high load average? Over most of the
> >: period of a year of running this system as a production fileserver, the
> >: 'average' load average is probably near .25; so just what -are- reasonable
> >: loads on this system? (disk subsystems as described; dual P-166s, not
> >: overclocked; 32MB of crappy RAM (we're cursed with the SIG-11 and so we
> >: compile off of this machine) on an older Tyan II Tomcat.)
>
> First, before I get into the load average stuff: REPLACE YOUR RAM! I
> cannot stress this loudly enough. If you get GCC sig11 errors and have
> known bad RAM, don't just stop compiling on your machine; fix your machine.
> The compiles are merely one symptom of this problem. Another is that heavy
> disk usage on that 2940 SCSI controller can (and eventually *WILL*) lead to
> disk corruption and loss of data. The BusMastering design of the card and
> the driver will sometimes even find memory errors that GCC misses (mainly
> when both the CPU and the card are trying to access RAM at the same time,
> which is a higher load than GCC places on RAM by itself). A good test to
> prove my point to you is this:
>
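> cd /usr/src
> # Unpack one reference copy of the kernel tree.
> tar xzf linux-2.0.29.tar.gz
> mv linux linux.orig
> # Unpack ten more copies and compare each against the reference.
> for i in 1 2 3 4 5 6 7 8 9 10
> do
>   tar xzf linux-2.0.29.tar.gz
>   diff -U 2 -rN linux.orig linux
>   rm -fr linux
> done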
>
>
> If that little script creates any output on your screen, then you've just
> seen disk corruption caused by this faulty RAM.
>
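
The extract-and-compare test above could also be wrapped in a small function
that reports how many passes differed instead of dumping a full diff. This is
only a sketch: the name repeat_extract_check and its three arguments are
made up for illustration, it assumes a tar that accepts -z and -C and a diff
that accepts -q (GNU and BSD versions do), and it works on any gzipped
tarball rather than the kernel source specifically.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): unpack TARBALL once as a
# reference copy, unpack it PASSES more times, and print how many passes
# differed from the reference.
# Usage: repeat_extract_check TARBALL PASSES WORKDIR
repeat_extract_check() {
    tarball=$1
    passes=$2
    workdir=$3
    failures=0

    # Reference copy, unpacked once.
    mkdir -p "$workdir/orig" || return 2
    tar xzf "$tarball" -C "$workdir/orig" || return 2

    pass=1
    while [ "$pass" -le "$passes" ]; do
        rm -rf "$workdir/new"
        mkdir -p "$workdir/new" || return 2
        tar xzf "$tarball" -C "$workdir/new" || return 2
        # diff exits non-zero when any file differs; its report goes to
        # stderr so that stdout carries only the final count.
        if ! diff -rq "$workdir/orig" "$workdir/new" >&2; then
            failures=$((failures + 1))
        fi
        pass=$((pass + 1))
    done

    rm -rf "$workdir/new"
    echo "$failures"
}

# Example call (paths are placeholders):
#   repeat_extract_check /usr/src/linux-2.0.29.tar.gz 10 /usr/src/ramtest
```

A printed count of 0 corresponds to the "no output" case described above; any
other number means at least one unpacked copy did not match the reference.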
> >"load average" in Unix, not just Linux, is a misnomer. All it counts is
> >the number of processes waiting (sleeping) in the kernel.
> >On some systems (I think, I'm hazy here) only processes sleeping in disk
> >wait are counted; on others I think it is all sleeping processes.
>
> Essentially, any process with a state of R or D (as reported by ps) is
> counted in this number. Of course, D usually indicates that the program is
> waiting on some sort of disk activity. So, as you can imagine, if you have
> a lot of programs accessing the disk at the same time, your load average can
> get quite high. There's nothing wrong with that. I know people who have
> maintained load averages as high as 180 for 24 hours or more without
> problems; it just means your computer is outrunning your disks. Myself,
> I've maintained load averages as high as 120 for extended periods without
> problems.
>
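
To see the R-and-D rule described above on a live Linux box, one rough
approach is to count the processes currently in those states and compare the
result with /proc/loadavg. The function name below is invented for
illustration, and the "stat" output column is a procps/BSD ps feature rather
than something POSIX guarantees, so treat this as a sketch. The count is an
instantaneous snapshot, while the reported load average is a damped average
over 1, 5, and 15 minutes, so the two will only roughly agree.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): read one process state
# per line on stdin (for example "R+", "Ss", "D") and print how many start
# with R (runnable) or D (uninterruptible sleep, usually disk wait).
count_runnable_or_blocked() {
    awk '$1 ~ /^[RD]/ { n++ } END { print n + 0 }'
}

# Example use on a system whose ps supports the "stat" column:
#   ps -A -o stat= | count_runnable_or_blocked
#   cat /proc/loadavg
# Note that the ps process itself is running while it takes the snapshot,
# so it normally adds one to the count.
```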
>
> ----------------------------------
> E-Mail: Doug Ledford <dledford@dialnet.net>
> Date: 15-Nov-97
> Time: 13:33:52
> ----------------------------------
>