Re: Hard lockup with 2.0.34pre15

Ondrej Feela Filip (feela@ipex.cz)
Tue, 19 May 1998 19:46:05 +0200 (MET DST)


-----BEGIN PGP SIGNED MESSAGE-----

On Tue, 19 May 1998, Doug Ledford wrote:

> Both of these descriptions are reminiscent of two separate items. First, in
> the RedHat-5.0 distribution there is a 16 byte memory scribble in the
> aic7xxx driver that ships by default (as well as in the aic7xxx driver in
> 2.0.33). Theoretically, this shouldn't cause disc corruption on static
> files though because even if we memory scribbled, we wouldn't be writing it
> back out. The memory scribble in question was specifically only applicable
> to in-memory copies of programs or code.
>
> Secondly, if this is happening with the later drivers, such as it has
> actively happened with the aic7xxx driver in 2.0.34pre15, then I'm very
> suspicious of hardware. To put it bluntly, the newest aic7xxx driver
> increases the DMA load on your system for the same given number of
> commands. This increased DMA load can be enough to cause hardware glitches
> to show up in marginal systems. I'm not guessing that this is the case, I
> have a machine here that I can prove it with. So, a simple test to see if
> this happens:

Linux hangs with both 2.0.33 (the original; I do not use the RedHat kernel)
and 2.0.34pre15.

>
> get the source for linux-2.0.33.tar.gz from ftp.kernel.org and put it in the
> /usr/src directory. Save your current linux source tree to another
> directory name (such as linux.real). Then run this script:
>
>
> #!/bin/sh
>
> cd /usr/src
> tar xzf linux-2.0.33.tar.gz
> mv linux linux.orig
> while true
> do
> tar xzf linux-2.0.33.tar.gz
> diff -U 3 -rN linux.orig linux
> rm -fr linux
> done
>

I'm sorry, I cannot try this because the machine is not nearby. :-) But I
tried this:

compiling the kernel
find /
rpm -Va

all at the same time, and this hangs the machine. I can send you the oops.
It starts with:
May 18 12:33:17 lenka kernel: attempt to access beyond end of device
May 18 12:33:17 lenka kernel: 08:06: rw=0, want=134524894, limit=313236
....
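
For anyone who wants to repeat the combined load, here is a minimal sketch
of starting several jobs at once and waiting for all of them. The helper
name run_together, the log directory, and the zImage build target are
assumptions made for illustration; the message only says that the three
jobs were run at the same time.

```shell
#!/bin/sh
# Sketch: start several jobs at once and wait for all of them.
# run_together is an illustrative helper name, not from the original message.

# usage: run_together LOGDIR 'command 1' 'command 2' ...
# Each command string is run by its own background sh, with output
# written to LOGDIR/job.N.log. The return value is the number of jobs
# that exited with a non-zero status.
run_together() {
    logdir=$1; shift
    mkdir -p "$logdir" || return 255
    pids=""
    n=0
    for cmd in "$@"; do
        n=$((n + 1))
        sh -c "$cmd" > "$logdir/job.$n.log" 2>&1 &
        pids="$pids $!"
    done
    failed=0
    for pid in $pids; do
        wait "$pid" || failed=$((failed + 1))
    done
    return "$failed"
}

# The three jobs from the message, assuming the kernel tree lives in
# /usr/src/linux and zImage is the build target (both are assumptions):
#   run_together /tmp/load 'cd /usr/src/linux && make zImage' 'find /' 'rpm -Va'
```

A non-zero return here does not by itself indicate a problem: find / and
rpm -Va can both exit non-zero on a healthy system. The point is only to
keep all three jobs hitting the disk at the same time.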

>
> ----------End script------------
>
> That script will run forever until killed. In general, you should never see
> any output from that script. If you do, you're getting hit by a hardware
> glitch and need to track down the source (either bad RAM, bad cache, bad
> CPU, whatever). If you see output from this script, then try changing
> various BIOS timing items for RAM and cache or disabling cache until it goes
> away. When it goes away, your machine should then be stable. Don't forget
> to check things like CPU fans and PCI options as well if you have any errors
> from this script.
>
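
The quoted loop runs until it is killed and removes the unpacked tree even
after a mismatch. As a rough sketch of a variant that may be easier to
leave running unattended, the function below stops at the first difference
and keeps the diff output in a file. The name verify_loop, its arguments,
the pass limit, and the log file names are all illustrative assumptions,
not part of the original script.

```shell
#!/bin/sh
# Sketch: bounded variant of the quoted unpack-and-diff loop.
# verify_loop and its arguments are illustrative names, not from the
# original script.

# usage: verify_loop TARBALL TOPDIR MAX_PASSES
#   TARBALL     gzip-compressed tar archive (e.g. linux-2.0.33.tar.gz)
#   TOPDIR      top-level directory the archive unpacks to (e.g. linux)
#   MAX_PASSES  number of unpack/compare cycles to run
# Returns 0 if every pass matched, 1 on the first mismatch, 2 on setup errors.
verify_loop() {
    tarball=$1
    topdir=$2
    max=$3

    # Unpack once and keep that copy as the reference tree.
    tar xzf "$tarball" || return 2
    rm -rf "$topdir.orig"
    mv "$topdir" "$topdir.orig" || return 2

    pass=1
    while [ "$pass" -le "$max" ]; do
        tar xzf "$tarball" || return 2
        # diff exits non-zero when the two trees differ.
        if ! diff -u -rN "$topdir.orig" "$topdir" > "diff.$pass.log"; then
            echo "pass $pass: trees differ, see diff.$pass.log"
            return 1
        fi
        rm -f "diff.$pass.log"
        rm -rf "$topdir"
        pass=$((pass + 1))
    done
    echo "completed $max passes with no differences"
    return 0
}

# Example, assuming the tarball is already in /usr/src:
#   cd /usr/src && verify_loop linux-2.0.33.tar.gz linux 100
```

On a mismatch the freshly unpacked tree is left in place, so the differing
files can be inspected alongside the saved diff.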

Hmm, I'll try it, but right now I'm 150 km away.

> Let me know what you find out, because if there is a bug in regards to some
> sort of scribble, I definitely want to get it found. However, do me a favor
> and test against the aic7xxx-5.0.15 driver. There is a patch for the
> 2.0.34pre15 kernel to update it to the 5.0.15 driver at
> ftp://ftp.dialnet.net/pub/linux/aic7xxx/2.0.34pre15/aic7xxx-5.0.15-34pre15.patch.gz
>

OK, this I can try without console access. I'll report back.

- --
Ondrej Feela Filip
E-mail: feela@ipex.cz
WWW: http://feela.ipex.cz
PGP: finger feela@atrey.karlin.mff.cuni.cz

-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv

iQCVAwUBNWHFYDOI+O1xZZZNAQES9wQAinozdTDcXmmN449nSsJVP+BgPxmW95/1
zCk2Ddp5bepbY0X8nEWzNkTCt7zES0tohkwUqCLlP/y5sEhTne+1iOsXsE9Y5GSy
r1k1vKiCfRcWh08pxAXDUQkdoDyJ5I9o9dTzGodb23p2MqKUw1QaB+ssPosfH/Sz
pS2HtkxF5ko=
=trmK
-----END PGP SIGNATURE-----
