Re: Massive e2fs corruption with 2.2.9/10?

Chris Adams (cadams@ro.com)
Thu, 17 Jun 1999 09:27:30 -0500


Once upon a time, Harald Koenig <koenig@tat.physik.uni-tuebingen.de> said:
> two times I got strange ext2fs errors -- there was a duplicate block
> both times. for the 2nd error I kept the fsck log:
>
> Duplicate blocks found... invoking duplicate block passes.
> Pass 1B: Rescan for duplicate/bad blocks
> Duplicate/bad block(s) in inode 196693: 787271
> Duplicate/bad block(s) in inode 196854: 787271
> Duplicate/bad block(s) in inode 418198: 787271
> Pass 1C: Scan directories for inodes with dup blocks.
> Pass 1D: Reconciling duplicate blocks
> (There are 3 inodes containing duplicate/bad blocks.)

Yep, those are the kind of errors e2fsck gave me, except that they went
on forever. It looked like it was finding duplicate/bad blocks in
virtually every inode. I eventually gave up on e2fsck and did a "mke2fs
-S /dev/sda9; e2fsck -f -y /dev/sda9" and was able to get a good bit of
stuff back.

> and now recently I got crashes in two programs at runtime (1st mutt, later emacs).
> in both cases `rpm -V package' showed `..5.....' and trashing the buffer cache
> `fixed' the problem.
> for emacs, I kept a copy of the bad image/buffer before `fixing' and here I get
>
> # cmp -l /usr/bin/emacs.good /usr/bin/emacs.bad
> 300137 355 255
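
For reference, "cmp -l" prints the 1-based byte offset in decimal and the
two differing byte values in octal. A minimal sketch of checking how many
bits differ between the two values quoted above (the function name
flipped_bits is made up for illustration):

```python
def flipped_bits(good_octal, bad_octal):
    """Return the bit positions (0 = least significant) that differ."""
    # cmp -l reports byte values in octal, so parse them base 8.
    diff = int(good_octal, 8) ^ int(bad_octal, 8)
    return [bit for bit in range(8) if diff & (1 << bit)]


# Values from the cmp -l line quoted above: byte 300137, 355 vs 255.
print(flipped_bits("355", "255"))  # → [6]
```

A single entry in the result means the two bytes differ in exactly one
bit (here the 0x40 bit).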

Yeah, the first time this happened (June 1), I saw the same kind of
thing. I was doing a recursive grep through the linux source (like "find
/usr/local/src/linux -type f -name '*.[ch]' -print | xargs grep -l
something"), when I started getting "attempt to access beyond end of
device" errors. Right after that I got "ext2_readdir: bad entry in
directory #208897: rec_len % 4 != 0" errors. When I aborted the find and
tried to "ls /usr/local/src/linux", the directory was empty. I
immediately quit X and started trying to see what was wrong, and things
were disappearing all over the place. Then I couldn't even run "ls" - I
just got "Segmentation fault". I booted up single user, saved what I
could, and re-installed.
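
For anyone wondering what the "rec_len % 4 != 0" message is checking: each
ext2 directory entry starts with an 8-byte header (inode, rec_len,
name_len, file type) followed by the name, and rec_len must be a multiple
of 4. A rough sketch of that kind of sanity check, not the kernel code
itself. It assumes the little-endian layout with an 8-bit name_len and a
file-type byte; the function names are made up, and every message other
than the one quoted above is only an illustrative label:

```python
import struct


def ext2_dir_rec_len(name_len):
    # 8-byte fixed header plus the name, rounded up to a multiple of 4.
    return (name_len + 8 + 3) & ~3


def check_dir_entry(block, offset):
    """Return an error string for a bad entry, or None if it looks sane."""
    # Header fields: inode (u32), rec_len (u16), name_len (u8), type (u8).
    inode, rec_len, name_len, file_type = struct.unpack_from(
        "<IHBB", block, offset)
    if rec_len < ext2_dir_rec_len(1):
        return "rec_len is smaller than minimal"
    if rec_len % 4 != 0:
        return "rec_len % 4 != 0"
    if rec_len < ext2_dir_rec_len(name_len):
        return "rec_len is too small for name_len"
    if offset + rec_len > len(block):
        return "directory entry across blocks"
    return None


# A sane "." entry (rec_len 12) and a corrupted one (rec_len 13).
good = struct.pack("<IHBB", 2, 12, 1, 2) + b"." + b"\0" * 3
bad = struct.pack("<IHBB", 2, 13, 1, 2) + b"." + b"\0" * 4
print(check_dir_entry(good, 0))
print(check_dir_entry(bad, 0))
```

If directory blocks are being overwritten with unrelated data, rec_len
is effectively random, so a check like this fails almost immediately.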

The second time this happened (June 15), I wasn't even doing anything
filesystem-intensive that I recall. I just noticed that when I did "ls"
in my home directory, weird things were showing up (like what once was a
directory was now a device node). I couldn't even save anything this
time (although luckily I was able to dig through the raw partition, find
the most important .tar.gz file, and extract it before re-installing).

The first time I was running 2.2.9 (it had been up for about a week) and
the second time I was running 2.2.10 (it had been up for a few days). I
have dropped back to 2.2.5, although I'll probably move up to 2.2.6
(which has been up and running great on my news server for two months).

> (so also only one bit flipped). I'm not sure if it's 2.2.9/2.2.10 or maybe
> it's a hardware problem, because it just started when I changed hardware
> (CPU, mainboard, memory) to AMD K6-2-450 with 128MB (DFI main board).
> but now reading similar reports, maybe it's not my hardware ?!
>
> did you change your hardware [settings?] recently ?

Nope. The ONLY thing I changed was the kernel. I did add another
(identical) drive after the first time everything died, because at first
I thought it might be just a drive failing. Also I wanted to try to do
a new install and recover what I could from the messed up filesystems.
However, the second time it happened on the drive I added.

-- 
Chris Adams <cadams@ro.com> - System Administrator
Renaissance Internet Services - IBS Interactive, Inc.
Home: http://ro.com/~cadams - Public key: http://ro.com/~cadams/pubkey.txt
I don't speak for anybody but myself - that's enough trouble.
