ext2-fs errors/corruption (more data)

Todd J Derr (infidel+@pitt.edu)
Sun, 7 Jul 1996 18:17:22 -0400 (EDT)


[more data about the e2fs errors I posted about an hour or 2 ago]

I was seeing some scary things on my machine, so I managed to
umount the corrupted fs and fsck it. At this point, I'm not sure
whether the evidence points to a bug in e2fs, though there is
definitely a bug in e2fsck.

When I ran e2fsck, I realized that we had had a little mishap
with the machine ~2 months ago. I rebooted the machine with 'reboot'
instead of 'shutdown' by mistake, and when it came back up, it had to
be fsck'ed manually. The machine is at a remote location, so I got
someone else to do the fsck. He said it had complained about
'duplicate blocks', that he told fsck to fix them, and that all seemed
well. However, it now appears that e2fsck never really fixed the
problem.

Some time later, we noticed that metamail was dumping core; I
didn't really look into why this was happening at the time.

I have saved all of the e2fsck output and can send it on request.
Below is a summary, with my comments in [].

#e2fsck -v /dev/sda3
e2fsck 1.02, 16-Jan-96 for EXT2 FS 0.5b, 95/08/09
/dev/sda3 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes

Deleted inode detected with non-zero link count.
This is probably due to old ext2fs kernel code.
Fix inode(s)<y>? yes
[this machine was initially brought up, and the fs created, with
1.2.13, but I upgraded immediately to 1.3.80 or so because I needed
the 3c590 driver. I also upgraded libc and rebuilt some things
(e2fsprogs included) before it got any real use.]

Inode 54249 is deleted w/ non-zero link_count. CLEARED
Inode 54253 is deleted w/ non-zero link_count. CLEARED
Inode 54256 is deleted w/ non-zero link_count. CLEARED
[the first two are the files I was having problems with, whose inodes
appeared to contain all '1' bits. I have no idea what the third file is.]

Remove illegal block(s) in inode 102636<y>? yes

Block #140 (1684480044) > BLOCKS (873642). CLEARED
[blocks 140-158 each have a different bogus block number]
Too many illegal blocks in inode 102636.
Clear inode<y>? yes
[this was a legitimate file under my home directory; its entry gets
deleted below]

Restarting e2fsck from the beginning...
/dev/sda3 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Duplicate blocks found... invoking duplicate block passes.
Pass 1B: Rescan for duplicate/bad blocks
Duplicate/bad block(s) in inode 10701: 110761 110762 110763 110764 110765
110766 110767 110768 110769 110770 110771 110772 110773 110774 110775 110776
110777 110778 110779 110780 110781 110782 110783 110784 110785 110786 110787
110788 110789 110790 110791 110792 110793 110794
Duplicate/bad block(s) in inode 10710: 110761 110762 110763 110764 110765
110766 110767 110768 110769 110770 110771 110772 110773 110774 110775 110776
110777 110778 110779 110780 110781 110782 110783 110784 110785 110786 110787
110788 110789 110790 110791 110792 110793 110794
[these two files (see below) are 'mc' and 'metamail']

Illegal block number passed to ext2fs_test_block_bitmap #3504073351
for multiply claimed block map
[I get 255 of these messages for different block numbers]

Pass 1C: Scan directories for inodes with dup blocks.
Pass 1D: Reconciling duplicate blocks
(There are 2 inodes containing duplicate/bad blocks.)

File /bin/metamail (inode #10710, mod time Tue Oct 31 01:55:39 1995)
has 34 duplicate blocks, shared with 1 file:
/bin/mc (inode #10701, mod time Tue Oct 31 01:54:31 1995)
Clone duplicate/bad blocks<y>? yes

File /bin/mc (inode #10701, mod time Tue Oct 31 01:54:31 1995)
has 34 duplicate blocks, shared with 1 file:
/bin/metamail (inode #10710, mod time Tue Oct 31 01:55:39 1995)
Clone duplicate/bad blocks<y>? yes

[e2fsck was unable to really fix this problem - see below]

Entry 'Bike-Small_Open.jpg' in /home/tjd/misc/bmwcca (102614) has
deleted/unused inode 102636.
Clear<y>? yes
[This was the file that had its inode cleared above]

Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Fix summary information<y>? yes

[then it fixes some superblock stuff and dumps the fs info and exits.]
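Pass 1 and Pass 1B above boil down to two checks on every block pointer: does it lie inside the filesystem, and is it claimed by more than one inode? The following is a minimal sketch of that accounting on a toy model, not e2fsck's actual code. The function name scan_blocks, the representation of an inode as a plain list of physical block numbers, and the default first_data_block=1 are all assumptions made for illustration.

```python
# Toy model of the block accounting behind "Illegal block" and "Duplicate
# blocks" reports. Assumptions (hypothetical, not e2fsck internals): an inode
# is a list of physical block numbers indexed by logical block, 0 means a hole,
# and blocks_count / first_data_block come from the superblock.
from collections import defaultdict


def scan_blocks(inodes, blocks_count, first_data_block=1):
    """Return (illegal, duplicates).

    illegal:    {inode: [(logical_index, block), ...]} for out-of-range pointers
    duplicates: {block: sorted inode list} for blocks owned by 2+ inodes
    """
    illegal = defaultdict(list)
    owners = defaultdict(set)
    for ino, blocks in inodes.items():
        for logical, blk in enumerate(blocks):
            if blk == 0:
                continue  # hole: nothing allocated at this logical offset
            if blk < first_data_block or blk >= blocks_count:
                # e.g. "Block #140 (1684480044) > BLOCKS (873642)"
                illegal[ino].append((logical, blk))
                continue
            owners[blk].add(ino)
    duplicates = {b: sorted(s) for b, s in owners.items() if len(s) > 1}
    return dict(illegal), duplicates


# Numbers taken from the output above: mc and metamail both claim
# blocks 110761-110794, and inode 102636 has a bogus pointer at block #140.
shared = list(range(110761, 110795))
inodes = {10701: shared, 10710: shared, 102636: [0] * 140 + [1684480044]}
illegal, dups = scan_blocks(inodes, blocks_count=873642)
print(len(dups))        # 34
print(illegal[102636])  # [(140, 1684480044)]
```

The exact bounds test (at or beyond blocks_count, or below first_data_block) is an assumption about how "illegal" is defined; the real checker also has to look inside indirect blocks, which this model ignores.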

After this was done, I ran e2fsck -f -v /dev/sda3 to make sure all was
well. However, I still got all the duplicate-block messages and all
the 'Illegal block' messages, and it again prompted me to clone the
blocks of the two files. I tried a few more times, answering 'yes/yes',
'yes/no', 'no/yes' and 'no/no' to the two prompts, with the same results
each time. Finally, I remounted the fs, deleted 'metamail', and re-ran fsck:

e2fsck 1.02, 16-Jan-96 for EXT2 FS 0.5b, 95/08/09
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Fix summary information<y>? yes

Block bitmap differences: -308 -321 -322 -323 -434 -435 -436 -437 -438
-439 -442 -443 -444 -461 -522 -523 -524 -525 -526 -527 -528 -529 -530
-531 -532 -533 -655 -738 -739 -740 -741 -742 -743 -744. FIXED
Free blocks count wrong for group 0 (2388, counted=2422). FIXED
Free blocks count wrong (213818, counted=213852). FIXED

/dev/sda3: ***** FILE SYSTEM WAS MODIFIED *****
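As a quick consistency check on the numbers in this second run, the sketch below counts the bitmap differences and compares them with the two free-block corrections. It assumes that a '-' entry means "marked in use in the on-disk bitmap but not owned by any inode", so fixing it adds one free block. That reading is an assumption about e2fsck's output format, although it agrees with both free counts going up. The helpers parse_diffs and runs are hypothetical names made up for this example.

```python
# Count the "Block bitmap differences" from the output above and compare them
# with the free-block corrections. Assumption: '-' = marked used on disk but
# unowned (fix frees it); '+' = owned but not marked (fix marks it used).

DIFFS = """-308 -321 -322 -323 -434 -435 -436 -437 -438 -439 -442
-443 -444 -461 -522 -523 -524 -525 -526 -527 -528 -529 -530 -531 -532
-533 -655 -738 -739 -740 -741 -742 -743 -744"""


def parse_diffs(text):
    """Return (to_free, to_mark_used) block lists from a differences string."""
    to_free, to_mark = [], []
    for tok in text.replace(".", " ").split():
        target = to_free if tok.startswith("-") else to_mark
        target.append(int(tok.lstrip("+-")))
    return to_free, to_mark


def runs(blocks):
    """Collapse block numbers into sorted (start, end) runs of consecutive blocks."""
    out = []
    for b in sorted(blocks):
        if out and b == out[-1][1] + 1:
            out[-1] = (out[-1][0], b)
        else:
            out.append((b, b))
    return out


to_free, to_mark = parse_diffs(DIFFS)
print(len(to_free), len(to_mark))    # 34 0
print(2422 - 2388, 213852 - 213818)  # 34 34
print(runs(to_free))
# [(308, 308), (321, 323), (434, 439), (442, 444), (461, 461),
#  (522, 533), (655, 655), (738, 744)]
```

Both corrections are +34, matching the 34 blocks in the bitmap list. That is also the number of duplicate blocks reported per file in the first run, although nothing in this output alone shows that the two are connected.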

After this, everything was happy. I fsck'ed again, remounted the fs,
deleted 'mc' and re-fsck'ed, and it now appears that all is well.

At any rate, e2fsck was not able to deal with the duplicate blocks in
the two files, which points to a bug in e2fsck: it said it had fixed
the problem, but it hadn't.

Now, I'm not sure whether there's evidence for a bug in e2fs or not.
Either:

- the metamail/mc problem (definitely caused by user error) caused
some inconsistent state to persist in the fs, which eventually led to
the other errors we were seeing.

- the two problems are independent of each other, with the second
being a legitimate bug.

I don't suppose we'll really be able to tell at this point in the
game. Either way, e2fsck needs a test (and a fix) for the duplicate
blocks problem I saw.
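One way to state the missing test is as a property: after e2fsck clones duplicate blocks, a second scan must find no duplicates, and every newly allocated block must be marked in use. The following is a minimal sketch of that property on a hypothetical in-memory model, not e2fsck code. The names find_dups and clone_dups, the "first owner keeps the original" rule, and the lowest-free-block allocation policy are all illustrative assumptions. A real clone also copies the block contents and updates the on-disk inode and bitmap.

```python
# Hypothetical regression check for "Clone duplicate/bad blocks".
# Toy model: an inode is a list of block numbers; `free` and `used` stand in
# for the block bitmap. None of these names come from e2fsprogs.
from collections import defaultdict


def find_dups(inodes):
    """Map each block claimed by 2+ inodes to the set of its owners."""
    owners = defaultdict(set)
    for ino, blocks in inodes.items():
        for b in blocks:
            owners[b].add(ino)
    return {b: s for b, s in owners.items() if len(s) > 1}


def clone_dups(inodes, free, used):
    """Give every extra owner of a shared block its own newly allocated block."""
    for blk, owners in find_dups(inodes).items():
        for ino in sorted(owners)[1:]:  # the first owner keeps the original
            new = min(free)             # allocate the lowest free block
            free.remove(new)
            used.add(new)               # the bitmap must record the new block
            inodes[ino] = [new if b == blk else b for b in inodes[ino]]
            # (a real clone would also copy the data of blk into new)


# The situation from the post: mc and metamail share blocks 110761-110794.
shared = list(range(110761, 110795))
inodes = {10701: list(shared), 10710: list(shared)}
free = set(range(300, 800))
used = set(shared)
clone_dups(inodes, free, used)

# The property the post says failed: a second pass finds nothing to clone...
assert find_dups(inodes) == {}
assert set(inodes[10701]).isdisjoint(inodes[10710])
# ...and every block referenced by an inode is marked in use.
assert all(b in used for bl in inodes.values() for b in bl)
```

In the behaviour reported above, the equivalent of the first assertion failed on every re-run: the same 34 duplicates came back each time.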

todd.