Re: Serious ext2fs problem

Gerard Roudier (groudier@club-internet.fr)
Wed, 1 Jan 1997 10:34:36 +0000 (GMT)


On Wed, 1 Jan 1997, Theodore Y. Ts'o wrote:

> From: Chris Adams <cadams@ro.com>
> Date: Tue, 31 Dec 1996 13:21:15 -0600 (CST)
>
> Once the errors start happening, they keep happening at the same place
> over and over again. However, if I stop INN, umount the drive, mount
> the drive, and start INN, everything is happy (I had it giving me the
> above error twice every time I tried to tell INN to "go"). Since I
> umounted the drive and remounted it, it has been running just fine for
> ~24 hours. That doesn't seem like hardware to me.
>
> On the contrary, this sounds very much like a hardware error, where the
> SCSI bus, the SCSI controller, the DMA transfer, or your PCI/ISA bus
> somehow flipped a bit on read, and the corrupted disk transfer got stuck
> in your buffer cache.
>
> Since unmounting and remounting the filesystem fixed the problem, this
> most certainly exonerates e2fsck and the ext2 filesystem code, at least
> as far as the theory that the ext2 kernel code is detecting some weird
> error condition which e2fsck isn't fixing.

Ted,

This "remake" is very interesting.
However, in my humble opinion, we should switch from
CrystalBallDebuggingStrategy (TM) to something that will help find the
real cause of the ext2 directory block problem.

Below is a mail I sent a month ago. I received no reply.
There are probably lots of English mistakes in the text, but the patch
should work.
Mail me if you think it is stupid or wrong.

Happy New Year!

Gerard.

// Date: Tue, 26 Nov 1996 00:41:19 +0000 (GMT)
// From: Gerard Roudier <groudier@club-internet.fr>
// To: Jon Lewis <jlewis@inorganic5.fdt.net>
// Cc: "Theodore Y. Ts'o" <tytso@MIT.EDU>, ncr53c810@colorado.edu,
// Linux SCSI Mailing List <linux-scsi@vger.rutgers.edu>
// Subject: Re: more scsi errors :(

Jon,

The following patch should dump bad directory blocks to the log, at
most once every 30 seconds (to avoid flooding the log).
If you can give it a try, we could probably get some information about
the possible cause of the ext2 directory block corruption.
It is a hack. Check the source before trying it.
It worked for me.

(against 2.0.26)

--- linux/fs/ext2/dir.c.00 Mon Nov 25 23:29:34 1996
+++ linux/fs/ext2/dir.c Tue Nov 26 00:18:38 1996
@@ -71,6 +71,26 @@
NULL /* smap */
};

+
+#define BPL 16
+static void dump_buffer(char *msg, unsigned char *p, int n)
+{
+	char buf[BPL*3 + 1];
+	int i = 0;
+	int j = 0;
+
+	while (n > 0) {
+		i += sprintf(buf+i, "%02x ", *p);
+		++p; --n;
+
+		if (i && (i > sizeof(buf) - 3 || !n)) {
+			printk("%s %04d: %s\n", msg, j, buf);
+			i = 0;
+			j += BPL;
+		}
+	}
+}
+
int ext2_check_dir_entry (const char * function, struct inode * dir,
struct ext2_dir_entry * de, struct buffer_head * bh,
unsigned long offset)
@@ -94,6 +114,19 @@
"offset=%lu, inode=%lu, rec_len=%d, name_len=%d",
dir->i_ino, error_msg, offset, (unsigned long) de->inode,
de->rec_len, de->name_len);
+/* #define DEBUG_DUMP */
+#ifdef DEBUG_DUMP
+	if (1) {
+#else
+	if (error_msg != NULL) {
+#endif
+		static unsigned long last_dump_jiffies = 0;
+		if (last_dump_jiffies + 30*HZ <= jiffies) {
+			dump_buffer("EXT2-bad dir block dump", bh->b_data, dir->i_sb->s_blocksize);
+			last_dump_jiffies = jiffies;
+		}
+	}
+
return error_msg == NULL ? 1 : 0;
}

------------------------------------------------------------------------

I have tried it (with DEBUG_DUMP defined). Here is an example of the results I got:
(cut -b 30-)

EXT2-bad dir block dump 0000: c1 3f 00 00 0c 00 01 00 2e 00 00 00 e9 17 00 00
EXT2-bad dir block dump 0016: 0c 00 02 00 2e 2e 00 00 c2 3f 00 00 14 00 0c 00
EXT2-bad dir block dump 0032: 6c 69 62 63 6f 6d 5f 65 72 72 2e 61 c3 3f 00 00
EXT2-bad dir block dump 0048: 10 00 07 00 6c 69 62 73 73 2e 61 00 c4 3f 00 00
EXT2-bad dir block dump 0064: 14 00 0b 00 6c 69 62 65 78 74 32 66 73 2e 61 00
[ etc..., until... ]
EXT2-bad dir block dump 0944: 78 61 6d 70 6c 65 73 00 f2 7e 01 00 10 00 07 00
EXT2-bad dir block dump 0960: 67 63 6c 2d 32 2e 31 00 e4 3f 00 00 10 00 06 00
EXT2-bad dir block dump 0976: 63 72 74 31 2e 6f 00 00 e5 3f 00 00 14 00 0a 00
EXT2-bad dir block dump 0992: 63 72 74 62 65 67 69 6e 2e 6f 00 00 e6 3f 00 00
EXT2-bad dir block dump 1008: 14 00 0b 00 63 72 74 62 65 67 69 6e 53 2e 6f 00

Obviously, this block is not bad.

Gerard.