Re: PROBLEM: Filesystem corruption under 2.2.12

Florian Lohoff (flo@rfc822.org)
Wed, 1 Sep 1999 11:33:59 +0200


On Wed, Sep 01, 1999 at 02:01:37AM +0300, Touko Korpela wrote:
> [2.] Full description of the problem/report:
>
> Under Linux 2.2.12 (and 2.2.11) one directory gets
> mysteriously corrupted always when some cron-job
> scans it. This didn't happen under 2.0.x. I first
> suspected HD's DMA-mode but disabling it didn't help.
> The message is:
> EXT2-fs error (device ide0(3,5)): ext2_readdir: bad
> entry in directory #18682: rec_len is smaller than
> minimal - offset=0, inode=393222, rec_len=7, name_len=6
>
> [3.] Keywords (i.e., modules, networking, kernel):
>
> filesystem, ext2, corruption

I have seen

Aug 31 17:15:30 noose kernel: EXT2-fs error (device md(9,0)): ext2_add_entry:
bad entry in directory #2875396: inode out of bounds - offset=51856
, inode=69987932, rec_len=16, name_len=5

Aug 31 17:20:20 noose kernel: EXT2-fs error (device md(9,0)): ext2_add_entry:
bad entry in directory #2875396: inode out of bounds - offset=51856
, inode=69987932, rec_len=16, name_len=5

This was on a stripe of 3 UW SCSI Drives.

I also have seen this (before the ext2 errors) ...

Aug 30 20:36:00 noose kernel: free_one_pmd: bad directory entry 04000000

Multiple times ...

Aug 22 19:57:08 noose kernel: md driver 0.36.6 MAX_MD_DEV=4, MAX_REAL=8
Aug 22 19:57:08 noose kernel: (scsi0) <Adaptec AHA-294X Ultra SCSI host adapter> found at PCI 10/0
Aug 22 19:57:08 noose kernel: (scsi0) Wide Channel, SCSI ID=7, 16/255 SCBs
Aug 22 19:57:08 noose kernel: (scsi0) Downloading sequencer code... 413 instructions downloaded
Aug 22 19:57:08 noose kernel: scsi0 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.19/3.2.4
Aug 22 19:57:08 noose kernel: <Adaptec AHA-294X Ultra SCSI host adapter>
Aug 22 19:57:08 noose kernel: scsi : 1 host.
Aug 22 19:57:08 noose kernel: (scsi0:0:0:0) Synchronous at 5.0 Mbyte/sec, offset 15.
Aug 22 19:57:08 noose kernel: Vendor: IBM Model: DCAS-34330 Rev: S65A
Aug 22 19:57:08 noose kernel: Type: Direct-Access ANSI SCSI revision: 02
Aug 22 19:57:08 noose kernel: Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
Aug 22 19:57:08 noose kernel: (scsi0:0:1:0) Data overrun detected in Data-In phase, tag 1;
Aug 22 19:57:08 noose kernel: Have seen Data Phase. Length=255, NumSGs=1.
Aug 22 19:57:08 noose kernel: sg[0] - Addr 0x7fed380 : Length 255
Aug 22 19:57:08 noose kernel: Vendor: COMPAQ Model: ST15150W Rev: 6215
Aug 22 19:57:08 noose kernel: Type: Direct-Access ANSI SCSI revision: 02
Aug 22 19:57:08 noose kernel: Detected scsi disk sdb at scsi0, channel 0, id 1, lun 0
Aug 22 19:57:08 noose kernel: (scsi0:0:1:0) Synchronous at 20.0 Mbyte/sec, offset 8.

And 2 More disks like the last one ...

The first problem of ext2 errors i had after 2 days of Uptime
and ~1GB additional News traffic. So the corruptions seems
to be quite rare.

This is an AMD K6 300Mhz with a single 128MB PC-100 Dimm ..

No other known problems on that machine ... Running inn, postfix, listar etc.

Flo

-- 
Florian Lohoff		flo@rfc822.org		      	+49-5241-470566
  ...  The failure can be random; however, when it does occur, it is
  catastrophic and is repeatable  ...             Cisco Field Notice

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/