Erroneous data with ext2fs

Carsten Leonhardt (leo@arioch.oche.de)
09 Mar 1997 18:50:11 +0100


Hi there!

*Abstract*:

On a block device (block size 1024 bytes) with ext2fs, reading a file
back does not reliably return the data that was written to it.

*Long story*:

I have a block device with 637041 blocks of 1024 bytes each (it's a
magneto-optical drive). The following tests were made on a filesystem
with 4096 bytes per inode, in case that makes any difference. I'm
currently running Linux 2.0.29; earlier versions show the same behaviour.

I noticed this when I first used my MO drive with ext2. Being
cautious, I verified every file I copied onto the new drive, and gzip
reported checksum errors.
I then checked the medium (the MO disk) by running badblocks in write
mode (badblocks -w, which does four write/read passes with different
patterns) on the device. No errors. Then I tested xiafs: no errors (I
have been using xiafs on that device for some months now without any
problems).

A simple way to show the error is to run "cat /dev/zero >
/mnt/mo-drive/testfile", which writes zeros until the filesystem is
full, and then run "cmp -l /mnt/mo-drive/testfile /dev/zero" several
times, one run after the other.
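For anyone who wants to script this, here is a minimal Python sketch
of the same write-then-compare test. It is only an illustration, not
the commands I actually used: the size limit (cat /dev/zero simply runs
until the disk is full), the script and the function names are
hypothetical. Since /dev/zero only contains zeros, counting the
non-zero bytes of the file gives the same number as the lines printed
by cmp -l. Note that if the machine has enough RAM to cache the whole
file, the read-back may come from the kernel's cache instead of the
drive, so the sketch reproduces the procedure, not necessarily the
fault.

```python
import os
import sys

CHUNK = 1 << 20  # transfer size: 1 MiB


def fill_with_zeros(path, size):
    """Write `size` zero bytes to `path` and force them out to the device.

    A bounded stand-in for 'cat /dev/zero > path', which runs until the
    filesystem is full.
    """
    zeros = bytes(CHUNK)
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(remaining, CHUNK)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())


def nonzero_bytes(path):
    """Return (offset, value) for every non-zero byte in `path`.

    Offsets are 1-based, as in 'cmp -l' output.
    """
    diffs = []
    offset = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            if chunk.count(0) != len(chunk):  # skip all-zero chunks quickly
                diffs.extend((offset + i + 1, b) for i, b in enumerate(chunk) if b)
            offset += len(chunk)
    return diffs


def main(path, size, runs):
    fill_with_zeros(path, size)
    for run in range(1, runs + 1):
        diffs = nonzero_bytes(path)
        print(f"run-{run:02d}: {len(diffs)}")
        for off, val in diffs[:5]:  # first few, roughly in 'cmp -l' format
            print(f"  {off} {val:o} 0")


if __name__ == "__main__":
    if len(sys.argv) < 3:
        print(f"usage: {sys.argv[0]} FILE SIZE_IN_BYTES [RUNS]", file=sys.stderr)
    else:
        runs = int(sys.argv[3]) if len(sys.argv) > 3 else 19
        main(sys.argv[1], int(sys.argv[2]), runs)
```

For example, "python3 zerotest.py /mnt/mo-drive/testfile 600000000"
would write about 600 MB of zeros and then read the file back 19 times.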

Number of differing bytes reported by cmp -l in 19 runs:

run-01: 767
run-02: 767
run-03: 1
run-04: 4255
run-05: 3932
run-06: 0
run-07: 0
run-08: 0
run-09: 1
run-10: 3489
run-11: 0
run-12: 0
run-13: 3487
run-14: 2031
run-15: 2721
run-16: 4
run-17: 1704
run-18: 0
run-19: 769

First few lines of output from run-04:

offset (decimal) | byte in testfile (octal) | byte in /dev/zero (octal)

514297317 2 0
514297318 200 0
525807617 206 0
525807618 41 0
525807619 10 0
525807621 207 0
525807622 41 0
525807623 10 0
525807625 210 0
525807626 41 0
525807627 10 0
525807629 211 0
525807630 41 0
525807631 10 0
525807633 212 0
525807634 41 0
525807635 10 0
525807637 213 0
525807638 41 0
525807639 10 0
525807641 214 0
525807642 41 0
525807643 10 0
525807645 215 0
525807646 41 0
525807647 10 0
525807649 216 0
525807650 41 0
525807651 10 0

As you may notice, from 525807617 on, the data looks like a 4-byte
integer stored LSB first (little-endian) that counts up.
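The decoding can be checked mechanically. The following Python sketch
(the function names are made up for this illustration) parses the
run-04 lines shown above, treats every offset that cmp did not list as
a zero byte, and reads the data as little-endian 32-bit words starting
at offset 525807617 (cmp counts bytes from 1).

```python
# Excerpt of the run-04 'cmp -l' output shown above:
# offset (decimal, 1-based), byte in testfile (octal), byte in /dev/zero (octal).
RUN04_EXCERPT = """\
525807617 206 0
525807618 41 0
525807619 10 0
525807621 207 0
525807622 41 0
525807623 10 0
525807625 210 0
525807626 41 0
525807627 10 0
525807629 211 0
525807630 41 0
525807631 10 0
525807633 212 0
525807634 41 0
525807635 10 0
525807637 213 0
525807638 41 0
525807639 10 0
525807641 214 0
525807642 41 0
525807643 10 0
525807645 215 0
525807646 41 0
525807647 10 0
525807649 216 0
525807650 41 0
525807651 10 0
"""


def parse_cmp_l(text):
    """Map each differing offset to the (wrong) byte read from the file."""
    got = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0].isdigit():
            got[int(fields[0])] = int(fields[1], 8)  # cmp prints bytes in octal
    return got


def le32_words(got, start, count):
    """Decode `count` little-endian 32-bit words beginning at 1-based `start`.

    Offsets missing from the cmp output matched /dev/zero, so they are 0.
    """
    words = []
    for w in range(count):
        base = start + 4 * w
        raw = bytes(got.get(base + i, 0) for i in range(4))
        words.append(int.from_bytes(raw, "little"))
    return words


print(le32_words(parse_cmp_l(RUN04_EXCERPT), 525807617, 9))
# prints [532870, 532871, 532872, 532873, 532874, 532875, 532876, 532877, 532878]
```

The words count up by one from 532870. The first one starts at byte
525807616 counted from 0, which is exactly 513484 x 1024, i.e. at the
start of a 1024-byte block of the file. The values are also all below
637041, the number of blocks on the device, so they could be block
numbers; that is only a guess, though.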

Another analysis of the cmp results:

> cat run-01 run-02 | awk '{print $1}' | sort -n | uniq | wc
1534 1534 15340

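Although run-01 and run-02 showed the same number of errors (767
each), they occurred at different places: together they contain
1534 = 2 x 767 distinct offsets, so no offset appears in both runs.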
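The same comparison can be written as a set union and intersection.
A minimal Python sketch (script and function names are hypothetical)
that reads two saved cmp -l outputs such as run-01 and run-02; for
those two files the union should come out as 1534, as counted by wc,
and the intersection as 0.

```python
import sys


def offsets(lines):
    """Set of differing offsets (first column) in one saved 'cmp -l' output."""
    result = set()
    for line in lines:
        fields = line.split()
        if fields and fields[0].isdigit():
            result.add(int(fields[0]))
    return result


def compare(path_a, path_b):
    with open(path_a) as fa, open(path_b) as fb:
        a, b = offsets(fa), offsets(fb)
    print(f"{path_a}: {len(a)} offsets, {path_b}: {len(b)} offsets")
    print(f"distinct offsets in either run: {len(a | b)}")  # what the wc pipeline counts
    print(f"offsets present in both runs:   {len(a & b)}")


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print(f"usage: {sys.argv[0]} RUN_A RUN_B", file=sys.stderr)
    else:
        compare(sys.argv[1], sys.argv[2])
```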

Next analysis:

> cat run-* | sort -n
489283928 14 0
514297317 2 0
514297317 2 0
514297318 200 0
514297318 200 0
514749549 20 0
...
525807842 41 0
525807842 41 0
525807842 41 0
525807842 41 0
525807842 41 0
525807843 10 0
525807843 10 0
525807843 10 0
525807843 10 0
525807843 10 0
525807845 277 0
525807845 277 0
525807845 277 0
525807845 277 0
525807845 277 0
525807846 41 0
525807846 41 0
525807846 41 0
525807846 41 0
525807846 41 0
525807847 10 0
525807847 10 0
525807847 10 0
525807847 10 0
525807847 10 0
525807849 300 0
525807849 300 0
525807849 300 0
525807849 300 0
525807849 300 0
...

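This seems to indicate that *if* a byte is read back wrong, it is
wrong consistently: whenever an offset shows up in several runs, it
has the same wrong value each time. It also shows that the errors do
not occur near the beginning of the filesystem; the lowest erroneous
file offset in all runs is 489283928.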
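To check the consistency over all runs without reading the sorted
listing by eye, the wrong byte values can be grouped by offset. A
minimal Python sketch (names are hypothetical) that lists the offsets
where different runs returned different wrong bytes, plus the lowest
erroneous offset; an empty list would confirm the observation above.

```python
import sys
from collections import defaultdict


def collect(runs):
    """Map each differing offset to the set of wrong byte values seen for it.

    `runs` is an iterable with one iterable of 'cmp -l' lines per run.
    """
    seen = defaultdict(set)
    for lines in runs:
        for line in lines:
            fields = line.split()
            if len(fields) == 3 and fields[0].isdigit():
                seen[int(fields[0])].add(int(fields[1], 8))
    return seen


def summarize(seen):
    """Return (offsets with more than one distinct wrong value, lowest offset)."""
    inconsistent = sorted(off for off, vals in seen.items() if len(vals) > 1)
    lowest = min(seen) if seen else None
    return inconsistent, lowest


def main(paths):
    runs = []
    for path in paths:
        with open(path) as f:
            runs.append(f.readlines())
    seen = collect(runs)
    inconsistent, lowest = summarize(seen)
    print(f"{len(seen)} distinct bad offsets, lowest at {lowest}")
    print(f"offsets with differing bad values: {inconsistent[:10]}")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print(f"usage: {sys.argv[0]} RUN_FILE...", file=sys.stderr)
    else:
        main(sys.argv[1:])
```

Called with run-* as arguments, it processes all saved runs at once.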

Well, I hope somebody fixes this...

If that somebody needs additional info, please contact me.

Leo