Re: Weird file corruption (448 bytes): FS? RAM?

From: Hans Reiser (reiser@namesys.com)
Date: Tue Aug 21 2001 - 13:37:23 EST


Forgive me David, but until you put in new RAM, I am going to suspect that you
have faulty RAM. Faulty hardware sometimes varies in its failures, sometimes
decaying erratically over time.

Hans

David Madore wrote:
>
> Hi.
>
> This is to report an almost paranormal phenomenon :-)
>
> A few days ago I posted to this list because I had noticed a one-bit
> corruption of one of my files. This corruption was actually due to a
> defective RAM, as memtest86 showed. But memtest86 showed that only
> *one* bit was wrong. Now I am running Linux with "mem=" parameters
> that tell it not to use the page with the defective bit.
>
> Now I have just observed another file corruption. This one is more
> severe: 448 contiguous bytes were corrupted (details follow). This
> time I'm inclined to think the RAM is not to blame (448 defective
> *bytes* do not go unnoticed), but the filesystem (ReiserFS). However,
> I would like expert opinion on this. I would be grateful for any
> ideas or suggestions.
>
> Here are the details: the corrupted file was a shared library
> (/opt/mozilla/components/libnsappshell.so). It was in use at the time
> when the corruption occurred (i.e. Mozilla was running), but
> presumably swapped out to disk (to the filesystem, that is). However,
> corruption occurred in RAM only. Specifically: Mozilla was idle and
> probably swapped out, I exposed the window, causing it to swap in, and
> it segfaulted immediately. So I made a tar of the entire /opt/mozilla
> directory, then purged my RAM's and buffers' content as well as I
> could (by loading large files from disk, and by running programs that
> need tremendous amounts of RAM). When I was sure the /opt/mozilla
> directory was no longer in disk cache, I compared the contents of the
> tarball and the original files on disk, and the comparison showed that
> the file in the tarball was corrupted.
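
The tarball-versus-disk comparison described above can be approximated in a few lines of Python. This is only a sketch, not the procedure actually used in the report (which diffed `od` dumps). The function name `differing_extents` and the 16-byte granularity (one line of `od -t x1` output) are assumptions made for illustration, and the demo runs on synthetic data rather than on the real library.

```python
from typing import List, Tuple


def differing_extents(good: bytes, bad: bytes, chunk: int = 16) -> List[Tuple[int, int]]:
    """Return (start, length) extents where the two inputs differ.

    Comparison is done chunk by chunk (hypothetical default: 16 bytes),
    and adjacent differing chunks are merged into one extent.
    """
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    extents: List[Tuple[int, int]] = []
    limit = max(len(good), len(bad))
    for off in range(0, limit, chunk):
        if good[off:off + chunk] == bad[off:off + chunk]:
            continue
        size = min(chunk, limit - off)  # the last chunk may be short
        if extents and extents[-1][0] + extents[-1][1] == off:
            # This chunk directly follows the previous extent: extend it.
            start, length = extents[-1]
            extents[-1] = (start, length + size)
        else:
            extents.append((off, size))
    return extents


if __name__ == "__main__":
    # Synthetic stand-in for the two copies of the file (assumption):
    # a run of zeros is written over part of an otherwise identical copy.
    good = b"\x55" * 0x6000
    bad = bytearray(good)
    bad[0x5000:0x51C0] = bytes(0x1C0)
    for start, length in differing_extents(good, bytes(bad)):
        print(f"{start:#x} {length}")
```

Comparing at chunk granularity rather than byte by byte keeps a damaged region in one piece even where a zeroed byte happens to match a zero in the original.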
>
> Here is the diff between the original version (on disk) and the
> corrupted version (in RAM, hence in the tarball) of the file, through
> "od -A x -t x1 -t a":
>
> --- right.dump Mon Aug 20 22:14:40 2001
> +++ wrong.dump Mon Aug 20 22:14:28 2001
> @@ -2532,62 +2532,24 @@
> | ht l ] C cr t & nul cr < ' nul nul nul nul
> 004ff0 55 89 e5 83 ec 08 89 ec 5d c3 8d b6 00 00 00 00
> U ht e etx l bs ht l ] C cr 6 nul nul nul nul
> -005000 55 89 e5 53 83 ec 04 e8 44 ff ff ff 81 c3 90 78
> - U ht e S etx l eot h D del del del soh C dle x
> -005010 01 00 8b 93 24 02 00 00 85 d2 74 19 83 ec 08 8d
> - soh nul vt dc3 $ stx nul nul enq R t em etx l bs cr
> -005020 83 28 07 00 00 50 8d 83 e4 ff ff ff 50 e8 1e fa
> - etx ( bel nul nul P cr etx d del del del P h rs z
> -005030 ff ff 83 c4 10 8b 5d fc 89 ec 5d c3 8d 74 26 00
> - del del etx D dle vt ] | ht l ] C cr t & nul
> -005040 55 89 e5 83 ec 08 89 ec 5d c3 8d b6 00 00 00 00
> - U ht e etx l bs ht l ] C cr 6 nul nul nul nul
> -005050 55 89 e5 53 e8 00 00 00 00 5b 81 c3 43 78 01 00
> - U ht e S h nul nul nul nul [ soh C C x soh nul
> -005060 8b 45 08 8b 8b 20 05 00 00 89 08 8b 93 d0 02 00
> - vt E bs vt vt sp enq nul nul ht bs vt dc3 P stx nul
> -005070 00 89 10 89 48 04 8b 93 a8 02 00 00 89 50 04 89
> - nul ht dle ht H eot vt dc3 ( stx nul nul ht P eot ht
> -005080 48 08 8b 93 b8 02 00 00 89 50 08 89 48 0c 8b 93
> - H bs vt dc3 8 stx nul nul ht P bs ht H ff vt dc3
> -005090 c0 01 00 00 89 50 0c 89 48 10 8b 93 a0 05 00 00
> - @ soh nul nul ht P ff ht H dle vt dc3 sp enq nul nul
> -0050a0 89 50 10 8b 93 34 03 00 00 89 50 10 c7 40 14 00
> - ht P dle vt dc3 4 etx nul nul ht P dle G @ dc4 nul
> -0050b0 00 00 00 8b 93 94 05 00 00 89 10 8b 93 80 01 00
> - nul nul nul vt dc3 dc4 enq nul nul ht dle vt dc3 nul soh nul
> -0050c0 00 89 50 04 8b 93 a8 05 00 00 89 50 08 8b 93 dc
> - nul ht P eot vt dc3 ( enq nul nul ht P bs vt dc3 \
> -0050d0 03 00 00 89 50 0c 8b 93 40 02 00 00 89 50 10 c7
> - etx nul nul ht P ff vt dc3 @ stx nul nul ht P dle G
> -0050e0 40 1c 00 00 00 00 c7 40 18 00 00 00 00 8b 1c 24
> - @ fs nul nul nul nul G @ can nul nul nul nul vt fs $
> -0050f0 c9 c3 89 f6 55 89 e5 53 83 ec 04 e8 00 00 00 00
> - I C ht v U ht e S etx l eot h nul nul nul nul
> -005100 5b 81 c3 9c 77 01 00 8b 4d 08 8b 83 94 05 00 00
> - [ soh C fs w soh nul vt M bs vt etx dc4 enq nul nul
> -005110 89 01 8b 83 80 01 00 00 89 41 04 8b 83 a8 05 00
> - ht soh vt etx nul soh nul nul ht A eot vt etx ( enq nul
> -005120 00 89 41 08 8b 83 dc 03 00 00 89 41 0c 8b 83 40
> - nul ht A bs vt etx \ etx nul nul ht A ff vt etx @
> -005130 02 00 00 89 41 10 8d 51 10 8b 83 34 03 00 00 89
> - stx nul nul ht A dle cr Q dle vt etx 4 etx nul nul ht
> -005140 41 10 83 7a 04 00 74 11 8b 42 04 c7 40 08 00 00
> - A dle etx z eot nul t dc1 vt B eot G @ bs nul nul
> -005150 00 00 c7 42 04 00 00 00 00 f7 45 0c 01 00 00 00
> - nul nul G B eot nul nul nul nul w E ff soh nul nul nul
> -005160 74 0c 83 ec 0c 51 e8 d5 fc ff ff 83 c4 10 8b 5d
> - t ff etx l ff Q h U | del del etx D dle vt ]
> -005170 fc c9 c3 90 55 89 e5 8b 55 08 8b 42 18 40 89 42
> - | I C dle U ht e vt U bs vt B can @ ht B
> -005180 18 5d c3 90 55 89 e5 83 ec 08 8b 45 08 ff 48 18
> - can ] C dle U ht e etx l bs vt E bs del H can
> -005190 83 78 18 00 75 26 c7 40 18 01 00 00 00 85 c0 74
> - etx x can nul u & G @ can soh nul nul nul enq @ t
> -0051a0 12 83 ec 08 8b 50 10 6a 03 83 c0 10 50 ff 52 18
> - dc2 etx l bs vt P dle j etx etx @ dle P del R can
> -0051b0 83 c4 10 b8 00 00 00 00 eb 05 89 f6 8b 40 18 c9
> - etx D dle 8 nul nul nul nul k enq ht v vt @ can I
> +005000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> + nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
> +*
> +0050e0 00 00 00 00 00 00 00 00 00 00 00 00 ff ff ff ff
> + nul nul nul nul nul nul nul nul nul nul nul nul del del del del
> +0050f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> + nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
> +*
> +005170 60 19 81 c8 00 82 85 c3 00 00 00 00 60 90 4e d1
> + ` em soh H nul stx enq C nul nul nul nul ` dle N Q
> +005180 00 00 00 00 00 00 00 00 00 00 00 00 58 65 36 c8
> + nul nul nul nul nul nul nul nul nul nul nul nul X e 6 H
> +005190 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> + nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
> +0051a0 00 00 00 00 00 00 00 00 00 00 00 00 0d 00 00 00
> + nul nul nul nul nul nul nul nul nul nul nul nul cr nul nul nul
> +0051b0 0f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> + si nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
> 0051c0 c3 8d 76 00 55 89 e5 57 56 53 83 ec 0c e8 00 00
> C cr v nul U ht e W V S etx l ff h nul nul
> 0051d0 00 00 5b 81 c3 ca 76 01 00 8b 7d 0c b8 03 40 00
>
> As you can see, 448 bytes were corrupted (*only* 448, mind you, not 512),
> and were replaced mostly by 0's, with a few random non-zero bytes. I
> searched for the byte string "60 19 81 c8 00 82 85 c3 00 00 00 00 60
> 90 4e d1" (which occurs in position 0x5170 in the corrupted version)
> through all of my disk (well, only one of my disks, the one which
> contained the /opt partition among other things, but not the tarball I
> made) and I did not find it. So I have no idea what these corrupted
> bytes mean or where they came from.
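
A search for a byte string across a whole disk, as described above, might look roughly like the following. This is a sketch under stated assumptions: the helper name `find_pattern` and the 1 MiB block size are hypothetical, the report does not say which tool was actually used, and reading a raw device node normally requires root. The input is read in blocks, with an overlap so that a match straddling a block boundary is not missed.

```python
from typing import List


def find_pattern(path: str, pattern: bytes, block_size: int = 1 << 20) -> List[int]:
    """Return the absolute offsets at which `pattern` occurs in the file at `path`."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    overlap = len(pattern) - 1
    hits: List[int] = []
    with open(path, "rb") as f:
        tail = b""  # bytes carried over from the previous window
        base = 0    # absolute file offset of the start of `tail`
        while True:
            block = f.read(block_size)
            if not block:
                break
            window = tail + block
            pos = window.find(pattern)
            while pos != -1:
                hits.append(base + pos)
                pos = window.find(pattern, pos + 1)
            # Keep the last len(pattern) - 1 bytes: too short to hold a full
            # match, so no offset is ever reported twice.
            keep = window[-overlap:] if overlap else b""
            base += len(window) - len(keep)
            tail = keep
    return hits


if __name__ == "__main__":
    import sys

    # The 16 bytes quoted in the report; the path to search is supplied
    # by the caller (for example a disk image or a device node).
    needle = bytes.fromhex("60 19 81 c8 00 82 85 c3 00 00 00 00 60 90 4e d1")
    if len(sys.argv) == 2:
        for offset in find_pattern(sys.argv[1], needle):
            print(f"{offset:#x}")
```

An empty result only shows that the bytes are not stored contiguously on the searched device; it says nothing about other disks or about data that only ever existed in memory.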
>
> The kernel is a 2.4.6 (with the international crypto patch 2.4.3.1,
> but I doubt that matters), compiled with egcs-1.1.2. The filesystem
> is ReiserFS. Overall distribution is RedHat-7.1 (Seawolf).
>
> Let me repeat this in case I wasn't clear the first time:
>
> * Mozilla was running fine.
>
> * I do something else (play some sound files, in particular). Mozilla
> probably gets swapped out after a while.
>
> * I expose the Mozilla window. It crashes immediately.
>
> * I try to restart Mozilla. It does not work, but segfaults
> immediately on startup.
>
> * I suspect file corruption, make tarball of /opt/mozilla directory.
>
> * I do my best to purge the disk cache's content.
>
> * I compare /opt/mozilla with the tarball: the files on disk are OK,
> but one file in the tarball is corrupted, as shown above.
>
> If there's anything I forgot to mention, please ask.
>
> Happy hacking,
>
> --
> David A. Madore
> (david.madore@ens.fr,
> http://www.eleves.ens.fr:8080/home/madore/ )
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Aug 23 2001 - 21:00:44 EST