Re: (cached?) files corruption.

mlord (mlord@pobox.com)
Wed, 11 Jun 1997 18:38:27 -0400


Hi.

A common hardware bug in VLB IDE interfaces
is that many of them cannot tolerate simultaneous accesses
to the primary and secondary interface ports.
Random data corruption may result.

Try booting Linux with "ide1=serialize" as a kernel parameter
(stick it into your /etc/lilo.conf) and see if that helps.
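
For reference, a minimal sketch of what the lilo.conf entry might look
like. The image path, label, and root device below are placeholders,
not taken from the report; only the append= line carries the
suggestion above.

```
# /etc/lilo.conf (fragment; image, label and root are hypothetical)
image=/vmlinuz
    label=linux
    root=/dev/hda1
    append="ide1=serialize"
```

Rerun /sbin/lilo after editing so the change takes effect. For a
one-off test, the parameter can also be typed at the LILO boot prompt
after the image label, e.g. "linux ide1=serialize".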

-ml

QingLong wrote:
>
> Hi, All!
>
> This is a bug report and a call for help.
>
> I've run into very strange (really weird) behaviour of the system.
> The problem is so strange that I don't even know which developers' group
> to contact to send a bug report or ask for help. Let me describe it.
>
> The first problems of this type appeared about 6 months ago,
> after I had plugged in a new 3GB EIDE hard disk:
> `Western Digital "Caviar 33100"' 6136 cyl, 16 hd, 63 sec
>
> in addition to a 1.3GB EIDE hard disk:
> `Seagate "Medalist"' 2477 cyl, 16 hd, 63 sec.
>
> Both HDs are installed on the primary IDE interface, the bus is VLB, and
> the I/O card is an "EVLSIO-V2" (main chip on the card: PDC20230, made in Taiwan).
> There is a standard IDE (ATAPI) CDROM drive on the secondary interface.
>
> At approximately the same time I installed the latest 2.1.* kernel
> (I doubt this is the main cause of the problems, as switching to the
> 2.0.27 and 2.0.30 kernels hasn't solved the problem).
>
> I had never had such problems (described below) before. Both the CD drive
> and the old (1.3GB) HD worked well. The new configuration worked well for
> a while (the new HD was plugged in in November 1996 and the first
> problems arose in February 1997 (the 2.1.27 kernel was running at the time)).
>
> Now the problem itself.
> Frequently accessed large (more than 1MB) files suffer corruption.
> The most frequent victim of such corruption is the GCC `cc1' binary
> (which is located at /usr/lib/gcc-lib/i486-linux/2.7.2.1/cc1 and
> is launched by gcc to do the main compilation).
>
> Comparing (`cmp -l') the corrupted binary against a fresh copy gives:
> 528122 373 377 (The difference is in one bit!)
> 528046 373 377
> 949014 373 377
> 950542 373 377
> This set of differences is stable: every time I run into it,
> the corruption is restricted to this set, i.e. it is some arbitrary
> subset of this set.
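
The `cmp -l' lines above can be checked mechanically. cmp -l prints a
decimal byte offset (counted from 1) followed by the two differing byte
values in octal. A small Python sketch, with helper names invented for
illustration, confirms that 373 and 377 differ in exactly one bit:

```python
def parse_cmp_line(line):
    """Parse one `cmp -l` line: decimal 1-based offset, two octal byte values."""
    offset, first, second = line.split()[:3]
    return int(offset), int(first, 8), int(second, 8)


def flipped_bits(a, b):
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [bit for bit in range(8) if (diff >> bit) & 1]


# The four lines quoted in the report.
report = [
    "528122 373 377",
    "528046 373 377",
    "949014 373 377",
    "950542 373 377",
]

for line in report:
    offset, first, second = parse_cmp_line(line)
    print(offset, hex(first), hex(second), flipped_bits(first, second))
```

If the first column of byte values belongs to the corrupted file, as the
description suggests, the damaged bytes have bit 2 (value 4) cleared where
the fresh copy has it set.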
>
> I've already tried (just after refreshing from the GCC distribution .tar.gz):
> to move `cc1' to another place on the same partition,
> to move `cc1' to another partition on the same HD,
> to `chattr +i' it and the appropriate directory,
> all without any success... The aforementioned set remains constant,
> i.e. both the offsets and the byte differences remain the same!
> This fact makes me guess that the problem is actually in the Linux cache
> rather than the HD.
>
> Moreover, putting a fresh copy of cc1 in its standard place
> (`cp cc1 /usr/lib/gcc-lib/i486-linux/2.7.2.1/cc1' or `cat cc1 > /usr/...')
> often doesn't repair the binary, but (!!!) corrupts _both_ the binary _and_
> the _fresh_ _copy_, which is located on the other HD partition!
> And corruption of the fresh copy is also restricted to the mentioned set.
> I.e. corruption takes place when a file is merely accessed for reading
> often enough (probably because it goes into the I/O cache?).
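
One way to test the "corrupted by reading alone" observation is to read
the same file repeatedly and see whether its checksum ever changes. The
following Python sketch does that; the function names and the default
number of rounds are assumptions made for illustration, and repeated
reads will mostly be served from the buffer cache, which is the layer
under suspicion here.

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Return the MD5 hex digest of the file's current contents as read."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def count_changes(path, rounds=10):
    """Read the file `rounds` times; count reads whose digest differs from the first."""
    reference = file_digest(path)
    return sum(1 for _ in range(rounds - 1) if file_digest(path) != reference)
```

A nonzero result from count_changes() on a file nobody is writing to would
point at the read path (cache, interface, or drive) rather than at whatever
program last wrote the file.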
>
> Sometimes (after waiting for a while) corrupted files repair themselves
> automagically, without any manual intervention...
>
> These weird problems aren't restricted to `cc1'; I've also seen them with
> e.g. tar.gz archives.
>
> The frequency is very unstable. Sometimes things go smoothly for a week or
> even a fortnight. Sometimes such corruption begins to happen every minute,
> and I can't do (compile) anything for a long time. :(
>
> Please, help me to solve this problem.
> I'll be glad to answer your questions and give you any additional info.
>
> Thank You very much!
>
> QingLong.
>
> PS. Please `Cc' your answers to me.
> (I am not on the list itself, I am subscribed to the mailing list digest.)