Re: 2.4.20-pre4-ac1 trashed my system

From: Mike Isely (isely@pobox.com)
Date: Fri Aug 30 2002 - 02:07:36 EST


OK, I have some good news and some bad news.

The bad news is that I replicated the corruption.

The good news is that I replicated the corruption. Oh, and I can
cause it on demand, and not lose my system in the process. I can
provide LOTS and LOTS of details now. What do you want to know?

Some additional background: The 160GB Maxtor has a number of file
systems on it. Here's the fdisk -l output:

Disk /dev/hde: 255 heads, 63 sectors, 19929 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hde1   *         1       912   7325608+   c  Win95 FAT32 (LBA)
/dev/hde2           913     19929 152754052+   5  Extended
/dev/hde5           913       936    192748+  83  Linux
/dev/hde6           937       985    393561   83  Linux
/dev/hde7           986      1058    586341   82  Linux swap
/dev/hde8          1059      1423   2931831   83  Linux
/dev/hde9          1424     19929 148649413+  83  Linux
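
As a sanity check on that table, the block counts can be reproduced from
the geometry fdisk printed. This is only a sketch: the helper names are
made up for illustration, and the one-track (63-sector) offset at the
start of each primary and logical partition is an assumption based on the
usual DOS partitioning convention, not something fdisk reported.

```python
# Geometry from the fdisk header: 255 heads, 63 sectors/track, 512-byte sectors.
SECTORS_PER_TRACK = 63
SECTORS_PER_CYLINDER = 255 * SECTORS_PER_TRACK  # 16065, matches "Units"
SECTOR_BYTES = 512


def disk_bytes(cylinders):
    """Total capacity in bytes for the given cylinder count."""
    return cylinders * SECTORS_PER_CYLINDER * SECTOR_BYTES


def blocks_1k(start_cyl, end_cyl, skip_first_track=False):
    """Size of an inclusive cylinder range as (whole 1K blocks, odd_sector).

    odd_sector is True when half a block is left over, which is what the
    trailing '+' in fdisk's Blocks column indicates.
    """
    sectors = (end_cyl - start_cyl + 1) * SECTORS_PER_CYLINDER
    if skip_first_track:
        # Assumption: the first track of the range is not part of the
        # partition's data area (partition table / extended boot record).
        sectors -= SECTORS_PER_TRACK
    return sectors // 2, sectors % 2 == 1


print(disk_bytes(19929))                   # 163921605120 bytes, i.e. the "160GB" drive
print(blocks_1k(913, 19929))               # hde2: (152754052, True)
print(blocks_1k(1424, 19929, True))        # hde9: (148649413, True)
```

The hde2 and hde9 results agree with the Blocks column above, including
the '+' markers.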

The file system that started all the fireworks was the big one at the
end, hde9. The rescue partition that came up corrupted when I booted it
afterwards was hde6. The toasted root partition was hde8.

Here's what I did:

1. I pulled a spare hard drive (80GB Maxtor) and installed it in the
   system as hda (primary controller, primary channel, master).

2. I put a Debian installation there. Updated the kernel to
   2.4.19-ac4.

3. With a stable system on the spare drive, I moved the 160GB Maxtor
   to be hdc (primary controller, secondary channel, master).

4. Using an alternate superblock I managed to fsck the fsck'ed up file
   systems on the 160GB drive while running as hdc, while booted under
   2.4.19-ac4.

5. I then ran additional fsck passes on the 160GB drive, checking all
   partitions. Just for paranoia's sake. All now passed clean.

6. I shut down the system, moved the 160GB drive to be hde (Promise
   controller, primary channel, master), and rebooted.

7. I ran the fsck passes again on the drive. Note: This is still
   under 2.4.19-ac4, but using the Promise controller. All passed,
   squeaky clean. So under 2.4.19-ac4 there's no problem.

8. I rebooted the system to 2.4.20-pre4-ac1 and fsck'ed the big
   partition again. Splat. Some time after 50% done it reported an
   error.

Unlike during the initial carnage, I wasn't an idiot this time and
didn't use fsck's -y option, so it stopped at the first error. Since
nothing is being written to the drive, the contents should hopefully
still be OK. I've already rebooted again and repeated the last step. I
should be able to repeat this experiment as often as needed.
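
For anyone wanting to script a non-destructive rerun of step 8, here is a
minimal sketch. It assumes the file systems are ext2/ext3 and checked
with e2fsck, which is not stated above, and the function names are
hypothetical. The e2fsck options used are -n (open read-only and answer
"no" to every question), -f (check even if the file system is marked
clean) and -b (use an alternate superblock).

```python
import subprocess


def readonly_fsck_cmd(device, superblock=None):
    """Build an e2fsck command line that never writes to the device.

    Assumption: the target is an ext2/ext3 file system. -n and -y are
    mutually exclusive, so -y is deliberately never added here.
    """
    cmd = ["e2fsck", "-n", "-f"]
    if superblock is not None:
        # Alternate superblock location, as used in step 4.
        cmd += ["-b", str(superblock)]
    cmd.append(device)
    return cmd


def run_check(device, superblock=None):
    """Run the read-only check and return e2fsck's exit status.

    A nonzero status means e2fsck reported a problem or could not run.
    """
    return subprocess.run(readonly_fsck_cmd(device, superblock)).returncode
```

Calling run_check("/dev/hde9") after each reboot would repeat the
experiment without touching the disk contents.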

Clearly there's something wrong in 2.4.20-pre4-ac1 that wasn't wrong
in 2.4.19-ac4 that is impacting my setup.

Some additional datapoints:

  1. During bootup of 2.4.20-pre4-ac1, I found the following message
     in the kernel log, not previously seen:

> hde: Maxtor 4G160J8, ATA DISK drive
> ULTRA 66/100/133: Primary channel of Ultra 66/100/133 requires an 80-pin cable for Ultra66 operation.
> Switching to Ultra33 mode.
> Warning: Primary channel requires an 80-pin cable for operation.
> hde reduced to Ultra33 mode.

     What makes this notable is that there is indeed an 80-pin cable
     connecting the 160GB drive to that controller. I haven't noticed
     this message under 2.4.19-ac4, but honestly I haven't looked
     for it directly yet. I'll check that.

  2. I did something else that night that may have been less than
     smart. I remembered it tonight and repeated the experiment. I
     tried to read-only mount hde9 while the fsck was running. When
     this happens, the fsck process gets a short read and complains.
     Obviously that's going to mess up fsck. However that little
     shenanigan is not needed to screw things up. Tonight I ran step
     8 (above) twice. The first time was after restarting fsck, after
     fsck had failed on account of my trying to ro-mount the file
     system. The second time - after rebooting - I still got the fsck
     failure some time after 50% completion, without having to try to
     mount anything.
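
To compare the two kernels on datapoint 1, the boot log from each can be
scanned for the downgrade message. A minimal sketch follows; the function
name is made up, and the pattern is just the wording quoted above, so it
may well not match other driver versions.

```python
import re

# Matches the "hdX reduced to Ultra33 mode" line quoted in datapoint 1.
DOWNGRADE = re.compile(r"\b(hd[a-z]) reduced to Ultra33 mode")

SAMPLE = """\
> hde: Maxtor 4G160J8, ATA DISK drive
> ULTRA 66/100/133: Primary channel of Ultra 66/100/133 requires an 80-pin cable for Ultra66 operation.
> Switching to Ultra33 mode.
> Warning: Primary channel requires an 80-pin cable for operation.
> hde reduced to Ultra33 mode.
"""


def downgraded_drives(log_text):
    """Return the drives the kernel log says were dropped to Ultra33."""
    drives = []
    for line in log_text.splitlines():
        match = DOWNGRADE.search(line)
        if match:
            drives.append(match.group(1))
    return drives


print(downgraded_drives(SAMPLE))  # ['hde']
```

Feeding it the saved dmesg output from 2.4.19-ac4 and from
2.4.20-pre4-ac1 would show whether the warning is new.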

I've got a system here that I can foul up on demand now. What would
you like me to do?

  -Mike

                        |         Mike Isely          |     PGP fingerprint
     POSITIVELY NO      |                             | 03 54 43 4D 75 E5 CC 92
 UNSOLICITED JUNK MAIL! |   isely @ pobox (dot) com   | 71 16 01 E2 B5 F5 C1 E8
                        |   (spam-foiling address)    |



