Re: Corrupt file-system(s) leads to crash

Albert D. Cahalan (acahalan@cs.uml.edu)
Mon, 6 Apr 1998 21:34:35 -0400 (EDT)


> While trying to get the screen card running under X, after spending
> several days copying zillions of files from the server it will replace,
> the usual happened. The machine hung. I had to hit the reset switch.

That looks like another linux-kernel thread...

> damaged. This caused seeks beyond the end of the media, lots of
> SCSI bus resets, then eventually a crash which left all disk drives
> destroyed. Yes, the READ CAPACITY command wouldn't even work from
> the Adaptec BIOS menu. I had to low-level format all three drives to
> get them back on-line.
>
> Now... Please put a check within scsi.c for any attempt to access
> beyond the capacity of the media and return an error without actually
> trying to see if you can get away with it! The read-capacity command
> occurs early in the startup. These values should be saved and used
> during SCSI read/write access.
...
> The problem with attempting to access beyond the media is that the
> disk drives are dumb and will try to do just that. Something, somewhere,
> must be smart enough to prevent this from happening. Presently, if
> a file-system is slightly corrupt, one bad offset, you can destroy
> every file-system on the controller. Especially if it's an Adaptec
> running the aic7xxx driver.
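
For illustration only, the kind of check being asked for might look
roughly like the sketch below. It is not code from scsi.c: the
struct and the function name are hypothetical, and it assumes the
driver has saved the sector count at startup (READ CAPACITY reports
the last block address, so the count is that value plus one).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device record; NOT the real scsi.c structure. */
struct disk_info {
    /* Sector count saved at startup (last LBA from READ CAPACITY + 1). */
    uint64_t capacity_sectors;
};

/*
 * Return true only if the request [start, start + count) lies
 * entirely on the media. The caller would fail the request with an
 * error instead of sending it to the drive when this returns false.
 */
static bool request_in_range(const struct disk_info *d,
                             uint64_t start, uint64_t count)
{
    if (count == 0)
        return false;
    if (start >= d->capacity_sectors)
        return false;
    /* Compare by subtraction so that start + count cannot overflow. */
    return count <= d->capacity_sectors - start;
}
```

The point of the subtraction form is that a single corrupt offset
near the top of the integer range cannot wrap around and slip past
the check.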

Some people are reporting related troubles with corrupted partition
tables getting even more corrupted:

1. fdisk puts extended partition info _in_ the previous partition
2. mke2fs does not sanity check
3. the kernel partition code does not sanity check
4. somebody fills up the filesystem
5. goodbye partitions!

It's not as serious as your disk destruction, but the solution is
the same. Partition size and location must be sanity checked when
the partition table is read. Partition access must be bounds checked.
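
As a rough sketch of those two checks (not the actual kernel
partition code; every name here is made up for illustration, and the
table is assumed to be a flat list of partitions in sectors, with
extended-partition containers left out):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory partition entry, in sectors. */
struct part {
    uint64_t start;
    uint64_t size;
};

/* A partition must be non-empty and lie entirely on the disk. */
static bool part_fits(const struct part *p, uint64_t disk_sectors)
{
    return p->size > 0 && p->start < disk_sectors &&
           p->size <= disk_sectors - p->start;
}

/* Both entries must already have passed part_fits(), so the sums
 * below cannot overflow. */
static bool parts_overlap(const struct part *a, const struct part *b)
{
    return a->start < b->start + b->size &&
           b->start < a->start + a->size;
}

/* Sanity check when the table is read: every entry fits on the
 * disk and no two entries overlap. */
static bool table_is_sane(const struct part *t, size_t n,
                          uint64_t disk_sectors)
{
    for (size_t i = 0; i < n; i++) {
        if (!part_fits(&t[i], disk_sectors))
            return false;
        for (size_t j = 0; j < i; j++)
            if (parts_overlap(&t[i], &t[j]))
                return false;
    }
    return true;
}

/* Bounds check on access: map a partition-relative request to an
 * absolute sector, refusing anything that runs past the partition. */
static bool part_map(const struct part *p, uint64_t rel,
                     uint64_t count, uint64_t *abs_out)
{
    if (count == 0 || rel >= p->size || count > p->size - rel)
        return false;
    *abs_out = p->start + rel;
    return true;
}
```

With a check like table_is_sane() a bad table is rejected up front,
and with part_map() a full filesystem in one partition cannot write
into its neighbour.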
