Re: EXT2 and BadBlock updating.....

From: Andre Hedrick (andre@linux-ide.org)
Date: Wed Apr 12 2000 - 00:20:43 EST


On Tue, 11 Apr 2000, Stephen C. Tweedie wrote:

> Hi,
>
> On Tue, Apr 11, 2000 at 11:58:55AM -0700, Andre Hedrick wrote:
> >
> > You mentioned that there is no way to actively update the badblocks list
> > in EXT2.
>
> No, I said that there was no way to get the kernel to add to the
> badblocks list automatically. You have to do it from user space:
> e2fsck has options to add specific blocks to the bad block list, or
> to do a surface scan and add any bad blocks it finds. It can even
> relocate critical fs structures like bitmaps around any bad blocks
> it finds.

Okay,

Someone explain this........

On Thu, 6 Apr 2000, Alan Cox wrote:

> > > Multiwrite IDE breaks on a disk error
> >
> > Explain.........Please........
>
> If you have one bad sector you should write the other 7..

Now if the ata/ide driver does not address this recovery then I see big
problems. Alan's case (my reading) states that regardless of whether we
blow the write to a sector (in an 8 sector multi-write command) we should
write all that we can........

My idea was that if we are going to try and salvage a hardware screw-up
(because a stray neutrino doinks a sector that was good) on a file
write to disk, we need the following:

One, reset the request queue and recover the data that we just tried to
write to a newly transformed badblock. That is, in an eight sector write:

0|1|2|3|4|5|6|7 This is what I think Alan is suggesting to do.
w|w|w|w|F|w|w|w

Theodore -> T|h|e|o|F|o|r|e -> T|h|e|o|?|o|r|e -> "Theo ore"

I read this as FS corruption because of a hardware failure.
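
A minimal sketch of this first case, written as a toy user-space model in
Python and not as driver code. The names multiwrite_skip_bad and read_back
are hypothetical, and the "disk" is just a dict of sector number to byte.
It only illustrates why skipping the failed sector and writing the other
seven leaves a hole in the file:

```python
# Toy model only: a dict stands in for the disk, "bad" is the set of
# sectors that fail on write. Nothing here reflects real driver code.
def multiwrite_skip_bad(disk, start, data, bad):
    """Write one byte per sector, carrying on past any sector that fails."""
    failed = []
    for offset, byte in enumerate(data):
        sector = start + offset
        if sector in bad:
            failed.append(sector)  # this write is lost, nothing is stored
            continue
        disk[sector] = byte
    return failed


def read_back(disk, start, count):
    """Read count sectors; a sector that was never written shows as '?'."""
    return "".join(disk.get(start + i, "?") for i in range(count))


disk = {}
failed = multiwrite_skip_bad(disk, 0, "Theodore", bad={4})
print(failed)                  # [4]
print(read_back(disk, 0, 8))   # Theo?ore
```

Seven of the eight sectors hold good data, but the file as a whole no
longer reads back as what was written.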

Two, at the point of error, while resetting or adjusting the queue, we do
something like this......

0|1|2|3|4|5|6|7
w|w|w|w|F -> fork recovery because 0,1,2,3 were successful

Adjust the request, now update the sector write position, and begin a SEEK
for the first good sector, so the next part of the file is written to that
location.
Now why the "fork recovery"? We need to finish/complete the write
request, but because we discovered a NEW BAD BLOCK/SECTOR we should walk
the rest of the write because there may be a section of the disk/track
that is failing. Also this fork would provide the means to log the
location of the newly failed sector, go back and MARK it BAD, and issue a
request to the FS to update the BADBLOCKS table. Thus we get:

            0|1|2|3| 4 |5|6|7|8    0|1|2|3| 4 |5|6|7|8
Theodore -> w|w|w|w|FSR|w|w|w|w -> T|h|e|o|FSR|d|o|r|e -> Theodore

FSR == FaultSeekRecover THREAD............
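
The second case can be sketched the same way. This is again a toy Python
model under stated assumptions, not a proposal for the actual driver: the
name multiwrite_fsr is hypothetical, and the returned layout list stands in
for whatever mapping the FS would really keep. On a failed sector it logs
the sector, steps forward to the first good one, and continues the write
there:

```python
# Toy model of the FaultSeekRecover idea. A dict stands in for the disk
# and "bad" is the set of sectors that fail on write (an assumption).
def multiwrite_fsr(disk, start, data, bad):
    """Write data one byte per sector, stepping past failed sectors."""
    badblocks = []  # newly found bad sectors, to be reported to the FS
    layout = []     # physical sector used for each byte, in file order
    sector = start
    for byte in data:
        # Walk forward until a good sector is found; this also covers
        # a run of adjacent failures on the same track.
        while sector in bad:
            badblocks.append(sector)
            sector += 1
        disk[sector] = byte
        layout.append(sector)
        sector += 1
    return layout, badblocks


disk = {}
layout, badblocks = multiwrite_fsr(disk, 0, "Theodore", bad={4})
print(layout)                             # [0, 1, 2, 3, 5, 6, 7, 8]
print(badblocks)                          # [4]
print("".join(disk[s] for s in layout))   # Theodore
```

The write now takes nine sectors instead of eight, matching the 0..8
diagram, and the data reads back intact as long as the reader follows the
recorded layout.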

NOW the IO price is high, but FS corruption is the highest price.
I am guessing that it could require up to a 10 second recovery penalty.
This would include reducing the DISK IO RATE to the slowest, recovering
the WRITE request (with the stepwise seek write verify fork), doing an "N"
badblock MARK on the sector, asking the FS to perform an update and
verification of the BADBLOCK table, and finally reverting the drive and host
settings to the transfer rate that was in place before the error happened.
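
The ordering of those recovery steps might look roughly like the sketch
below. Everything in it is an assumption made for illustration: ToyDrive,
recover_write and the fs_badblocks list are hypothetical names, the mode
strings are just labels, and a Python list stands in for the FS badblocks
table. The point is only the sequence: save the rate, drop to the slowest,
write with verify while stepping past failures, record the bad sectors,
and always restore the rate.

```python
# Toy model of the recovery sequence; not real ATA/IDE or ext2 code.
class ToyDrive:
    def __init__(self, mode, bad):
        self.mode = mode        # current transfer mode (label only)
        self.bad = set(bad)     # sectors that fail on write (assumption)
        self.sectors = {}

    def write_verify(self, sector, byte):
        """Write one sector and read it back; False means it failed."""
        if sector in self.bad:
            return False
        self.sectors[sector] = byte
        return self.sectors[sector] == byte


def recover_write(drive, fs_badblocks, start, data, slow_mode="PIO0"):
    """Finish a failed write at the slowest rate, then restore the rate."""
    saved_mode = drive.mode
    drive.mode = slow_mode              # step 1: slowest transfer rate
    try:
        layout = []
        sector = start
        for byte in data:               # step 2: stepwise write + verify
            while not drive.write_verify(sector, byte):
                if sector not in fs_badblocks:
                    fs_badblocks.append(sector)  # steps 3-4: mark and record
                sector += 1
            layout.append(sector)
            sector += 1
        return layout
    finally:
        drive.mode = saved_mode         # step 5: revert to the old rate


drive = ToyDrive(mode="UDMA4", bad={4})
fs_badblocks = []
# Sectors 0..3 were already written; recover the remaining "dore" from 4.
layout = recover_write(drive, fs_badblocks, 4, "dore")
print(layout)         # [5, 6, 7, 8]
print(fs_badblocks)   # [4]
print(drive.mode)     # UDMA4
```

Using try/finally here means the original transfer rate comes back even if
the recovery itself blows up partway through.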

If this is not clear, tell me and I will try another explanation.
Also tell me if I am out of my mind for even suggesting this method.
Regardless of whether you think I am NUTTY, if SCSI also suffers from this
problem, how do we ever make Enterprise Class OS until we fix/enable
this recovery?

Now I very rarely get to poke my nose into other parts of the kernel.
If the FS portion of the VFS has a means to handle this already, I need to
know how to access this and make use of it.

Sorry Eric, I dragged you into this in case you can shed some light my
way. If SCSI has a recovery method already, how can I adapt/modify/borrow
what you have working?

Cheers,

Andre Hedrick
The Linux ATA/IDE guy



