Re: raid 1 - automatic 'repair' possible?

From: Ric Wheeler
Date: Fri Jan 28 2005 - 11:41:57 EST


Having looked at a lot of disks, I think that it is definitely worth forcing a write to try and invoke the remap. With large drives, you usually have several bad sectors in the normal case (drive vendors allocate up to a couple of thousand spare sectors just for remapping).

Depending on the type of drive error, the act of writing is likely to clean the questionable sector and leave you with a perfectly fine disk.
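
To make that concrete, something along these lines could force the rewrite from userspace (a minimal, untested sketch; the device paths and sector number are hypothetical, and the array would have to be quiescent so md does not race with the write):

/*
 * Sketch only: rewrite one unreadable sector of a RAID-1 member with
 * the data from its mirror, in the hope that the drive rewrites or
 * remaps the sector.  Device paths and sector number are made up.
 */
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>

#define SECTOR_SIZE 512

int main(void)
{
        char buf[SECTOR_SIZE];
        off_t bad = (off_t)123456789 * SECTOR_SIZE;     /* hypothetical LBA */
        int good_fd = open("/dev/sdb", O_RDONLY);       /* healthy mirror */
        int bad_fd = open("/dev/sda", O_WRONLY);        /* disk w/ bad sector */

        if (good_fd < 0 || bad_fd < 0) {
                perror("open");
                return 1;
        }
        /* Read the good copy from the healthy mirror... */
        if (pread(good_fd, buf, SECTOR_SIZE, bad) != SECTOR_SIZE) {
                perror("pread");
                return 1;
        }
        /*
         * ...and force a write to the failing sector; the drive will
         * either rewrite it in place or remap it to a spare sector.
         */
        if (pwrite(bad_fd, buf, SECTOR_SIZE, bad) != SECTOR_SIZE) {
                perror("pwrite");
                return 1;
        }
        fsync(bad_fd);
        close(good_fd);
        close(bad_fd);
        return 0;
}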

Ric

Jakob Oestergaard wrote:

On Wed, Jan 19, 2005 at 11:48:52AM +0100, Kiniger wrote:
...

> some random thoughts:
>
> nowadays hardware sector sizes are much bigger than 512 bytes

No :)


> and the read error may affect some sectors +/- the sector which
> actually returned the error.

That's right.



> to keep the handling in userspace as much as possible:
>
> the real problem is the long resync time. therefore it would be
> sufficient to have a concept of "defective areas" per partition and
> drive (a few of them, perhaps four or so, would be enough) which would
> be excluded from reads/writes, plus some means to re-synchronize these
> "defective areas" from the good counterparts on the other disk. This
> would avoid marking the whole partition as defective.
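
As an illustration of what such a "defective areas" table might look like (all names and sizes are made up for the sketch; real md code would also need locking and on-disk persistence):

/*
 * Illustration only of the "defective areas" concept above; this is
 * nowhere near real md code.  A small fixed table of excluded sector
 * ranges per device, checked before I/O.
 */
#include <stdbool.h>

#define MAX_BAD_AREAS 4                 /* "perhaps four or so" */

struct bad_area {
        unsigned long long start;       /* first excluded sector */
        unsigned long long len;         /* length in sectors */
        bool in_use;
};

static struct bad_area bad_areas[MAX_BAD_AREAS];

/* Does a request for this sector fall into an excluded area? */
static bool sector_excluded(unsigned long long sector)
{
        int i;

        for (i = 0; i < MAX_BAD_AREAS; i++)
                if (bad_areas[i].in_use &&
                    sector >= bad_areas[i].start &&
                    sector < bad_areas[i].start + bad_areas[i].len)
                        return true;
        return false;
}

/*
 * Record a new defective area; once the table is full (-1), the disk
 * would have to be kicked from the array just as it is today.
 */
static int add_bad_area(unsigned long long start, unsigned long long len)
{
        int i;

        for (i = 0; i < MAX_BAD_AREAS; i++) {
                if (!bad_areas[i].in_use) {
                        bad_areas[i].start = start;
                        bad_areas[i].len = len;
                        bad_areas[i].in_use = true;
                        return 0;
                }
        }
        return -1;
}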



I wonder if it's really worth it.

The original idea has some merit, I think - but what you're suggesting
here is almost "bad block remapping" with transparent recovery,
userspace policy agents, etc. etc.

If a drive has problems reading a sector from the platter, the problem
can usually be corrected by overwriting that sector (either the drive
can actually overwrite the sector in place, or it will re-allocate it,
with severe read performance penalties following). But there's a reason
why that sector went bad, and you really want to get the disk replaced.
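
One way to check from userspace whether such a rewrite "took" is simply to read the sector back with O_DIRECT, so the read reaches the drive rather than the page cache (again only a sketch; the device path and sector number are hypothetical):

/*
 * Sketch: after overwriting a questionable sector, read it back
 * through O_DIRECT so the request actually hits the drive and not
 * the page cache.
 */
#define _GNU_SOURCE                     /* for O_DIRECT */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>

#define SECTOR_SIZE 512

int main(void)
{
        void *buf;
        off_t bad = (off_t)123456789 * SECTOR_SIZE;     /* hypothetical */
        int fd = open("/dev/sda", O_RDONLY | O_DIRECT);

        if (fd < 0 || posix_memalign(&buf, 4096, SECTOR_SIZE)) {
                perror("setup");
                return 1;
        }
        if (pread(fd, buf, SECTOR_SIZE, bad) == (ssize_t)SECTOR_SIZE)
                puts("sector reads fine again (rewritten or remapped)");
        else
                puts("still unreadable - time to replace the disk");
        close(fd);
        return 0;
}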

I think the current policy of marking the disk as failed when it has
failed is sensible.

Just my 0.02 Euro


