Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR

From: Matt Mackall
Date: Fri Feb 02 2007 - 15:03:06 EST

Next message: Linus Torvalds: "Re: [PATCH 2 of 4] Introduce i386 fibril scheduling"
Previous message: Michael K. Edwards: "Re: Should io(read|write)(8|16|32)_rep take (const|) volatile u(8|16|32) __iomem *addr?"
In reply to: Mark Lord: "Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR"
Next in thread: Mark Lord: "Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, Feb 02, 2007 at 11:06:19AM -0500, Mark Lord wrote:
> Alan wrote:
> >
> >If this is the right strategy for disk recovery for a given type of
> >device then this ought to be an automatic strategy. Most end users will
> >not have the knowledge to frob about in sysfs, and if the bad sector hits
> >at the wrong moment a sensible automatic recovery strategy is going to do
> >the right thing faster than human intervention, which may be some hours
> >away.
>
> I think something we seem to be discussing here are the opposite aims
> of enterprise RAID (traditional SCSI market) versus desktop filesystems
> (now the number one user of Linux SCSI, courtesy of libata).
>
> With RAID, it's safe to suggest that a very fast, bounded exit from EH
> is almost always what the customer / end-user wants, because the higher
> level RAID management can then deal with the failed drive offline.
>
> But for a desktop filesystem, failing out quickly and bulk-failing megabytes
> around a couple of bad sectors is not good -- no redundancy, and this will
> lead to unneeded data loss.
>
> It's beginning to look like this needs to be run-time tuneable,
> per block minor device (per-partition), through sysfs at a minimum.
> The RAID tools could then automatically choose settings to bias more
> towards an "instant exit" when errors are found, whereas a non-RAID
> desktop could select a more reliable recovery strategy.
>
> Right now, with Jame's patch (earlier in this thread), the whole scheme
> is heavily weighted towards the RAID "instant exit" strategy, which is
> making me quite nervous about the data on my laptop.

Also worth considering is that spending minutes trying to reread
damaged sectors is likely to accelerate your death spiral. More data
may be recoverable if you give up quickly in a first pass, then go
back and manually retry damaged bits with smaller I/Os.

Another approach is to have separate retries per hard sector and
hard errors per MB at the block device level. We don't want to have
the same number of retries for a 64KB block as a 1MB block and we
certainly don't want to do 2K retries in a row.

So for instance, I could have the first set to 1 and the second set to
16. For a 256KB read, this gets scaled down to 4, which means we'll
retry each sector once and fail the whole I/O when we hit the 5th
sector error.

More reasonable number might be 0 and 1, meaning "don't do OS level
retries on sectors and fail the whole I/O on the second sector error
(or immediately for smaller reads)".

It might also be informative to add a kernel message reporting if a retry
(in the non-transient cases) ever actually succeeds.

--
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Linus Torvalds: "Re: [PATCH 2 of 4] Introduce i386 fibril scheduling"
Previous message: Michael K. Edwards: "Re: Should io(read|write)(8|16|32)_rep take (const|) volatile u(8|16|32) __iomem *addr?"
In reply to: Mark Lord: "Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR"
Next in thread: Mark Lord: "Re: [PATCH] scsi_lib.c: continue after MEDIUM_ERROR"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]