Re: MD/RAID time out writing superblock

From: Mark Lord
Date: Tue Sep 01 2009 - 09:15:50 EST


Andrei Tanas wrote:
The drive might take a longer time like this when doing error handling
(sector remapping, etc), but then I would expect to see your remapped
sector count grow.
Yes, this is a possibility and according to the spec, libata EH should
be retrying flushes a few times before giving up but I'm not sure
whether retrying for several minutes is a good idea either.
Is it?
..

Libata will retry only when the FLUSH returns an error,
and the next FLUSH will continue after the point where
the first attempt failed.

But if the drive can still auto-relocate sectors, then the
first FLUSH won't actually fail.. it will simply take longer
than normal.

A couple of those, and we're into the tens of seconds range
for time.
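
A rough way to picture the two cases above is a toy model. This is only
a sketch: the function names, the per-sector timings and the retry limit
are all made-up assumptions for illustration. None of it is libata code
or measured drive behaviour. It shows a FLUSH that either succeeds slowly
(the drive relocates bad sectors silently) or fails, with each retry
resuming after the sector that failed.

```python
# Toy model only: all timings are hypothetical, not measured drive behaviour.
WRITE_MS = 1         # assumed cost of writing one healthy cached sector
RELOCATE_MS = 5000   # assumed cost of the drive remapping one bad sector
ERROR_MS = 7000      # assumed cost of the drive giving up on one bad sector


def flush_once(n_sectors, bad, start, spares):
    """Simulate one FLUSH over cached sectors start..n_sectors-1.

    Returns (ok, elapsed_ms, resume_at, spares_left).
    """
    elapsed = 0
    for pos in range(start, n_sectors):
        if pos not in bad:
            elapsed += WRITE_MS
        elif spares > 0:
            # Drive remaps silently: no error, the FLUSH just takes longer.
            spares -= 1
            elapsed += RELOCATE_MS
        else:
            # No spare sectors left: the FLUSH fails, and a retry
            # resumes after the failing sector.
            return False, elapsed + ERROR_MS, pos + 1, spares
    return True, elapsed, n_sectors, spares


def flush_with_retries(n_sectors, bad, spares, max_tries=3):
    """Retry only when a FLUSH reports an error (max_tries is an assumption).

    Returns (ok, attempts_used, total_elapsed_ms).
    """
    total, start = 0, 0
    for attempt in range(1, max_tries + 1):
        ok, elapsed, start, spares = flush_once(n_sectors, bad, start, spares)
        total += elapsed
        if ok:
            return True, attempt, total
    return False, max_tries, total


if __name__ == "__main__":
    # Spares available: one slow FLUSH, no error reported, so no retry.
    print(flush_with_retries(10, {2, 7}, spares=2))
    # Spares exhausted: each bad sector fails a FLUSH and triggers a retry.
    print(flush_with_retries(10, {2, 7}, spares=0))
```

With these assumed numbers, the first case finishes in a single attempt
that simply takes about ten seconds, which is why the host never sees an
error to retry on.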

Still, it would be good to actually produce an error like that
to examine under controlled circumstances.

Hmm.. I had a drive here that gave symptoms like that.
Eventually, I discovered that drive had run out of relocatable
sectors, too. Mmm.. I'll see if I can get it back (loaned it out)
and perhaps we can recreate this specific scenario on it..
..

I checked today, and that drive is no longer available.

Mine errored out again with exactly the same symptoms, this time after only
a few days and with the "tunable" set to 2 sec. I got a warranty replacement
but haven't shipped this one yet. Let me know if you want it.
..

Not me. But perhaps Tejun?