Re: MD/RAID time out writing superblock

From: Ric Wheeler
Date: Mon Aug 31 2009 - 08:05:34 EST


On 08/31/2009 04:10 AM, Tejun Heo wrote:
Ric Wheeler wrote:
On 08/27/2009 05:22 PM, Andrei Tanas wrote:
Hello,

This is about the same problem that I wrote about two days ago (md gets an
error while writing the superblock and fails a hard drive).

I've tried to figure out what's really going on, and as far as I can tell,
the disk doesn't really fail (as confirmed by multiple tests); it times out
trying to execute the ATA_CMD_FLUSH_EXT command ("ata2.00: cmd ea..." in the
log). The reason for this, I believe, is that md_super_write queues the
write command with the BIO_RW_SYNCIO flag.
As I wrote before, with a 32MB cache it is conceivable that it will take the
drive longer than 30 seconds (defined by SD_TIMEOUT in scsi/sd.h) to flush
its buffers.

Changing safe_mode_delay to a more conservative 2 seconds should definitely
help, but is it really necessary to write the superblock synchronously when
the array changes status from active to active-idle?

[90307.328266] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[90307.328275] ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
[90307.328277] res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[90307.328280] ata2.00: status: { DRDY }
[90307.328288] ata2: hard resetting link
[90313.218511] ata2: link is slow to respond, please be patient (ready=0)
[90317.377711] ata2: SRST failed (errno=-16)
[90317.377720] ata2: hard resetting link
[90318.251720] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[90318.338026] ata2.00: configured for UDMA/133
[90318.338062] ata2: EH complete
[90318.370625] end_request: I/O error, dev sdb, sector 1953519935
[90318.370632] md: super_written gets error=-5, uptodate=0
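
[Editor's note: for readers unfamiliar with libata output, here is a rough sketch of how the fields in the log above can be read. The lookup tables are assumptions reproduced from memory of the libata headers (ATA_CMD_FLUSH_EXT = 0xea, AC_ERR_TIMEOUT = 0x4, ATA_DRDY = 0x40) and are deliberately incomplete; the helper names are hypothetical and not part of any kernel tool.]

```python
import errno
import re

# Partial lookup tables, assumed from the libata headers; not exhaustive.
ATA_COMMANDS = {0xE7: "FLUSH CACHE", 0xEA: "FLUSH CACHE EXT"}
AC_ERR_BITS = {0x1: "device error", 0x2: "HSM violation",
               0x4: "timeout", 0x8: "media error"}
ATA_STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF",
                   0x08: "DRQ", 0x01: "ERR"}


def decode_bits(value, table):
    """Return the names of all bits from `table` that are set in `value`."""
    return [name for bit, name in sorted(table.items()) if value & bit]


def decode_cmd_line(line):
    """Extract the ATA opcode (first byte after 'cmd') and name it."""
    opcode = int(re.search(r"cmd ([0-9a-f]{2})/", line).group(1), 16)
    return ATA_COMMANDS.get(opcode, f"unknown (0x{opcode:02x})")


def decode_res_line(line):
    """Decode the status byte (first byte after 'res') and the Emask."""
    m = re.search(r"res ([0-9a-f]{2})/.*Emask (0x[0-9a-f]+)", line)
    status = int(m.group(1), 16)
    emask = int(m.group(2), 16)
    return decode_bits(status, ATA_STATUS_BITS), decode_bits(emask, AC_ERR_BITS)


cmd = decode_cmd_line("ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0")
status, errors = decode_res_line(
    "res 40/00:01:01:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)")

# The two negative numbers in the log are ordinary errno values.
srst_errno = errno.errorcode[16]   # "SRST failed (errno=-16)"
md_errno = errno.errorcode[5]      # "super_written gets error=-5"

print(cmd, status, errors, srst_errno, md_errno)
```

Read this way, the log says that a FLUSH CACHE EXT command timed out while the drive still reported itself ready, the soft reset failed with EBUSY, and md then saw EIO on the superblock write.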


30 seconds is a very long time for a drive to respond, but I think that
your explanation fits the facts pretty well...
Even with a 32MB cache, 30 secs should be more than enough. It's not
like the drive is going to do random writes on those. It's likely to make
only a few strokes over the platter, and it really shouldn't take very
long. I have yet to see an actual case where a properly functioning drive
timed out on a flush because the flush itself took that long.
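
[Editor's note: the two positions above can be put side by side with a back-of-the-envelope estimate. Only the 32MB cache size and the 30 second timeout come from the thread; the 4 KiB write size, the 12 ms per random access, and the 80 MB/s sequential rate are illustrative assumptions, not measurements of any particular drive.]

```python
# Figures taken from the thread.
CACHE_BYTES = 32 * 1024 * 1024      # 32MB write cache
TIMEOUT_S = 30                      # SD_TIMEOUT as quoted above

# Illustrative assumptions (hypothetical drive characteristics).
WRITE_SIZE_BYTES = 4096             # each cached write is one 4 KiB block
RANDOM_ACCESS_S = 0.012             # seek plus rotational latency per write
SEQUENTIAL_BYTES_PER_S = 80e6       # sustained sequential write rate


def fully_random_flush_seconds():
    """Pessimistic bound: every cached block needs its own seek."""
    return (CACHE_BYTES // WRITE_SIZE_BYTES) * RANDOM_ACCESS_S


def fully_sorted_flush_seconds():
    """Optimistic bound: the cache drains as one sequential stream."""
    return CACHE_BYTES / SEQUENTIAL_BYTES_PER_S


worst = fully_random_flush_seconds()
best = fully_sorted_flush_seconds()
print(f"unsorted: {worst:.1f} s, sorted: {best:.2f} s, timeout: {TIMEOUT_S} s")
```

Under these assumptions the unsorted bound is on the order of 100 seconds and the sorted bound is well under one second. That is consistent with both views: a timeout is conceivable only if the drive flushes in an essentially random order, which a drive that sorts its cached writes by position should not do.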


I agree - vendors put a lot of pressure on drive manufacturers to finish up (even during error recovery) in much less than 30 seconds. The push was always for something closer to 15 seconds iirc.

The drive might take a longer time like this when doing error handling
(sector remapping, etc), but then I would expect to see your remapped
sector count grow.
Yes, this is a possibility, and according to the spec, libata EH should
be retrying flushes a few times before giving up, but I'm not sure
whether retrying for several minutes is a good idea either.
Is it?

Thanks.


I don't think that retrying for minutes is a good idea. I wonder if this could be caused by power issues or cable issues to the drive?

Ric
