Re: MD/RAID time out writing superblock

From: Chris Webb
Date: Wed Sep 09 2009 - 08:02:28 EST


Chris Webb <chris@xxxxxxxxxxxx> writes:

> I've also noticed that during this recovery, I'm seeing lots of timeouts but
> they don't seem to interrupt the resync:
>
> 05:47:39 ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> 05:47:39 ata5.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
> 05:47:39 res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
> 05:47:39 ata5.00: status: { DRDY }
> 05:47:39 ata5: hard resetting link
> 05:47:49 ata5: softreset failed (device not ready)
> 05:47:49 ata5: hard resetting link
> 05:47:49 ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> 05:47:49 ata5.00: configured for UDMA/133
> 05:47:49 ata5: EH complete
>
> 08:17:39 ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> 08:17:39 ata5.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
> 08:17:39 res 40/00:00:35:83:f8/00:00:4d:00:00/40 Emask 0x4 (timeout)
> 08:17:39 ata5.00: status: { DRDY }
> 08:17:39 ata5: hard resetting link
> 08:17:49 ata5: softreset failed (device not ready)
> 08:17:49 ata5: hard resetting link
> 08:17:49 ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> 08:17:49 ata5.00: configured for UDMA/133
> 08:17:49 ata5: EH complete
>
> 10:22:39 ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> 10:22:39 ata5.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in
> 10:22:39 res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
> 10:22:39 ata5.00: status: { DRDY }
> 10:22:39 ata5: hard resetting link
> 10:22:49 ata5: softreset failed (device not ready)
> 10:22:49 ata5: hard resetting link
> 10:22:50 ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> 10:22:51 ata5.00: configured for UDMA/133
> 10:22:51 ata5: EH complete

... the difference being that a timeout which causes a super_written failure
seems to return an I/O error whereas the others don't:

ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata5.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
res 40/00:00:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
ata5.00: status: { DRDY }
ata5: hard resetting link
ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
ata5.00: configured for UDMA/133
ata5: EH complete
end_request: I/O error, dev sde, sector 1465147272
md: super_written gets error=-5, uptodate=0
raid10: Disk failure on sde3, disabling device.

I wonder what's different about these two timeouts such that one causes an I/O
error and the other just causes a retry after reset? Presumably if the timeout
that currently produces the I/O error were also just retried after the reset,
everything would be (closer to being) fine.
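
One visible difference is the command that timed out: the opcode is the first
byte after "cmd" in each trace. As a rough way of checking this across a longer
log, here is a minimal Python sketch, assuming the libata line format shown
above. The function name and lookup table are mine, purely for illustration,
and the opcode names are taken from the ATA command set rather than from
anything in the logs.

```python
import re

# Opcode names from the ATA command set; only the two opcodes that
# appear in the traces above are listed (illustrative, not exhaustive).
ATA_OPCODES = {
    0xEC: "IDENTIFY DEVICE",
    0xEA: "FLUSH CACHE EXT",
}

# Matches libata lines such as "ata5.00: cmd ec/00:01:00:...", with or
# without a leading quote marker and timestamp.
CMD_RE = re.compile(r"(ata\d+\.\d+): cmd ([0-9a-f]{2})/")


def timed_out_commands(log_text):
    """Return a (device, opcode, name) tuple for every 'cmd' line found."""
    results = []
    for line in log_text.splitlines():
        match = CMD_RE.search(line)
        if match:
            opcode = int(match.group(2), 16)
            name = ATA_OPCODES.get(opcode, "unknown")
            results.append((match.group(1), opcode, name))
    return results


if __name__ == "__main__":
    sample = (
        "> 05:47:39 ata5.00: cmd ec/00:01:00:00:00/00:00:00:00:00/00 tag 0 pio 512 in\n"
        "ata5.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0\n"
    )
    for device, opcode, name in timed_out_commands(sample):
        print(f"{device}: 0x{opcode:02x} {name}")
```

On the traces above, the three timeouts that did not interrupt the resync are
all 0xec (IDENTIFY DEVICE) and the one followed by the I/O error is 0xea
(FLUSH CACHE EXT). So the two cases are at least timeouts of different
commands, which may be part of why only one of them surfaces as an end_request
error.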

Cheers,

Chris.