Bug during media scan, kernel 2.4.23-pre9

From: Stephan von Krawczynski
Date: Mon Nov 03 2003 - 12:47:37 EST


I just encountered a real bad problem with using media scan on 3ware
controllers. I have 3 hds connected and configured a RAID5. I use media scan
regularly (daily basis). Since two days I see this problem:

Nov 3 18:12:11 box 3w-xxxx[2039]: INFORMATION: Verify started on unit 0 on
controller ID:2. (0x29)
Nov 3 18:19:41 box kernel: 3w-xxxx: scsi2: Unit #0: Command (f6e5d800) timed
out, resetting card.

After that the box has problems, the controller obviously hangs.
This in itself can be considered a bug, but what is really annoying is that one
has no chance finding out _which_ port caused the problem.
So at this point you can play roulette and replace one of the hds hoping that
it was indeed the bad one.

It would be really a lot better to degrade the unit in this case and give a
hint which port has problems (command timed out on port ...).
This would:
a) not hang the box
b) give you a chance to replace the hd, as you would expect in RAID5

The current situation is absolutely _no good_.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/