Re: MD/RAID: what's wrong with sector 1953519935?

From: Ric Wheeler
Date: Tue Aug 25 2009 - 22:42:16 EST


On 08/25/2009 10:22 PM, Andrei Tanas wrote:
One thing that can happen when we have a hot spot (like the
superblock) on high-capacity drives is that the frequent writes
degrade the data in adjacent tracks. Some drives have firmware that
watches for this and rewrites adjacent tracks, but it is also a good
idea to avoid too-frequent updates.

Yet another detail to worry about.... :-(

it never ends :-)



Didn't you have a tunable to decrease this update frequency?

/sys/block/mdX/md/safe_mode_delay
is a time in seconds (Default 0.200) between when the last write to
the array completes and when the superblock is marked as clean.
Depending on the actual rate of writes to the array, the superblock
can be updated as much as twice in this time (once to mark dirty,
once to mark clean).

Increasing the number can decrease the update frequency of the
superblock, but the exact effect on update frequency is very
load-dependent.
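
As a rough illustration of the tunable described above, here is a minimal
Python sketch that reads and sets safe_mode_delay through sysfs. The helper
names and the sysfs_root parameter are hypothetical (the latter only exists
so the code can be tried against a scratch directory), and the "2 / delay"
figure is just a worst-case reading of "as much as twice in this time", not
a measured rate.

```python
from pathlib import Path


def safe_mode_delay_path(md_name, sysfs_root="/sys"):
    """Path of the tunable, e.g. /sys/block/mdX/md/safe_mode_delay."""
    return Path(sysfs_root) / "block" / md_name / "md" / "safe_mode_delay"


def read_safe_mode_delay(md_name, sysfs_root="/sys"):
    """Return the current delay in seconds (the default reads as 0.200)."""
    text = safe_mode_delay_path(md_name, sysfs_root).read_text()
    return float(text.strip())


def write_safe_mode_delay(md_name, seconds, sysfs_root="/sys"):
    """Set the delay in seconds; on a real system this needs root."""
    if seconds < 0:
        raise ValueError("delay must be non-negative")
    # Assumption: a decimal value in seconds with up to three
    # fractional digits is accepted, matching the "0.200" default.
    safe_mode_delay_path(md_name, sysfs_root).write_text(f"{seconds:.3f}\n")


def max_superblock_updates_per_second(delay_seconds):
    """Worst case implied above: one 'dirty' plus one 'clean' update
    per delay window. Real rates depend heavily on the write load."""
    if delay_seconds <= 0:
        raise ValueError("delay must be positive")
    return 2.0 / delay_seconds
```

With the default of 0.200 the worst case works out to about 10 superblock
updates per second; at 2 seconds it drops to about 1 per second.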

Obviously a write-intent bitmap, which is rarely more than a few
sectors, can also see lots of updates, and it is harder to tune
that (you have to set things up when you create the bitmap).

NeilBrown


We did see issues in practice with adjacent sectors with some drives,
so this one is worth tuning down.

I would suggest that Andrei might try to write and clear the IO error
at that offset. You can use Mark Lord's hdparm to clear a specific
sector or just do the math (carefully!) and dd over it. If the write
succeeds (without bumping your remapped sectors count) this is a
likely match to this problem.
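
For the "do the math" step, a small Python sketch may help. It only computes
the offset and builds a dd argument list; it does not run anything. The
512-byte sector size, the helper names, and the partition_start_lba
parameter are assumptions: whether the sector number in the error message is
relative to the whole disk or to a partition has to be checked first, and
the write destroys whatever data is in that sector.

```python
SECTOR_SIZE = 512  # assumption: 512-byte logical sectors


def absolute_lba(reported_sector, partition_start_lba=0):
    """Convert a reported sector to a whole-disk LBA.

    partition_start_lba is hypothetical: pass the partition's start
    sector if the reported number is partition-relative, else 0.
    """
    return partition_start_lba + reported_sector


def sector_byte_offset(lba, sector_size=SECTOR_SIZE):
    """Byte offset of the first byte of the given sector."""
    return lba * sector_size


def dd_overwrite_command(device, lba, sector_size=SECTOR_SIZE):
    """Build (but do not run) a dd command that zeroes one sector.

    WARNING: running it overwrites the data stored in that sector.
    """
    return [
        "dd",
        "if=/dev/zero",
        f"of={device}",
        f"bs={sector_size}",
        f"seek={lba}",   # seek counts in units of bs on the output
        "count=1",
        "oflag=direct",  # bypass the page cache so the drive sees the write
    ]
```

For the sector in the subject line, sector_byte_offset(1953519935) gives
1000202206720 bytes, assuming the number is a whole-disk LBA. The device
name passed to dd_overwrite_command is a placeholder to fill in by hand.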

I've tried dd multiple times, it always succeeds, and the relocated sector
count is currently 1 on this drive, even though this particular fault
happened at least 3 times so far.


I would bump that count way up (say to 2) and see if you have an issue...

ric
