One thing that can happen when we have a hot spot (like the superblock)
on high-capacity drives is that the frequent writes degrade the data in
adjacent tracks. Some drives have firmware that watches for this and
rewrites adjacent tracks, but it is also a good idea to avoid too
frequent updates.
Yet another detail to worry about.... :-(
it never ends :-)
Didn't you have a tunable to decrease this update frequency?
/sys/block/mdX/md/safe_mode_delay
is a time in seconds (Default 0.200) between when the last write to
the array completes and when the superblock is marked as clean.
Depending on the actual rate of writes to the array, the superblock
can be updated as much as twice in this time (once to mark dirty,
once to mark clean).
Increasing the number can decrease the update frequency of the superblock,
but the exact effect on update frequency is very load-dependent.
Obviously a write-intent-bitmap, which is rarely more than a few
sectors, can also see lots of updates, and it is harder to tune
that (you have to set things up when you create the bitmap).
NeilBrown
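For anyone who wants to script the safe_mode_delay tunable mentioned above, here is a minimal Python sketch. Only the sysfs path and the "seconds, default 0.200" format come from the message; the function names and the sysfs_root parameter are made up for illustration, and writing to the real file needs root and an assembled md array.

```python
from pathlib import Path


def safe_mode_delay_path(md_name, sysfs_root="/sys"):
    """Return the sysfs path of safe_mode_delay for an md array such as 'md0'.

    sysfs_root is a hypothetical parameter so the helpers can be pointed
    at a scratch directory instead of the real /sys.
    """
    return Path(sysfs_root) / "block" / md_name / "md" / "safe_mode_delay"


def read_safe_mode_delay(md_name, sysfs_root="/sys"):
    """Read the current delay, in seconds, as a float."""
    text = safe_mode_delay_path(md_name, sysfs_root).read_text()
    return float(text.strip())


def write_safe_mode_delay(md_name, seconds, sysfs_root="/sys"):
    """Write a new delay in seconds, using the same three-decimal form
    as the default value (0.200)."""
    if seconds < 0:
        raise ValueError("safe_mode_delay must not be negative")
    path = safe_mode_delay_path(md_name, sysfs_root)
    path.write_text(f"{seconds:.3f}\n")
```

Something like write_safe_mode_delay("md0", 2.0) would raise the delay from 0.2 s to 2 s. As noted above, how much that actually reduces superblock updates depends on the write load.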
We did see issues in practice with adjacent sectors with some drives,
so this one is worth tuning down.
I would suggest that Andrei might try to write and clear the IO error
at that offset. You can use Mark Lord's hdparm to clear a specific
sector or just do the math (carefully!) and dd over it. If the write
succeeds (without bumping your remapped sectors count) this is a likely
match to this problem.
I've tried dd multiple times, it always succeeds, and the relocated sector
count is currently 1 on this drive, even though this particular fault
happened at least 3 times so far.
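The "do the math (carefully!) and dd over it" step suggested earlier could be sketched roughly as follows. This is only an illustration under stated assumptions: the sector number is counted from the start of the device being written, it is in units of sector_size (512 bytes here), and /dev/sdX is a placeholder. The helper names are hypothetical. The resulting command overwrites that sector with zeros, so whatever data was there is lost.

```python
def sector_byte_offset(sector, sector_size=512):
    """Byte offset of a sector from the start of the device."""
    if sector < 0:
        raise ValueError("sector must not be negative")
    if sector_size <= 0:
        raise ValueError("sector_size must be positive")
    return sector * sector_size


def dd_overwrite_command(device, sector, sector_size=512):
    """Build a dd argument list that writes zeros over exactly one sector.

    Assumes 'sector' is relative to the start of 'device' and expressed
    in units of 'sector_size'. oflag=direct asks dd to bypass the page
    cache so the write actually reaches the drive.
    """
    sector_byte_offset(sector, sector_size)  # reuse the argument checks
    return [
        "dd",
        "if=/dev/zero",
        f"of={device}",
        f"bs={sector_size}",
        "count=1",
        f"seek={sector}",
        "oflag=direct",
    ]


def format_command(argv):
    """Join the argument list into a line that can be reviewed before running."""
    return " ".join(argv)
```

Printing format_command(dd_overwrite_command("/dev/sdX", 12345)) gives a line to double-check by hand before running anything as root. Comparing the drive's remapped sector count before and after the write is still a manual step.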