Re: Data corruption on software RAID

From: Mikulas Patocka
Date: Thu Apr 10 2008 - 22:55:45 EST


> Currently you can go for hours without ever reaching a clean state on active
> files. By not deliberately allowing the buffer to change during a write the
> chances for getting consistent data on the disk should be significantly
> improved.

It can already happen that one device writes the sector and the other does
not if the power is interrupted. All RAID implementations already deal with
this by resynchronizing the modified areas after a crash, so they could
resynchronize modify-while-write cases as well, with the same code.
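
To make the argument concrete, here is a toy model in Python. It is only a
sketch under stated assumptions, not md or dm code: the Mirror class,
REGION_SIZE, and the "between" and "power_fails" parameters are all
hypothetical names made up for the illustration. A region is marked dirty
before any device is written, and resync copies dirty regions from a chosen
master. Clearing the bit after a clean completion is left out to keep it
short.

```python
REGION_SIZE = 4  # sectors per region; hypothetical value for this sketch


class Mirror:
    """Toy two-way mirror with a dirty-region set (not real md/dm code)."""

    def __init__(self, sectors):
        self.devices = [[0] * sectors for _ in range(2)]
        self.dirty = set()  # regions whose two copies may differ

    def write(self, sector, buffer, between=None, power_fails=False):
        """Write buffer[0] to both devices, marking the region dirty first.

        `between` is an optional callback run after the first device has
        been written; `power_fails` stops before the second device write.
        """
        self.dirty.add(sector // REGION_SIZE)
        self.devices[0][sector] = buffer[0]
        if power_fails:
            return
        if between is not None:
            between()
        # The buffer is read again here, so a concurrent change is visible.
        self.devices[1][sector] = buffer[0]

    def resync(self, master=0):
        """Copy every dirty region from the master to the other device."""
        other = 1 - master
        for region in sorted(self.dirty):
            start = region * REGION_SIZE
            end = start + REGION_SIZE
            self.devices[other][start:end] = self.devices[master][start:end]
        self.dirty.clear()


mirror = Mirror(sectors=8)

# Case 1: power is lost after only the first device was written.
mirror.write(1, [11], power_fails=True)

# Case 2: the buffer is modified while the write is in flight.
buf = [55]


def modify_buffer():
    buf[0] = 66


mirror.write(5, buf, between=modify_buffer)

copies_differ_before = mirror.devices[0] != mirror.devices[1]
mirror.resync(master=0)
copies_match_after = mirror.devices[0] == mirror.devices[1]
```

Both cases leave the two copies different with the region bit still set, and
the same resync() call repairs both. That is the whole point: the resync code
does not need to know why the copies diverged.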

The alternative would be to lock pages that are under a write, but I don't
know whether the MM maintainers want to add that. Personally, I wouldn't do
it.

> > From my point of view that trick with thread doing sync() and turning off
> > region bits looks best. I'd like to know if that solution doesn't have any
> > other flaw.
> >
> >
> > > For reliable operation I would want all copies (and/or CRCs) to be
> > > written on an fsync, by the time I bother to fsync I really, really,
> > > want the data on the disk.
> > >
> >
> > fsync already works this way.
> >
>
> The point I was making is that after you change the code I would still want
> that to happen. And your comment above seems to indicate a goal of getting
> consistent data after a crash, with less concern that it be the most recent
> data written. Sorry in advance if that's a misreading of "you just must not
> forget to resync them after a crash."

There would be no problem with fsync. Fsync writes the synced data to both
devices, so after a crash you can select either device as the resync master
copy and you still get the data that you wrote before sync() or fsync().
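
A small Python sketch of that claim, again as a toy model under assumptions
rather than real RAID code (resync, crash_scenario, and the two-sector layout
are hypothetical names and values for illustration). Sector 0 is written and
fsynced, so it is on both devices. Sector 1 is written later and the power
fails after only device 0 has it.

```python
def resync(devices, master, dirty_sectors):
    """Copy every dirty sector from the chosen master to all devices."""
    for sector in dirty_sectors:
        for device in devices:
            device[sector] = devices[master][sector]


def crash_scenario(master):
    """Return both device images after a crash and a resync from `master`."""
    devices = [["old", "old"], ["old", "old"]]
    dirty = set()

    # Write + fsync of sector 0: the data reaches both devices before
    # fsync returns, so the sector is not left dirty.
    for device in devices:
        device[0] = "synced"

    # Later write of sector 1 without fsync: power fails after device 0
    # has the new data, so the sector is still marked dirty.
    dirty.add(1)
    devices[0][1] = "unsynced"

    resync(devices, master, dirty)
    return devices


from_first = crash_scenario(master=0)
from_second = crash_scenario(master=1)
```

With either master the two copies end up identical and sector 0 holds the
fsynced data. Only the unsynced sector 1 depends on the choice: it is either
the new or the old contents, and nothing was promised about it.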

Mikulas