Re: True fsync() in Linux (on IDE)

From: Hans Reiser
Date: Fri Mar 19 2004 - 14:38:05 EST


Chris Mason wrote:

On Fri, 2004-03-19 at 03:05, Hans Reiser wrote:


Chris Mason wrote:



On Thu, 2004-03-18 at 16:09, Peter Zaitsev wrote:




On Thu, 2004-03-18 at 13:02, Chris Mason wrote:





In the former case cache is surely not flushed.





Hmmm, is it reiser? For both 2.4 reiserfs and ext3, the flush happens
when you commit. ext3 always commits on fsync and reiser only commits
when you've changed metadata.




Oh, yes. This is Reiser; I did not think it was an FS issue.
I'll know to stay away from ReiserFS now.




For reiserfs data=ordered should be enough to trigger the needed
commits. If not, data=journal. Note that neither fs does barriers for
O_SYNC, so we're just not perfect in 2.4.

-chris



You are not listening to Peter. As I understand it from what Peter says and your words, your implementation is wrong and makes fsync meaningless. If so, then you need to fix it. fsync should not be meaningless even for metadata-only journaling. This is a serious bug that needs immediate correction, if Peter and I understand your words correctly.



I am listening to Peter. Jens and I have spent a significant amount of
time on this code.

But you need to get it right.

We can go back and spend many more hours testing and
debugging the 2.4 changes, or we can go forward with a very nice
solution in 2.6.

I'm planning on going forward with 2.6.


This is a very important patch that you have created, but you haven't articulated what happens in the following scenario (Peter, I am making this up without knowing your internals; please feel encouraged to help me on this).

MySQL fsync()s a file, which it thinks guarantees that all of a MySQL transaction has reached disk. The disk's write cache holds the data. You let fsync return. The data is not on disk. MySQL performs its commit and writes a commit record, which reaches disk, but not all of the transaction is on disk. The system crashes. MySQL replays the log. MySQL has internal corruption. The user calls Peter. Peter asks, what do you expect when you use a piece of shit like reiserfs? The user doesn't care about our internal squabbling and goes back to using Windows, which does proper commits.

Or, a random application calls fsync(), expects that this means the data has reached disk, and tells the user to perform real-world actions that depend on the data being on disk, but it is not.
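
The ordering that the scenario above depends on can be sketched roughly as follows. This is only an illustration, not MySQL's or reiserfs's actual code: the function names (durable_append, commit_transaction, recover) and the two-file layout are made up for the example. It shows what the application assumes, namely that fsync() returning means the data is on stable storage before the commit record is written. If the drive's write cache is not flushed underneath, that assumption fails no matter how careful the application is.

```python
import os
import tempfile


def durable_append(path, payload):
    """Append payload to path and return only after fsync() has returned.

    Hypothetical helper: the caller assumes that a successful return means
    the bytes are on stable storage, which is the guarantee under dispute.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        offset = 0
        # os.write() may write fewer bytes than requested, so loop.
        while offset < len(payload):
            offset += os.write(fd, payload[offset:])
        os.fsync(fd)  # ask the kernel to push the data to the device
    finally:
        os.close(fd)


def commit_transaction(data_path, log_path, txn_id, body):
    """Write the transaction body first, then the commit record."""
    # Step 1: the transaction data must be durable before step 2 starts.
    durable_append(data_path, body)
    # Step 2: the commit record tells recovery that step 1 completed.
    durable_append(log_path, b"COMMIT %d\n" % txn_id)


def recover(log_path):
    """Return the transaction ids that have a complete commit record."""
    committed = []
    with open(log_path, "rb") as log:
        for line in log:
            if line.startswith(b"COMMIT ") and line.endswith(b"\n"):
                committed.append(int(line.split()[1]))
    return committed


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    data_file = os.path.join(workdir, "data")
    log_file = os.path.join(workdir, "log")
    commit_transaction(data_file, log_file, 1, b"row-1\n")
    print(recover(log_file))
```

Recovery trusts every commit record it finds. If the first fsync() returned while the body was still sitting in the drive's write cache, and the commit record reached the platter before a crash, recover() would report a transaction whose data is missing.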

I hope I am totally off-base and not understanding you.... Please help me here.

-chris


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/






--
Hans
