Re: How to know when file data has been flushed into disk?
From: Xin Zhao
Date: Fri Apr 07 2006 - 21:40:16 EST
This answered all my questions! Many thanks! Will check the phase 2 code.
Xin
On 4/7/06, Zach Brown <zab@xxxxxxxxx> wrote:
>
> > If a program access data like this:
> >
> > 1. open the file
> > 2. write a lot of data into this file
>
> You don't say if this is an extending write or overwriting existing file
> data. I'm going to assume extending writes so that data=ordered kicks in.
>
> > 3. close the file
>
> > So my questions are:
> > 1. How will the file system be notified after all data has been
> > flushed into disk?
>
> Look at phase 2 in journal_commit_transaction(). The kjournald thread
> issues the writeback of the file data by walking t_sync_datalist and
> then waits for the writeback to complete by using wait_on_buffer()
> before committing the transaction.
>
> > 2. Unlike data=journal mode, in data=order mode, the data could be
> > lost if system crashes when data is being flushed to disk. When system
> > reboots, does journal contains the old meta data for undo?
>
> No, ext3 isn't roll-backward. It doesn't store the *old* data in the
> journal and undo the change if it fails halfway through. It's
> roll-forward. It stores the *new* data in the journal and replays
> complete transactions in the journal that weren't moved out to their
> final place on disk at the time of the crash.
>
> So if the machine reboots during the writeback phase then the
> transaction won't be committed yet and recovery won't replay that
> transaction from the journal. From the metadata's point of view the
> file extension will never have happened.
>
> > 3. Does sys_close() have to be blocked until all data and metadata
> > are committed?
>
> No, and neither does sys_getpid() :)
>
> > to take subsequent operation. However, data flush could be failed. In
> > this case, file system seems to mislead the application. Is this true?
>
> No. The application has no grounds for assuming that a successful
> close() has synced previous operations to disk. It's simply not part of
> the API.
>
> > If so, any solutions?
>
> The application should rely on tools like fsync(), fdatasync(), O_SYNC,
> mount -o sync, etc. Whatever suits it best.
>
> - z
>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/