fsync/fdatasync with Ext3 data=journal

From: Seiji Kihara
Date: Wed Aug 11 2004 - 00:28:23 EST


Hello,

We found that fsync and fdatasync syscalls sometimes don't sync
data in an ext3 file system under the following conditions.

1. Kernel version is 2.6.6 or later (including 2.6.8-rc4 and -mm1).
2. Ext3's journalling mode is "data=journal".
3. Create a file (whose size is 1Mbytes) and execute umount/mount.
4. lseek to a random position within the file, write 8192 bytes
data, and fsync or fdatasync.

We presume the data was not written to the corresponding disk
before returning from fsync or fdatasync syscall on the evidence
as follows:

1. The response time of fsync() and fdatasync() was extremely
short (under 10 microseconds).
2. The IDE writing routine ide_start_dma() was not called under
DMA enabled.

The problem was solved by modifying ext3_sync_file() as follows.
Maybe, i_state is not correctly set to I_DIRTY when the related
page cache is dirty. Is it true?

diff -Nru a/fs/ext3/fsync.c b/fs/ext3/fsync.c
--- a/fs/ext3/fsync.c 2004-05-10 11:33:20.000000000 +0900
+++ b/fs/ext3/fsync.c 2004-08-09 14:29:00.000000000 +0900
@@ -49,9 +49,6 @@

J_ASSERT(ext3_journal_current_handle() == 0);

- smp_mb(); /* prepare for lockless i_state read */
- if (!(inode->i_state & I_DIRTY))
- goto out;

/*
* data=writeback:

Hardware environment:
CPU: Intel Pentium 4 3GHz Hyper Threading, 2nd cache 512KB
RAM: 1GBytes
IDE controller: on-board ICH5 (ATA100)
HDD: E-IDE (7200rpm, 8MB cache, ATA133) x 2
(one for system and the other for writing test data)

Regards,

Seiji

--
Seiji Kihara
Open Source Software Computing Project,
NTT Cyber Space Laboratories
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/