Re: [PATCH] issue storage device flush via sync_blockdev() (was Re: Linux 2.6.29)

From: Jeff Garzik
Date: Wed Mar 25 2009 - 19:09:02 EST


Eric Sandeen wrote:
Jeff Garzik wrote:
On Wed, Mar 25, 2009 at 01:40:37PM -0700, Linus Torvalds wrote:
On Wed, 25 Mar 2009, Jeff Garzik wrote:
It is clearly possible to implement an fsync(2) that causes FLUSH CACHE to be
issued, without adding full barrier support to a filesystem. It is likely
doable to avoid touching per-filesystem code at all, if we issue the flush
from a generic fsync(2) code path in the kernel.
We could easily do that. It would even work for most cases. The problematic ones are where filesystems do their own disk management, but I guess those people can do their own fsync() management too.

Somebody send me the patch, we can try it out.
This is a simple step that would cover a lot of cases... sync(2)
calls sync_blockdev(), and many filesystems do as well via the generic
filesystem helper file_fsync (fs/sync.c).

XFS code calls sync_blockdev() a "big hammer", so I hope my patch
follows known practice.

Looking over every use of sync_blockdev(), its most frequent use is
through fsync(2), for the selected filesystems that use the generic
file_fsync helper.
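
For reference, the generic helper looks roughly like the following in this
era (paraphrased, not a verbatim copy); the point is that it bottoms out in
sync_blockdev(), so every filesystem using it picks up the flush in the
patch below for free:

int file_fsync(struct file *filp, struct dentry *dentry, int datasync)
{
	struct inode *inode = dentry->d_inode;
	struct super_block *sb = inode->i_sb;
	int ret, err;

	/* sync the inode (and its dirty pages) to buffers */
	ret = write_inode_now(inode, 0);

	/* sync the superblock to buffers, if dirty */
	lock_super(sb);
	if (sb->s_dirt && sb->s_op->write_super)
		sb->s_op->write_super(sb);
	unlock_super(sb);

	/* ... and finally push the backing bdev's buffers to disk */
	err = sync_blockdev(sb->s_bdev);
	if (!ret)
		ret = err;
	return ret;
}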

Most callers of sync_blockdev() in the kernel do so infrequently,
when removing and invalidating volumes (MD) or storing the superblock
prior to release (put_super) in some filesystems.

Compile-tested only, of course :) But it should work :)

My main concern is some hidden area that calls sync_blockdev() with
a high-enough frequency that the performance hit is bad.

Signed-off-by: Jeff Garzik <jgarzik@xxxxxxxxxx>

diff --git a/fs/buffer.c b/fs/buffer.c
index 891e1c7..7b9f74a 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -173,9 +173,14 @@ int sync_blockdev(struct block_device *bdev)
 {
 	int ret = 0;
 
-	if (bdev)
-		ret = filemap_write_and_wait(bdev->bd_inode->i_mapping);
-	return ret;
+	if (!bdev)
+		return 0;
+
+	ret = filemap_write_and_wait(bdev->bd_inode->i_mapping);
+	if (ret)
+		return ret;
+
+	return blkdev_issue_flush(bdev, NULL);
 }
 EXPORT_SYMBOL(sync_blockdev);
 
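
One possible refinement, and this is my speculation rather than part of the
patch above: if I read the blk-barrier code of this era right,
blkdev_issue_flush() fails with -EOPNOTSUPP when the underlying queue has no
cache-flush support, and callers of sync_blockdev() arguably should not see
that surface as an I/O error. The tail of the function could tolerate it
with something like:

	ret = blkdev_issue_flush(bdev, NULL);
	if (ret == -EOPNOTSUPP)
		ret = 0;	/* device has no cache flush; nothing more we can do */
	return ret;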

What about when you're running over a big raid device with
battery-backed cache, and you trust the cache as much as the disks?
Wouldn't this unconditional cache flush be painful there on any of the
callers, even if they're rare? (fs unmounts, freezes, etc.? Or a fat
filesystem on that device doing an fsync?)

What exactly do you think sync_blockdev() does? :)

It is used right before a volume goes away. If that's not a time to flush the cache, I dunno what is.

The _whole purpose_ of sync_blockdev() is to push out the data to permanent storage. Look at the users -- unmount volume, journal close, etc. Things that are OK to occur after those points include: power off, device unplug, etc.

A secondary purpose of sync_blockdev() is as a hack, for simple/ancient bdev-based filesystems that do not wish to bother with barriers and all the associated complexity of tracking which writes do and do not need flushing.
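
To illustrate that second use with a made-up example (examplefs is
hypothetical, not code from the tree): such a filesystem can simply lean on
sync_blockdev() at teardown instead of implementing barriers itself.

/* Hypothetical simple bdev-backed filesystem: no barriers anywhere,
 * just a full flush of the backing device when the superblock goes
 * away.  With the patch above this now also empties the drive's
 * volatile write cache. */
static void examplefs_put_super(struct super_block *sb)
{
	sync_blockdev(sb->s_bdev);
	/* ... free examplefs-private state here ... */
}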


xfs, reiserfs, ext4 all avoid the blkdev flush on fsync if barriers are
not enabled, I think for that reason...

Enabling barriers causes slowdowns far greater than those of simply having fsync(2) trigger FLUSH CACHE, because barriers imply FLUSH CACHE issuance for every in-kernel journalled/atomic filesystem transaction, in addition to whatever syscalls userspace is issuing.

The number of FLUSH CACHE commands issued with barriers is orders of magnitude larger than the number of fsync/fdatasync calls.
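
Rough numbers to illustrate the scale (invented for illustration, not
measurements): a journalling filesystem with barriers enabled flushes on
every commit, typically twice per commit on drives without FUA, so a box
committing every couple of seconds sees thousands of flushes per hour even
if no application ever calls fsync(). Having fsync(2) flush, as above, only
adds a flush where userspace explicitly asked for durability.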

Jeff


