Re: [EXPERIMENT] Add new "flush" option
From: Andrew Morton
Date: Sun Dec 25 2005 - 07:21:57 EST
OGAWA Hirofumi <hirofumi@xxxxxxxxxxxxxxxxxx> wrote:
>
> This adds new "flush" option for hotplug devices.
>
> Current implementation of "flush" option does,
>
> - synchronizing data pages at ->release() (last close(2))
> - if user's work seems to be done (fs is not active), all
> metadata syncs by pdflush()
>
> This option would provide kind of sane progress, and dirty buffers is
> flushed more frequently (if fs is not active). This option doesn't
> provide any robustness (robustness is provided by other options), but
> probably the option is proper for hotplug devices.
>
> (Please don't assume that dirty buffers is synchronized at any point.
> This implementation will be changed easily.)
>
> ...
>
> +
> +#define FLUSH_INITAL_DELAY HZ
> +#define FLUSH_DELAY (HZ / 2)
> +
> +int fs_flush_sync_fdata(struct inode *inode, struct file *filp)
> +{
> + int err = 0;
> +
> + if (IS_FLUSH(inode) && filp->f_mode & FMODE_WRITE) {
> + current->flags |= PF_SYNCWRITE;
> + err = filemap_write_and_wait(inode->i_mapping);
> + current->flags &= ~PF_SYNCWRITE;
> + }
> + return err;
> +}
You can use filp->f_mapping here, remove the inode* argument.
> +EXPORT_SYMBOL(fs_flush_sync_fdata);
> +
> +static void fs_flush_pdflush_handler(unsigned long arg)
> +{
> + struct super_block *sb = (struct super_block *)arg;
> + fsync_super(sb);
> + up_read(&sb->s_umount);
> +}
> +
> +static void fs_flush_timer(unsigned long data)
> +{
> + struct super_block *sb = (struct super_block *)data;
> + struct backing_dev_info *bdi = blk_get_backing_dev_info(sb->s_bdev);
> + unsigned long last_flush_jiff;
> + int err;
> +
> + if (bdi_write_congested(bdi)) {
> + mod_timer(&sb->flush_timer, jiffies + (HZ / 10));
> + return;
> + }
The bdi_write_congested() test probably isn't doing anything useful: it
only returns true if there's really heavy writeout in progress. Possibly
you could look at the disk queue accounting stats, work out how much I/O
has been happening lately.
> + last_flush_jiff = sb->last_flush_jiff;
> +
> + if (!time_after_eq(jiffies, last_flush_jiff + FLUSH_DELAY)) {
> + mod_timer(&sb->flush_timer, last_flush_jiff + FLUSH_DELAY);
> + return;
> + }
> +
> + if (down_read_trylock(&sb->s_umount)) {
> + if (sb->s_root) {
> + err = pdflush_operation(fs_flush_pdflush_handler, data);
> + if (!err)
> + return;
> + mod_timer(&sb->flush_timer, jiffies + FLUSH_DELAY);
> + }
> + up_read(&sb->s_umount);
> + }
> +}
>
> +void __fs_mark_flush(struct super_block *sb)
> +{
> + sb->last_flush_jiff = jiffies;
> + /*
> + * make sure by smb_wmb() that dirty buffers before here is
> + * processed at the timer routine.
> + */
> + smp_wmb();
> +
> + if (!timer_pending(&sb->flush_timer))
> + mod_timer(&sb->flush_timer, jiffies + FLUSH_INITAL_DELAY);
> +}
> +EXPORT_SYMBOL(__fs_mark_flush);
> +
> +/*
> + * caller must take down_write(sb->s_umount), otherwise pdflush
> + * handler may be run after this del_timer_sync.
> + */
> +void fs_flush_stop(struct super_block *sb)
> +{
> + sb->s_flags &= ~MS_FLUSH;
> + del_timer_sync(&sb->flush_timer);
> +}
> +
> +void fs_flush_init(struct super_block *sb)
> +{
> + init_timer(&sb->flush_timer);
> + sb->flush_timer.data = (unsigned long)sb;
> + sb->flush_timer.function = fs_flush_timer;
> + sb->last_flush_jiff = 0;
> +}
The superblock lifetime handling all looks OK to me.
However I wonder if all this code would become simpler if we were to just
tweak writeback_inodes() a bit: if the superblock was mounted with MS_FLUSH
then temporarily set wbc->sync_mode to WB_SYNC_ALL.
If we do that, the regular wb_kupdate() will sync all the IS_FLUSH
filesystems for us at dirty_writeback_centisecs intervals. That's a bit
less flexible, but probably less code.
(Alternatively, add a new writeback_control.ms_flush_only boolean. Set
that, then reuse some of the code in mm/page-writeback.c in some manner).
You abandoned the flush-file-in-release() idea? That seemed faily neat.
What was the thinking here?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/