Re: [PATCH 4/4] hfsplus: get rid of write_super

From: Andrew Morton
Date: Thu Jun 21 2012 - 15:41:55 EST


On Wed, 13 Jun 2012 14:37:51 +0300
Artem Bityutskiy <dedekind1@xxxxxxxxx> wrote:

> From: Artem Bityutskiy <artem.bityutskiy@xxxxxxxxxxxxxxx>
>
> This patch makes hfsplus stop using the VFS '->write_super()' method along with
> the 's_dirt' superblock flag, because they are on their way out.
>
> The whole "superblock write-out" VFS infrastructure is served by the
> 'sync_supers()' kernel thread, which wakes up every 5 (by default) seconds and
> writes out all dirty superblocks using the '->write_super()' call-back. But the
> problem with this thread is that it wastes power by waking up the system every
> 5 seconds, even if there are no diry superblocks, or there are no client
> file-systems which would need this (e.g., btrfs does not use
> '->write_super()'). So we want to kill it completely and thus, we need to make
> file-systems to stop using the '->write_super()' VFS service, and then remove
> it together with the kernel thread.
>
> Tested using fsstress from the LTP project.
>
>
> ...
>
> --- a/fs/hfsplus/hfsplus_fs.h
> +++ b/fs/hfsplus/hfsplus_fs.h
> @@ -153,8 +153,11 @@ struct hfsplus_sb_info {
> gid_t gid;
>
> int part, session;
> -
> unsigned long flags;
> +
> + int work_queued; /* non-zero delayed work is queued */

This would be a little nicer if it had the bool type.

> + struct delayed_work sync_work; /* FS sync delayed work */
> + spinlock_t work_lock; /* protects sync_work and work_queued */

I'm not sure that this lock really needs to exist.

> -static void hfsplus_write_super(struct super_block *sb)
> +static void delayed_sync_fs(struct work_struct *work)
> {
> - if (!(sb->s_flags & MS_RDONLY))
> - hfsplus_sync_fs(sb, 1);
> - else
> - sb->s_dirt = 0;
> + struct hfsplus_sb_info *sbi;
> +
> + sbi = container_of(work, struct hfsplus_sb_info, sync_work.work);
> +
> + spin_lock(&sbi->work_lock);
> + sbi->work_queued = 0;
> + spin_unlock(&sbi->work_lock);

Here it is "protecting" a single write.

> + hfsplus_sync_fs(sbi->alloc_file->i_sb, 1);
> +}
> +
> +void hfsplus_mark_mdb_dirty(struct super_block *sb)
> +{
> + struct hfsplus_sb_info *sbi = HFSPLUS_SB(sb);
> + unsigned long delay;
> +
> + if (sb->s_flags & MS_RDONLY)
> + return;
> +
> + spin_lock(&sbi->work_lock);
> + if (!sbi->work_queued) {
> + delay = msecs_to_jiffies(dirty_writeback_interval * 10);
> + queue_delayed_work(system_long_wq, &sbi->sync_work, delay);
> + sbi->work_queued = 1;
> + }
> + spin_unlock(&sbi->work_lock);
> }

And I think it could be made to go away here, perhaps by switching to
test_and_set_bit or similar.


And I wonder about the queue_delayed_work(). iirc this does nothing to
align timer expiries, so someone who has a lot of filesystems could end
up with *more* timer wakeups. Shouldn't we do something here to make
the system do larger amounts of work per timer expiry? Such as the
timer-slack infrastructure?


It strikes me that this whole approach improves the small system with
little write activity, but makes things worse for the larger system
with a lot of filesystems?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/