Re: [RFC] page-writeback: move inodes from one superblock together

From: Wu Fengguang
Date: Thu Sep 24 2009 - 06:02:32 EST


On Thu, Sep 24, 2009 at 02:54:20PM +0800, Li, Shaohua wrote:
> __mark_inode_dirty adds inodes to the wb dirty list in random order. If a
> disk has several partitions, writeback might keep the spindle moving between
> partitions. To reduce the movement, it is better to write a big chunk of one
> partition and then move to another. Inodes from one fs are usually in one
> partition, so ideally moving inodes from one fs together should reduce
> spindle movement. This patch tries to address this. Before per-bdi writeback
> was added, the behavior was to write inodes from one fs first and then
> another, so the patch restores the previous behavior. The loop in the patch
> is a bit ugly; should we add a dirty list for each superblock in
> bdi_writeback?
>
> A test on a two-partition disk with the attached fio script shows about a
> 3% ~ 6% improvement.

A side note: given the noticeable performance gain, I wonder if it is
worth generalizing the idea to do whole-disk, location-ordered
writeback. That should benefit many small-file workloads by more than
10%, because this patch only sorts inodes across 2 partitions within a
5s time window, while the patch below roughly divides the disk into
5 areas and sorts inodes in a larger 25s time window.

http://lkml.org/lkml/2007/8/27/45

Judging from this old patch, the complexity cost would be about 250
lines of code (it needs an rbtree).
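
To make "location ordered" concrete, here is a minimal userspace sketch,
not taken from the old patch. It assumes each dirty inode can be given a
single on-disk position (the `location` field below is hypothetical) and
uses a plain qsort() over an array where the kernel version would need an
rbtree. It does not model the 5 areas or the 25s window.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a dirty inode; not a kernel structure. */
struct dirty_item {
	unsigned long location;	/* assumed on-disk position, e.g. a start block */
	int ino;		/* label used only to identify the item */
};

/* Order two items by ascending on-disk location. */
static int cmp_location(const void *a, const void *b)
{
	const struct dirty_item *x = a;
	const struct dirty_item *y = b;

	return (x->location > y->location) - (x->location < y->location);
}

/* Sort the batch so that writeback would sweep the disk in one direction. */
void sort_by_location(struct dirty_item *items, size_t n)
{
	qsort(items, n, sizeof(*items), cmp_location);
}
```

A real implementation would keep the set sorted incrementally as inodes
are dirtied, which is presumably where the rbtree comes in.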

Thanks,
Fengguang

> Signed-off-by: Shaohua Li <shaohua.li@xxxxxxxxx>
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 8e1e5e1..fc87730 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -324,13 +324,29 @@ static void move_expired_inodes(struct list_head *delaying_queue,
>  			       struct list_head *dispatch_queue,
>  				unsigned long *older_than_this)
>  {
> +	LIST_HEAD(tmp);
> +	struct list_head *pos, *node;
> +	struct super_block *sb;
> +	struct inode *inode;
> +
>  	while (!list_empty(delaying_queue)) {
> -		struct inode *inode = list_entry(delaying_queue->prev,
> -						struct inode, i_list);
> +		inode = list_entry(delaying_queue->prev, struct inode, i_list);
>  		if (older_than_this &&
>  		    inode_dirtied_after(inode, *older_than_this))
>  			break;
> -		list_move(&inode->i_list, dispatch_queue);
> +		list_move(&inode->i_list, &tmp);
> +	}
> +
> +	/* Move inodes from one superblock together */
> +	while (!list_empty(&tmp)) {
> +		inode = list_entry(tmp.prev, struct inode, i_list);
> +		sb = inode->i_sb;
> +		list_for_each_prev_safe(pos, node, &tmp) {
> +			struct inode *inode = list_entry(pos,
> +					struct inode, i_list);
> +			if (inode->i_sb == sb)
> +				list_move(&inode->i_list, dispatch_queue);
> +		}
>  	}
>  }
>
>
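
The grouping pass in the patch can be tried outside the kernel with a
minimal sketch like the one below. It is only an illustration: arrays
replace the kernel lists, and `struct fake_inode`, `sb_id`, and
`group_by_sb` are made-up names. The input is assumed to be ordered oldest
first, matching the order in which the patch walks the temporary list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct inode with only the fields needed here. */
struct fake_inode {
	int sb_id;	/* stands in for inode->i_sb */
	int ino;	/* label used only to identify the inode */
};

/*
 * Copy `in` (oldest first) into `out`, grouped by sb_id. Groups appear in
 * the order of their oldest member, and the age order inside each group is
 * preserved. Like the loop in the patch, this rescans the remaining items
 * once per superblock. Returns 0 on success, -1 if allocation fails.
 */
int group_by_sb(const struct fake_inode *in, struct fake_inode *out, size_t n)
{
	bool *moved;
	size_t i, j, k = 0;

	if (n == 0)
		return 0;

	moved = calloc(n, sizeof(*moved));
	if (!moved)
		return -1;

	for (i = 0; i < n; i++) {
		int sb;

		if (moved[i])
			continue;
		sb = in[i].sb_id;	/* superblock of the oldest remaining inode */
		for (j = i; j < n; j++) {
			if (!moved[j] && in[j].sb_id == sb) {
				out[k++] = in[j];
				moved[j] = true;
			}
		}
	}

	free(moved);
	return 0;
}
```

A dirty list per superblock, as asked about in the patch description,
would avoid the repeated scans at the cost of extra bookkeeping.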

Content-Description: newfio
> [global]
> runtime=120
> ioscheduler=cfq
> size=2G
> ioengine=sync
> rw=write
> file_service_type=random:256
> overwrite=1
>
> [sdb1]
> directory=/mnt/b1
> nrfiles=10
> numjobs=4
>
> [sdb2]
> directory=/mnt/b2
> nrfiles=10
> numjobs=4
