Re: [PATCH] btrfs: lower metadata writeback threshold on low dirty threshold
From: Fengguang Wu
Date: Thu May 03 2012 - 06:02:59 EST
On Thu, May 03, 2012 at 11:25:28AM +0200, Jan Kara wrote:
> On Thu 03-05-12 11:43:11, Wu Fengguang wrote:
> > This helps write performance when setting the dirty threshold to tiny numbers.
> >
> >    3.4.0-rc2     3.4.0-rc2-btrfs4+
> > ------------  --------------------
> >        96.92     -0.4%       96.54  bay/thresh=1000M/btrfs-100dd-1-3.4.0-rc2
> >        98.47     +0.0%       98.50  bay/thresh=1000M/btrfs-10dd-1-3.4.0-rc2
> >        99.38     -0.3%       99.06  bay/thresh=1000M/btrfs-1dd-1-3.4.0-rc2
> >        98.04     -0.0%       98.02  bay/thresh=100M/btrfs-100dd-1-3.4.0-rc2
> >        98.68     +0.3%       98.98  bay/thresh=100M/btrfs-10dd-1-3.4.0-rc2
> >        99.34     -0.0%       99.31  bay/thresh=100M/btrfs-1dd-1-3.4.0-rc2
> > ==>    88.98     +9.6%       97.53  bay/thresh=10M/btrfs-10dd-1-3.4.0-rc2
> > ==>    86.99    +13.1%       98.39  bay/thresh=10M/btrfs-1dd-1-3.4.0-rc2
> > ==>     2.75  +2442.4%       69.88  bay/thresh=1M/btrfs-10dd-1-3.4.0-rc2
> > ==>     3.31  +2634.1%       90.54  bay/thresh=1M/btrfs-1dd-1-3.4.0-rc2
> >
> > Signed-off-by: Fengguang Wu <fengguang.wu@xxxxxxxxx>
> > ---
> > fs/btrfs/disk-io.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > --- linux-next.orig/fs/btrfs/disk-io.c 2012-05-02 14:04:00.989262395 +0800
> > +++ linux-next/fs/btrfs/disk-io.c 2012-05-02 14:04:01.773262414 +0800
> > @@ -930,7 +930,8 @@ static int btree_writepages(struct addre
> >
> >  		/* this is a bit racy, but that's ok */
> >  		num_dirty = root->fs_info->dirty_metadata_bytes;
> > -		if (num_dirty < thresh)
> > +		if (num_dirty < min(thresh,
> > +				    global_dirty_limit << (PAGE_CACHE_SHIFT-2)))
> >  			return 0;
> >  	}
> >  	return btree_write_cache_pages(mapping, wbc);
> Frankly, that whole condition on WB_SYNC_NONE in btree_writepages() looks
> like a hack. I think we also had problems with this condition when we tried
> to change b_more_io list handling. I found a rather terse commit message
> explaining the code:
>   Btrfs: Limit btree writeback to prevent seeks
>
> Which I kind of understand, but is it that bad? Also I think last time we
> stumbled over this code we were discussing that this dirty metadata would
> simply be hidden from mm, which would solve the problem of the flusher
> thread trying to outsmart the filesystem... But I guess no one had time to
> implement this for btrfs.
Yeah, I have the same uneasy feeling. Actually, my first attempt was to
remove the heuristics in btree_writepages() altogether. The result was a
performance degradation, ranging from slight to over 20%, in the normal
cases:
wfg@bee /export/writeback% ./compare bay/*/*-{3.4.0-rc2,3.4.0-rc2-btrfs+}
   3.4.0-rc2      3.4.0-rc2-btrfs+
------------  --------------------
      190.81     -6.8%      177.82  bay/JBOD-2HDD-thresh=1000M/btrfs-100dd-1-3.4.0-rc2
      195.86     -3.3%      189.31  bay/JBOD-2HDD-thresh=1000M/btrfs-10dd-1-3.4.0-rc2
      196.68     -1.7%      193.30  bay/JBOD-2HDD-thresh=1000M/btrfs-1dd-1-3.4.0-rc2
      194.83    -24.4%      147.27  bay/JBOD-2HDD-thresh=100M/btrfs-100dd-1-3.4.0-rc2
      196.60     -2.5%      191.61  bay/JBOD-2HDD-thresh=100M/btrfs-10dd-1-3.4.0-rc2
      197.09     -0.7%      195.69  bay/JBOD-2HDD-thresh=100M/btrfs-1dd-1-3.4.0-rc2
      181.64     -8.7%      165.80  bay/RAID0-2HDD-thresh=1000M/btrfs-100dd-1-3.4.0-rc2
      186.14     -2.8%      180.85  bay/RAID0-2HDD-thresh=1000M/btrfs-10dd-1-3.4.0-rc2
      191.10     -1.5%      188.23  bay/RAID0-2HDD-thresh=1000M/btrfs-1dd-1-3.4.0-rc2
      191.30    -20.7%      151.63  bay/RAID0-2HDD-thresh=100M/btrfs-100dd-1-3.4.0-rc2
      186.03     -2.4%      181.54  bay/RAID0-2HDD-thresh=100M/btrfs-10dd-1-3.4.0-rc2
      170.18     -2.5%      165.97  bay/RAID0-2HDD-thresh=100M/btrfs-1dd-1-3.4.0-rc2
       96.18     -1.9%       94.32  bay/RAID1-2HDD-thresh=1000M/btrfs-100dd-1-3.4.0-rc2
       97.71     -1.4%       96.36  bay/RAID1-2HDD-thresh=1000M/btrfs-10dd-1-3.4.0-rc2
       97.57     -0.4%       97.23  bay/RAID1-2HDD-thresh=1000M/btrfs-1dd-1-3.4.0-rc2
       97.68     -6.0%       91.79  bay/RAID1-2HDD-thresh=100M/btrfs-100dd-1-3.4.0-rc2
       97.76     -0.7%       97.07  bay/RAID1-2HDD-thresh=100M/btrfs-10dd-1-3.4.0-rc2
       97.53     -0.3%       97.19  bay/RAID1-2HDD-thresh=100M/btrfs-1dd-1-3.4.0-rc2
       96.92     -3.0%       94.03  bay/thresh=1000M/btrfs-100dd-1-3.4.0-rc2
       98.47     -1.4%       97.08  bay/thresh=1000M/btrfs-10dd-1-3.4.0-rc2
       99.38     -0.7%       98.66  bay/thresh=1000M/btrfs-1dd-1-3.4.0-rc2
       98.04     -8.2%       89.99  bay/thresh=100M/btrfs-100dd-1-3.4.0-rc2
       98.68     -0.6%       98.09  bay/thresh=100M/btrfs-10dd-1-3.4.0-rc2
       99.34     -0.7%       98.62  bay/thresh=100M/btrfs-1dd-1-3.4.0-rc2
       88.98     -0.5%       88.51  bay/thresh=10M/btrfs-10dd-1-3.4.0-rc2
       86.99    +14.5%       99.60  bay/thresh=10M/btrfs-1dd-1-3.4.0-rc2
        2.75  +1871.2%       54.18  bay/thresh=1M/btrfs-10dd-1-3.4.0-rc2
        3.31  +2035.0%       70.70  bay/thresh=1M/btrfs-1dd-1-3.4.0-rc2
     3635.55     -1.2%     3592.46  TOTAL write_bw
So I ended up with the conservative fix in this patch.
FYI, I also experimented with "global_dirty_limit << PAGE_CACHE_SHIFT",
i.e. without the further "/4" in this patch; however, the results are not
good:
   3.4.0-rc2     3.4.0-rc2-btrfs3+
------------  --------------------
       96.92     -0.3%       96.62  bay/thresh=1000M/btrfs-100dd-1-3.4.0-rc2
       98.47     +0.1%       98.56  bay/thresh=1000M/btrfs-10dd-1-3.4.0-rc2
       99.38     -0.2%       99.23  bay/thresh=1000M/btrfs-1dd-1-3.4.0-rc2
       98.04     +0.1%       98.15  bay/thresh=100M/btrfs-100dd-1-3.4.0-rc2
       98.68     +0.3%       98.96  bay/thresh=100M/btrfs-10dd-1-3.4.0-rc2
       99.34     -0.1%       99.20  bay/thresh=100M/btrfs-1dd-1-3.4.0-rc2
       88.98     -0.3%       88.73  bay/thresh=10M/btrfs-10dd-1-3.4.0-rc2
       86.99     +1.4%       88.23  bay/thresh=10M/btrfs-1dd-1-3.4.0-rc2
        2.75   +232.0%        9.13  bay/thresh=1M/btrfs-10dd-1-3.4.0-rc2
        3.31     +1.5%        3.36  bay/thresh=1M/btrfs-1dd-1-3.4.0-rc2
So this patch is kind of based on "experiment" rather than "reasoning".
And I took the easy way of using the global dirty threshold. Ideally
it should be based upon the per-bdi dirty threshold, but anyway...
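
For reference, here is a rough user-space sketch of the arithmetic behind
the two variants compared above. It is not kernel code. It assumes 4 KiB
pages (PAGE_CACHE_SHIFT = 12), a fixed 32 MiB "thresh" in
btree_writepages(), and that global_dirty_limit (counted in pages) sits
close to the configured dirty threshold; the helper names are made up for
illustration.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages; redefined so the sketch builds in user space. */
#define SKETCH_PAGE_CACHE_SHIFT 12

/* Assumption: the fixed cutoff ("thresh") in btree_writepages() is 32 MiB. */
#define SKETCH_BTREE_THRESH (32ULL * 1024 * 1024)

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Hypothetical helper: convert a dirty threshold in MiB to pages. */
uint64_t mib_to_pages(uint64_t mib)
{
	return (mib << 20) >> SKETCH_PAGE_CACHE_SHIFT;
}

/*
 * Cutoff as in this patch:
 *   min(thresh, global_dirty_limit << (PAGE_CACHE_SHIFT-2))
 * i.e. a quarter of the dirty limit in bytes, capped at thresh.
 * dirty_limit_pages stands in for global_dirty_limit.
 */
uint64_t patched_cutoff_bytes(uint64_t dirty_limit_pages)
{
	return min_u64(SKETCH_BTREE_THRESH,
		       dirty_limit_pages << (SKETCH_PAGE_CACHE_SHIFT - 2));
}

/* Variant without the "/4": the full dirty limit in bytes, capped at thresh. */
uint64_t unscaled_cutoff_bytes(uint64_t dirty_limit_pages)
{
	return min_u64(SKETCH_BTREE_THRESH,
		       dirty_limit_pages << SKETCH_PAGE_CACHE_SHIFT);
}

/* Nonzero when WB_SYNC_NONE writeback would return early without writing. */
int skips_writeback(uint64_t num_dirty_bytes, uint64_t cutoff_bytes)
{
	return num_dirty_bytes < cutoff_bytes;
}
```

Under these assumptions the patched cutoff works out to 256 KiB at
thresh=1M, 2.5 MiB at 10M, 25 MiB at 100M, and stays at 32 MiB at 1000M.
Without the "/4" it would be 1 MiB, 10 MiB, 32 MiB and 32 MiB, so at
thresh=1M the metadata cutoff would equal the whole dirty limit.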
Thanks,
Fengguang