Re: [PATCH] avoid too many boundaries in DIO

From: Chris Mason
Date: Fri Nov 10 2006 - 16:50:10 EST


On Thu, Nov 09, 2006 at 10:56:18PM -0800, Andrew Morton wrote:
> > This patch just clears the boundary bit after using it once. It is 10%
> > faster for a streaming DIO write w/blocksize of 512k on my sata drive.
> >
>
> Thanks.
>
> But that's two large performance regressions (so far) from the multi-block
> get_block() feature. And that was allegedly a performance optimisation!
> Who's testing this stuff?

Well, I've been wearing the sata-drive-writeback-cache cape of shame
since Dave found the regression while testing my stuff, so I can't
really point fingers. On some drives the regression didn't show up, but
it should be there on any beefy storage.

> Is that actually correct? If ->get_block() returned a
> buffer_boundary() buffer then what we want to do is to push down all
> the thus-far-queued BIOs once we've submitted _all_ of the BIOs
> represented by map_bh. I think that if we require more than one BIO
> to cover map_bh.b_size then we'll do the submission after the first
> BIO has been sent instead of after the final one has been sent?
>
I realized the same thing this morning, but it took a while to figure
out why honoring the boundary on the last block was 5% slower than my
first patch. It turns out that we consistently send down the boundary
bio too soon.

Testing of this has been very light, but I wanted to get it out for
review. I'll test more over the weekend.

-chris

From: Chris Mason <chris.mason@xxxxxxxxxx>
Subject: avoid too many boundaries in DIO

Dave Chinner found a 10% performance regression with ext3 when using DIO
to fill holes instead of buffered IO. On large IOs, the ext3 get_block
routine will send more than a page worth of blocks back to DIO via a
single buffer_head with a large b_size value.

The DIO code iterates through this massive block and tests for a
boundary buffer over and over again. For every block size unit spanned
by the big map_bh, the boundary bit is tested and a bio may be forced
down to the block layer.

This patch changes things to only submit the boundary bio for the
last block in the big map_bh.

The DIO code had a number of places that would honor dio->boundary
too early, sending the bio down before actually adding the boundary
block to it. Those are also fixed.

Signed-off-by: Chris Mason <chris.mason@xxxxxxxxxx>

diff -r 18a9e9f5c707 fs/direct-io.c
--- a/fs/direct-io.c Thu Oct 19 08:30:00 2006 +0700
+++ b/fs/direct-io.c Fri Nov 10 16:33:04 2006 -0500
@@ -572,7 +571,6 @@ static int dio_new_bio(struct dio *dio,
nr_pages = min(dio->pages_in_io, bio_get_nr_vecs(dio->map_bh.b_bdev));
BUG_ON(nr_pages <= 0);
ret = dio_bio_alloc(dio, dio->map_bh.b_bdev, sector, nr_pages);
- dio->boundary = 0;
out:
return ret;
}
@@ -626,12 +624,6 @@ static int dio_send_cur_page(struct dio
*/
if (dio->final_block_in_bio != dio->cur_page_block)
dio_bio_submit(dio);
- /*
- * Submit now if the underlying fs is about to perform a
- * metadata read
- */
- if (dio->boundary)
- dio_bio_submit(dio);
}

if (dio->bio == NULL) {
@@ -648,6 +640,12 @@ static int dio_send_cur_page(struct dio
BUG_ON(ret != 0);
}
}
+ /*
+ * Submit now if the underlying fs is about to perform a
+ * metadata read
+ */
+ if (dio->boundary)
+ dio_bio_submit(dio);
out:
return ret;
}
@@ -674,6 +672,10 @@ submit_page_section(struct dio *dio, str
unsigned offset, unsigned len, sector_t blocknr)
{
int ret = 0;
+ int boundary = dio->boundary;
+
+ /* don't let dio_send_cur_page do the boundary too soon */
+ dio->boundary = 0;

/*
* Can we just grow the current page's presence in the dio?
@@ -683,17 +683,7 @@ submit_page_section(struct dio *dio, str
(dio->cur_page_block +
(dio->cur_page_len >> dio->blkbits) == blocknr)) {
dio->cur_page_len += len;
-
- /*
- * If dio->boundary then we want to schedule the IO now to
- * avoid metadata seeks.
- */
- if (dio->boundary) {
- ret = dio_send_cur_page(dio);
- page_cache_release(dio->cur_page);
- dio->cur_page = NULL;
- }
- goto out;
+ goto out_send;
}

/*
@@ -712,6 +702,18 @@ submit_page_section(struct dio *dio, str
dio->cur_page_offset = offset;
dio->cur_page_len = len;
dio->cur_page_block = blocknr;
+
+out_send:
+ /*
+ * If dio->boundary then we want to schedule the IO now to
+ * avoid metadata seeks.
+ */
+ if (boundary) {
+ dio->boundary = 1;
+ ret = dio_send_cur_page(dio);
+ page_cache_release(dio->cur_page);
+ dio->cur_page = NULL;
+ }
out:
return ret;
}
@@ -917,7 +919,16 @@ do_holes:
this_chunk_bytes = this_chunk_blocks << blkbits;
BUG_ON(this_chunk_bytes == 0);

- dio->boundary = buffer_boundary(map_bh);
+ /*
+ * get_block may return more than one page worth
+ * of blocks. Make sure only the last io we
+ * send down for this region is a boundary
+ */
+ if (dio->blocks_available == this_chunk_blocks)
+ dio->boundary = buffer_boundary(map_bh);
+ else
+ dio->boundary = 0;
+
ret = submit_page_section(dio, page, offset_in_page,
this_chunk_bytes, dio->next_block_for_io);
if (ret) {
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/