On Tue, Oct 15, 2002 at 11:36:08AM +0200, Jens Axboe wrote:
> (btw, do you have a complete patch against a recent kernel?)
Split patches are in here and were built against whatever bk pulled
yesterday.
http://people.sistina.com/~thornber/patches/2.5-unstable
> Bouncing has just never been used with a cloned bio, so there might be a
> corner case or two that needs to be fixed up.
OK, in that case I think we have found one of those corner cases.
Please review the BUG_ON at the top of blk_queue_bounce.
> But walk me through your
> request handling, please. It seems you are always allocating a
> clone_info and bio clone for io?
Before I dive into what dm is doing now I'll describe a couple of
changes that I feel need to be made to the block layer.
Flushing mapped io
------------------
Whever a mapping driver changes the mapping of a device that is in use
it should ensure that any io already mapped with the previous
mapping/table has completed. At the moment I do this by hooking the
bi_endio fn and incrementing an pending count (see
dm_suspend/dm_resume). Your new merge_bvec_fn (which I like) means
that the request has effectively been partially remapped *before* it
gets into the request function. ie. the request is now mapping
specific. So before I can use merge_bvec_fn we need to move the
suspend/resume functionality up into ll_rw_block (eg, move the pending
count into the queue ?). *Every* driver that dynamically changes a
mapping needs this functionality, LVM1 ignores it but is so simple it
*might* get away with it, dm certainly needs it, and I presume EVMS
has similar code in their application.
Agree ?
Splitting pages
---------------
ATM the merge_bvec_fn is a predicate which states whether a page
_can't_ be appended to a bio (a curious way round to do it). The
driver will always have to contain code that copes with the case when
a mapping boundary occurs within a page (this happens). I would
really like to go the whole hog and change the merge_fn so that it
returns how much of the page can be accepted. As far as I can tell
(you're probably about to put me right), the only reason you haven't
done this is because:
/*
* I/O completion handler for multipage BIOs.
*
* The mpage code never puts partial pages into a BIO (except for end-of-file).
* If a page does not map to a contiguous run of blocks then it simply falls
* back to block_read_full_page().
*
* Why is this? If a page's completion depends on a number of different BIOs
* which can complete in any order (or at the same time) then determining the
* status of that page is hard. See end_buffer_async_read() for the details.
* There is no point in duplicating all that complexity.
*/
I believe it _is_ possible to implement this without incurring
significant CPU overhead or extra memory usage in the common case that
the mapping boundaries do occur on page breaks (This was another
reason why I was trying to get the users of bio_add_page using a
slightly higher level api that takes the bio submitting out of their
hands).
BTW: is there any point calling the merge_bvec_fn in bio_add_page when
the bios length is zero ?
dm splitting now
----------------
The splitting code as it exists ATM is ignoring the merge_bvec_fn.
I'm really not worried about performance yet, that said bonnie++
results are very similar on a raw partition or a dm mapping.
> request handling, please. It seems you are always allocating a
> clone_info and bio clone for io?
The clone_info is a little struct that sits on the stack, keeping
track of the splitting progress. It is not dynamically allocated.
static void __split_bio(struct mapped_device *md, struct bio *bio)
{
struct clone_info ci;
...
while (ci.sector_count)
__clone_and_map(&ci);
If you look at the 2.4 version of dm you will see I'm using a struct
io_hook to keep track of mapped bios in order to maintain the pending
count for dm_suspend/dm_resume. Since I will often need to allocate a
bio_clone anyway I decided to do away with the struct io_hook, and
always allocate the clone.
I will be very happy when the above two changes have been made to the
block layer and the request function can become a simple, single
remapping. No splitting or memory allocations will then be
neccessary.
- Joe
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
This archive was generated by hypermail 2b29 : Tue Oct 15 2002 - 22:00:54 EST