Re: linux-next: manual merge of the block tree with the tree

From: Zach Brown
Date: Fri Nov 08 2013 - 12:56:49 EST


> > > That make sense? I can show you more concretely what I'm working on if
> > > you want. Or if I'm full of crap and this is useless for what you guys
> > > want I'm sure you'll let me know :)
> >
> > It sounds interesting, but also a little confusing at this point, at
> > least from the non-block side of view.
>
> Zach, you want to chime in? He was involved in the discussion yesterday,
> he might be able to explain this stuff better than I.

I can try. I may not do the *best* job because I've been on the
periphery of most of this since I left the proof of concept back at
Oracle :).

The first part is passing in pages instead of mapped addresses. That's
where the iov_iter argument came from. A ham-fisted proof of concept to
try to abstract iterating over any old type of memory. But it's not
*really* abstract because dio magically knows (look for gross
iov_iter_has_iovec() callers) whether the memory is in iovecs or bio
pages when it's verifying alignment, pinning or not, etc. In the end
it's little more than syntactic sugar to try and pretend that two
interfaces are one.
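
To make the "not really abstract" complaint concrete, here is a rough
userspace sketch. Every name in it (toy_iter, toy_iter_has_iovec(),
toy_needs_pinning(), toy_is_aligned()) is made up for illustration and is
not the kernel's iov_iter code; it only models a caller that has to peek
at what kind of memory sits behind the iterator.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these are NOT the kernel's types or helpers. */
enum toy_iter_kind { TOY_ITER_IOVEC, TOY_ITER_BIO_PAGES };

struct toy_iter {
    enum toy_iter_kind kind;
    size_t count;        /* bytes remaining */
    unsigned long base;  /* user address; only meaningful for TOY_ITER_IOVEC */
};

/* Hypothetical stand-in for the iov_iter_has_iovec() style of check. */
static int toy_iter_has_iovec(const struct toy_iter *it)
{
    return it->kind == TOY_ITER_IOVEC;
}

/* Assumption: only user memory needs pinning; bio pages are already held. */
static int toy_needs_pinning(const struct toy_iter *it)
{
    return toy_iter_has_iovec(it);
}

/* The alignment check branches on the backing memory: the leak in question. */
static int toy_is_aligned(const struct toy_iter *it, unsigned long mask)
{
    if (toy_iter_has_iovec(it))
        return ((it->base | it->count) & mask) == 0;
    return (it->count & mask) == 0;
}
```

Both helpers branch on the iterator's kind, which is the sense in which
the single interface is still two interfaces underneath.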

For expedience, this iov_iter approach used the loop's bio to store the
pages in the iov_iter rather than translating the bio's pages to a page
array in the iov_iter.

So the first part of what I think Kent is picturing is to take that to
its logical conclusion and have the caller describe the io memory and
offset with a bio instead of explicit address and offset arguments.
This way dio can do nice bio management operations to kick off its
device bios rather than having to clumsily build them from either
incoming pages or mapped user addresses that are hidden in iov_iter.

I'm imagining cutting the current dio up into two phases. One that
pins user pages and puts them in bios and one that maps those file bios
to device bios and submits them. Then the fop method becomes the second
phase so that loop can call it with its file bios. Call it
->submit_file_bio() instead of ->do_direct_IO(), maybe?
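
A minimal userspace sketch of that split, under heavy assumptions:
toy_bio, toy_pin_user_buffer() and toy_submit_file_bio() are hypothetical
names, page numbers stand in for struct page pointers, and "submitting"
just counts how many device bios a file bio would be carved into. It is
not a proposal for the real interface.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of this is kernel code. */
#define TOY_PAGE_SIZE 4096u
#define TOY_MAX_PAGES 8

struct toy_bio {
    size_t nr_pages;                     /* pages described by this bio */
    unsigned long pages[TOY_MAX_PAGES];  /* stand-in for page pointers */
    long long file_offset;               /* byte offset into the file */
};

/* Phase 1 (hypothetical): "pin" a user buffer and describe it as a file bio. */
static int toy_pin_user_buffer(unsigned long uaddr, size_t len,
                               long long off, struct toy_bio *fbio)
{
    size_t n = (len + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;
    size_t i;

    if (uaddr % TOY_PAGE_SIZE != 0 || n > TOY_MAX_PAGES)
        return -1;  /* unaligned or too large for this toy */

    fbio->nr_pages = n;
    fbio->file_offset = off;
    for (i = 0; i < n; i++)
        fbio->pages[i] = uaddr / TOY_PAGE_SIZE + i;
    return 0;
}

/*
 * Phase 2 (hypothetical ->submit_file_bio()): map a file bio onto device
 * bios of at most max_pages each. Returns how many device bios were issued.
 */
static size_t toy_submit_file_bio(const struct toy_bio *fbio, size_t max_pages)
{
    size_t done = 0, submitted = 0;

    if (max_pages == 0)
        return 0;
    while (done < fbio->nr_pages) {
        size_t chunk = fbio->nr_pages - done;
        if (chunk > max_pages)
            chunk = max_pages;
        done += chunk;   /* a real implementation would map and submit here */
        submitted++;
    }
    return submitted;
}
```

In this model the syscall path would run both phases, while loop would
skip phase one and hand its own file bio straight to
toy_submit_file_bio().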

The other part of this series that isn't getting as much attention,
though, is async submission and completion. This patch introduces a
weird in-kernel aio submission interface that adds special cases to aio.
In this new bio world order we could get rid of that complication by
relying on the bio's ->bi_end_io() for completion.
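
Roughly what completion through the bio's callback could look like, as a
userspace toy. The names here (toy_bio, toy_loop_request,
toy_loop_end_io(), toy_complete_bio()) are invented for illustration; the
only idea borrowed from the kernel is a per-bio completion callback in
the spirit of ->bi_end_io().

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: a callback-on-completion bio, not the kernel's struct bio. */
struct toy_bio;
typedef void (*toy_end_io_fn)(struct toy_bio *bio, int error);

struct toy_bio {
    toy_end_io_fn end_io;  /* modeled on the ->bi_end_io() idea */
    void *priv;            /* submitter's context */
};

/* Hypothetical request state kept by a loop-like submitter. */
struct toy_loop_request {
    int completed;
    int error;
};

/* The submitter's completion handler: record the result in its own request. */
static void toy_loop_end_io(struct toy_bio *bio, int error)
{
    struct toy_loop_request *rq = bio->priv;

    rq->completed = 1;
    rq->error = error;
}

/* Stand-in for the lower layer finishing the I/O and notifying the owner. */
static void toy_complete_bio(struct toy_bio *bio, int error)
{
    if (bio->end_io)
        bio->end_io(bio, error);
}
```

The submitter learns about completion purely through the callback it
attached to the bio, so no separate in-kernel aio path is involved in
this model.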

I suppose a high level view of this strategy is to move more towards a
stack where layers have matching inputs and outputs. If both dio and
loop take bios as input and translate them into submitted output bios
then the stacking becomes more natural.

That's the blue sky fantasy anyway. There's a lot of detail being
glossed over. I want to see what the patches look like.

- z