On Thu, Apr 26, 2007 at 05:48:12PM +1000, Nick Piggin wrote:
> Christoph Lameter wrote:
> > On Thu, 26 Apr 2007, Nick Piggin wrote:
> > > No I don't want to add another fs layer.
> >
> > Well maybe you could explain what you want. Preferably without
> > redefining the established terms?
>
> Support for larger buffers than page cache pages.
The problem with this approach is that it turns around the whole
way we look at bufferheads. Right now we have a well defined 1:n
mapping of page to bufferheads and so we typically lock the
page first, then iterate all the bufferheads on the page.

Going the other way, we need to support m:n, which means
the buffer has to become the primary interface for the filesystem
to the page cache. i.e. we need to lock the bufferhead first, then
iterate all the pages on it. This is messy because the cache indexes
via pages, not bufferheads. Hence a buffer needs to point to all the
pages in it explicitly, and this leads to interesting issues with
locking.
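
To make the current 1:n model concrete, here's a minimal sketch of the
lock-the-page-first, then-walk-its-bufferheads pattern. The helper name
and the callback are made up for illustration, not taken from any
particular filesystem:

#include <linux/pagemap.h>
#include <linux/buffer_head.h>

/*
 * Illustrative only: current 1:n model. The page lock is taken
 * first, then the buffer_heads attached to that page are walked
 * via the b_this_page ring.
 */
static void walk_buffers_on_page(struct page *page,
				 void (*fn)(struct buffer_head *bh))
{
	struct buffer_head *bh, *head;

	lock_page(page);			/* page lock first */
	if (page_has_buffers(page)) {
		head = page_buffers(page);
		bh = head;
		do {				/* then its buffers */
			fn(bh);
			bh = bh->b_this_page;
		} while (bh != head);
	}
	unlock_page(page);
}

Inverting this to buffer-first means there is no single page lock that
covers the whole buffer, which is where the locking mess comes from.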
If you still think that this is a good idea, I suggest that you
> So block size > page cache size... also, you should obviously be using
> hardware that is tuned to work well with 4K pages, because surely there
> is lots of that around.
The CPU hardware works well with 4k pages, but in general I/O
hardware works more efficiently as the number of s/g entries it
requires for a given I/O size drops. Given that we limit drivers to
128 s/g entries, we really aren't using I/O hardware to its full
potential or at its most efficient by limiting each s/g entry to a
single 4k page.
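
To put rough numbers on it: at one 4k page per s/g entry, 128 entries
cap a single request at 512k. If each entry could instead point at,
say, a 64k contiguous chunk, the same 128 entries would cover 8MB of
I/O.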
And FWIW, having a buffer for block size > page size does not
solve this problem - only contiguous page allocation solves this
problem.