Re: [00/17] Large Blocksize Support V3

From: Nick Piggin
Date: Thu Apr 26 2007 - 01:38:26 EST


Eric W. Biederman wrote:
clameter@xxxxxxx writes:


V2->V3
- More restructuring
- It actually works!
- Add XFS support
- Fix up UP support
- Work out the direct I/O issues
- Add CONFIG_LARGE_BLOCKSIZE. Off by default which makes the inlines revert
back to constants. Disabled for 32bit and HIGHMEM configurations.
This also allows a gradual migration to the new page cache
inline functions. LARGE_BLOCKSIZE capabilities can be
added gradually and if there is a problem then we can disable
a subsystem.

V1->V2
- Some ext2 support
- Some block layer, fs layer support etc.
- Better page cache macros
- Use macros to clean up code.

This patchset modifies the Linux kernel so that larger block sizes than
page size can be supported. Larger block sizes are handled by using
compound pages of an arbitrary order for the page cache instead of
single pages with order 0.


Huh?

You seem to be mixing two very different concepts.

The page cache has no problems supporting things with a block
size larger then page size. Now the block device layer may not
have the code to do the scatter gather into small pages and it
may not handle buffer heads whose data is split between multiple
pages.

Yeah, this patch is not really large blocksize support (which we normally
think of as block size > page cache size).


But this is not a page cache issue.

And generally larger physical pages are a mistake to use.
Especially as it looks from some of the later comment you don't
date test on 32bit because the memory fragments faster.

I actually completely agree with this, and I'm concerned in general about
using higher order pages. I think it is fundamentally the wrong approach
because of fragmentation and defragmentation costs (similarly to Linus's
take on page colouring).

I think starting with the assumption that we _want_ to use higher order
allocations, and then creating all this complexity around that is not a
good one, and if we start introducing things that _require_ significant
higher order allocations to function then it is a nasty thing for
robustness.


Is it common for hardware that supports large block sizes to not
support splitting those blocks apart during DMA? Unless it is common
the whole premise of this patchset seems broken.

I suspect what needs to be fixed is the page cache block device
interface so that we have helper functions that know how to stuff
a single block into several pages.

I am working now and again on some code to do this, it is a big job but
I think it is the right way to do it. But it would take a long time to
get stable and supported by filesystems...


That would make the choice of using larger order pages (essentially
increasing PAGE_SIZE) something that can be investigated in parallel.

I agree that hardware inefficiencies should be handled by increasing
PAGE_SIZE (not making PAGE_CACHE_SIZE > PAGE_SIZE) at the arch level.

--
SUSE Labs, Novell Inc.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/