Re: [00/17] Large Blocksize Support V3

From: Christoph Hellwig
Date: Thu Apr 26 2007 - 14:25:34 EST

On Thu, Apr 26, 2007 at 08:12:51PM +0200, Jens Axboe wrote:
> > Exactly. But the only counter-proposal we have so far seems far worse :)
> Lets look at some numbers. I'll just concentrate on the scatterlist,
> since the bio_vec is smaller. On x86 32-bit, the scatterlist is 20 bytes
> long. If we accept that 2^1 allocations are ok (they should be), then we
> can support ~1.6mb ios just like that.
> My approach would be to support scatterlist chaining. Essentially you'd
> have the last element of the sglist pointing to the next array of
> entries. We can then stick to 128 entry arrays which fit nicely in a
> single page allocation and easily support >> 2mb ios. The only caveat is
> that you'd need to update the drivers to get there, since a regular
> iteration over the array isn't enough. My plan was to add an sglist
> iterator helper that hides this from the drivers, if they need to loop
> over the scatterlist. Things like {dma/pci}_map_sg() would of course be
> updated.
> The above can be implemented fairly cleanly, and on a need-to-have
> basis. It's not something that'll break drivers.
> What do you think?

Purely for the I/O sizes to external arrays problem that's nice,
and I think we (well, you :)) should implement it.

But there's other reasons why larger objects in the page cache make
sense that are mostly related to keeping overhead for large files
in the operating system down. So I'd go both for s/g list chaining
and variable order pagecache.

Btw, we should talk a little about the sglist iterators on linux-scsi,
as a lot of the dma mapping API will need updates for bidirection dmas
anyway, and we should try to get everything done in one rush.
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at