Re: [00/17] Large Blocksize Support V3

From: Eric W. Biederman
Date: Fri Apr 27 2007 - 09:57:56 EST


Nick Piggin <nickpiggin@xxxxxxxxxxxx> writes:

> Eric W. Biederman wrote:
>> Jens Axboe <jens.axboe@xxxxxxxxxx> writes:
>
>>>Yes, that is exactly the problem. Once you have that, pktcdvd is pretty
>>>much reduced to setup and init code, the actual data handling can be
>>>done by sr or ide-cd directly. You could merge it into cdrom.c, it would
>>>not be very different from mt-rainier handling (which basically does RMW
>>>in firmware, so it works for any write, but performance is of course
>>>horrible if you don't do it right).
>>
>>
>> Thanks for the clarification.
>>
>> So we do have a clear problem that we do not have generic support for
>> large sector sizes residing in the page cache.
>
> Well, it is a clear limitation. It hasn't mattered too much until
> now, but it is one of the other issues that SGI hit (aside from
> io efficiency) because they have 16K filesystems created on ia64
> systems that I believe they want to access with x86-64 systems.

I think the current pktcdvd story is a better argument. There is real
hardware with a > 4K sector size. Of course, once we support that
class of hardware, supporting filesystems with a large block size will
also be straightforward.

> I'm slowly looking at patches in the background, but I'm hoping to
> be able to spend a decent chunk of time working on them again soon.
>
> It isn't trivial :)

I guess it depends on how you look at it.

If we can drop the assumption that large sectors are virtually
contiguous in memory, I expect things will be closer to trivial.

If we can do a page group thing where we keep all of the I/O state on
the first cache page, I expect things won't be too bad.
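
To make the page group idea concrete, here is a rough userspace sketch,
not kernel code. Every name in it (cache_page_sketch, page_group_init,
group_set_state, and so on) is hypothetical and invented for illustration.
It assumes each page carries a pointer to the first page of its group and
that only that first page holds the I/O state, so the pages themselves
need not be contiguous.

```c
#include <stddef.h>

/* Hypothetical I/O state bits, loosely modelled on page flags. */
enum {
    GROUP_UPTODATE = 1u << 0,
    GROUP_DIRTY    = 1u << 1
};

/* Hypothetical stand-in for a page cache page. */
struct cache_page_sketch {
    struct cache_page_sketch *head; /* first page of the group (self for the head) */
    unsigned int io_state;          /* only meaningful on the head page */
};

/* Tie n pages, given as a vector of pointers, into one group headed by vec[0]. */
static void page_group_init(struct cache_page_sketch **vec, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        vec[i]->head = vec[0];
        vec[i]->io_state = 0;
    }
}

/* All state changes are redirected to the head page. */
static void group_set_state(struct cache_page_sketch *pg, unsigned int bits)
{
    pg->head->io_state |= bits;
}

static void group_clear_state(struct cache_page_sketch *pg, unsigned int bits)
{
    pg->head->io_state &= ~bits;
}

/* Returns nonzero if every bit in 'bits' is set for the group. */
static int group_test_state(const struct cache_page_sketch *pg, unsigned int bits)
{
    return (pg->head->io_state & bits) == bits;
}
```

Marking any page of the group dirty then marks the whole large sector
dirty, which is the behaviour a > 4K sector needs.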

I do seem to see some VM effects needed from allocating and freeing
several pages together.

I also see an opportunity in allocating several pages at once. We
could make it one call that returns a vector of pages, and the page
allocator could satisfy our request with a high order page split into
individual pages if one was available. Then the I/O layer would have
to notice that we are giving it several page structs that are
physically contiguous.
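
A toy userspace model of that interface might look like the following.
This is only a sketch under stated assumptions: toy_pool,
alloc_page_vector and pages_contiguous are made-up names, not existing
kernel functions, and a 16 entry free map stands in for the real page
allocator. The allocator first looks for an aligned free run (the "high
order page split into individual pages" case) and otherwise falls back
to scattered pages. The second helper is the check the I/O layer would
have to make.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_PAGES 16

/* Hypothetical stand-in for struct page: just a page frame number. */
struct page_sketch {
    unsigned long pfn;
};

/* Toy allocator state: pages[i].pfn == i, is_free[i] != 0 means free. */
struct toy_pool {
    struct page_sketch pages[POOL_PAGES];
    unsigned char is_free[POOL_PAGES];
};

static void toy_pool_init(struct toy_pool *p)
{
    for (size_t i = 0; i < POOL_PAGES; i++) {
        p->pages[i].pfn = i;
        p->is_free[i] = 1;
    }
}

/*
 * Fill vec[0..n-1] with n pages. Returns n on success, 0 on failure.
 * Pass 1 prefers a free run aligned to n; pass 2 takes any free pages.
 */
static size_t alloc_page_vector(struct toy_pool *p, size_t n,
                                struct page_sketch **vec)
{
    size_t i, j, avail = 0, got = 0;

    assert(n > 0 && n <= POOL_PAGES);

    for (i = 0; i + n <= POOL_PAGES; i += n) {
        for (j = 0; j < n && p->is_free[i + j]; j++)
            ;
        if (j == n) {               /* whole run is free: split it up */
            for (j = 0; j < n; j++) {
                p->is_free[i + j] = 0;
                vec[j] = &p->pages[i + j];
            }
            return n;
        }
    }

    for (i = 0; i < POOL_PAGES; i++)
        avail += p->is_free[i] ? 1 : 0;
    if (avail < n)
        return 0;                   /* not enough memory, allocate nothing */

    for (i = 0; i < POOL_PAGES && got < n; i++) {
        if (p->is_free[i]) {
            p->is_free[i] = 0;
            vec[got++] = &p->pages[i];
        }
    }
    return got;
}

/* What the I/O layer would check before treating the vector as one extent. */
static int pages_contiguous(struct page_sketch **vec, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (vec[i]->pfn != vec[0]->pfn + i)
            return 0;
    return 1;
}
```

The caller gets the same vector either way, so nothing above the I/O
layer has to care whether the contiguous case was hit.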

Eric