Nick Piggin wrote:
I don't understand what you mean at all. A block has always been a
contiguous area of disk.
Let's take Nick's definition of a block as a disk-based unit for the
moment. That does not change the key contention here: even hardware
specifically designed to handle 4k pages handles larger contiguous
areas more efficiently. David Chinner gives us figures showing major
overall throughput improvements from (I assume) shorter scatter-gather
lists and better tag utilisation. I am loath to say we can simply blame
the hardware vendors for poor design.
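The arithmetic behind the scatter-gather saving is simple enough; as a
rough standalone illustration (my numbers, not David's measurements):

/* Worst-case scatter-gather entries for one I/O request when the
 * buffer is built from 4KiB pages versus 64KiB contiguous chunks.
 */
#include <stdio.h>

static unsigned long sg_entries(unsigned long io_bytes, unsigned long chunk)
{
        /* One SG entry per physically contiguous chunk, rounded up. */
        return (io_bytes + chunk - 1) / chunk;
}

int main(void)
{
        unsigned long io = 1UL << 20;   /* a 1MiB request */

        printf("4KiB pages  : %lu sg entries\n", sg_entries(io, 4096));
        printf("64KiB chunks: %lu sg entries\n", sg_entries(io, 65536));
        return 0;
}

Fewer entries per request means shorter descriptor lists for the
controller to walk and more requests outstanding per tag, which is
presumably where the throughput gain comes from.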
Actually, I don't know why people are so excited about being able to
use higher order allocations (I would rather be more excited about
never having to use them). But for those few places that really need
it, I'd rather see them use a virtually mapped kernel with proper
defragmentation rather than putting hacks all through the core code.
Virtually mapping the kernel was considered pretty seriously around the
time SPARSEMEM was being developed. However, it means the conversion
from kernel virtual addresses to physical addresses is no longer a
constant offset, which brings significant complexity, not to mention
runtime overhead.
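To illustrate what is lost (the helper names below are made up for the
sketch, not the kernel's real interfaces): with the linear map the
conversion is a constant offset, with a virtually mapped kernel it
becomes a lookup at every conversion site.

#define PAGE_SHIFT      12
#define PAGE_OFFSET     0xffff880000000000UL   /* example direct-map base */

/* Linear mapping: conversion is a constant offset, essentially free. */
static inline unsigned long linear_virt_to_phys(unsigned long vaddr)
{
        return vaddr - PAGE_OFFSET;
}

/* Virtually mapped kernel: conversion needs a table walk (or a cache
 * of one), costing extra memory accesses wherever it is done.
 */
extern unsigned long vmap_lookup_pfn(unsigned long vaddr);  /* hypothetical */

static inline unsigned long vmapped_virt_to_phys(unsigned long vaddr)
{
        unsigned long pfn = vmap_lookup_pfn(vaddr);

        return (pfn << PAGE_SHIFT) | (vaddr & ((1UL << PAGE_SHIFT) - 1));
}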
As a solution to the problem of supplying large pages from the allocator
it seems somewhat unsatisfactory. If no other significant changes are
made in support of large allocations, the process of defragmenting
becomes very expensive, requiring a stop_machine style hiatus while the
physical copy and replace occurs for any kernel-backed memory.
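Very roughly, and with purely hypothetical helpers rather than any
existing kernel API, each relocation of a kernel-backed page would look
something like this:

struct page;

extern void stop_all_cpus(void);        /* hypothetical quiesce */
extern void resume_all_cpus(void);      /* hypothetical resume  */
extern void copy_page_contents(struct page *dst, struct page *src);
extern void remap_kernel_mapping(struct page *oldp, struct page *newp);
extern void flush_tlb_all_cpus(void);

static int relocate_kernel_page(struct page *oldp, struct page *newp)
{
        /*
         * Kernel mappings may be dereferenced at any instant from any
         * CPU, so everything has to be quiesced while the contents are
         * copied and the mapping switched over: a stop_machine style
         * hiatus for every page moved.
         */
        stop_all_cpus();
        copy_page_contents(newp, oldp);
        remap_kernel_mapping(oldp, newp);
        flush_tlb_all_cpus();
        resume_all_cpus();
        return 0;
}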
To put it a different way, even with such a full defragmentation scheme
available, some sort of avoidance scheme would still be highly desirable
so that the very expensive defragmentation underlying it is rarely
needed.
Is that a big problem? Really? You use 16K pages on your IPF systems,
don't you?
To my knowledge, moving to a higher base page size has its advantages in
TLB reach, but it brings with it some pretty serious downsides,
especially when caching small files: internal fragmentation in the page
cache can significantly affect system performance. So much so that
development is ongoing to see whether supporting sub-base-page objects
in the buffer cache could be beneficial.
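To put some illustrative numbers on the internal fragmentation point
(invented for the example, not measured):

/* Page cache memory needed to hold many small files at different base
 * page sizes, since each file occupies at least one page.
 */
#include <stdio.h>

int main(void)
{
        unsigned long nfiles   = 100000;   /* e.g. a source-tree-like set */
        unsigned long avg_size = 2048;     /* assume ~2KiB average file   */
        unsigned long sizes[]  = { 4096, 16384, 65536 };

        for (unsigned i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
                unsigned long page = sizes[i];
                /* Each file consumes at least one whole page of cache. */
                unsigned long pages_per_file = (avg_size + page - 1) / page;
                unsigned long long cache =
                        (unsigned long long)nfiles * pages_per_file * page;

                printf("%6lu byte pages: %6llu MiB of page cache\n",
                       page, cache >> 20);
        }
        return 0;
}

With mostly-small files the cache footprint scales with the base page
size, which is exactly the waste that sub-base-page support in the
buffer cache is trying to claw back.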