Re: [00/17] Large Blocksize Support V3

From: Nick Piggin
Date: Thu Apr 26 2007 - 10:48:30 EST


Mel Gorman wrote:
On (25/04/07 23:46), Christoph Lameter didst pronounce:

On Thu, 26 Apr 2007, Nick Piggin wrote:


Yeah. IMO anti-fragmentation and defragmentation is the hack, and we
should stay away from higher order allocations whenever possible.

Right, and we need to create a series of other approaches that we then label "non-hack" to replace it.



To date, there hasn't been a credible alternative to dealing with
fragmentation. Breaking the 1:1 virtual:physical mapping and defragmenting
would incur a serious performance hit.

Depends what you mean by "dealing with", I guess.

I would say there has been no credible alternative to virtually mapping
the kernel.


Hardware is built to handle many small pages efficiently, and I don't
understand how it could be an SGI-only issue. Sure, you may have an
order of magnitude or more memory than anyone else, but even my lowly
desktop _already_ has orders of magnitude more pages than it has TLB
entries or cache -- if a workload is cache-nice for me, it probably
will be on a 1TB machine as well, and if it is bad for the 1TB machine,
it is also bad on mine.
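
To put rough numbers on the pages-versus-TLB-entries comparison above, here is a minimal sketch. The 2 GiB desktop, the 512-entry TLB, and the helper names are illustrative assumptions, not figures from this thread; only the 4k page size and the 1TB machine are taken from the discussion.

```python
# Back-of-the-envelope arithmetic: how many base pages exist per TLB entry.
# All hardware figures below are assumed for illustration only.

PAGE_SIZE = 4096          # bytes; the traditional 4k base page
GIB = 1 << 30
TIB = 1 << 40


def page_count(ram_bytes, page_size=PAGE_SIZE):
    """Number of base pages needed to cover ram_bytes."""
    return ram_bytes // page_size


def tlb_reach(tlb_entries, page_size=PAGE_SIZE):
    """Bytes of memory the TLB can map at once."""
    return tlb_entries * page_size


def pages_per_tlb_entry(ram_bytes, tlb_entries, page_size=PAGE_SIZE):
    """Ratio of pages in the machine to entries in the TLB."""
    return page_count(ram_bytes, page_size) // tlb_entries


if __name__ == "__main__":
    tlb_entries = 512     # assumed TLB size, hypothetical
    for name, ram in (("desktop (2 GiB, assumed)", 2 * GIB),
                      ("large machine (1 TiB)", 1 * TIB)):
        print(name,
              "pages:", page_count(ram),
              "pages per TLB entry:", pages_per_tlb_entry(ram, tlb_entries))
    print("TLB reach in bytes:", tlb_reach(tlb_entries))
```

Under these assumptions both machines have far more pages than TLB entries (a ratio in the thousands for the desktop, in the hundreds of thousands for the large machine), while the TLB reach is the same few megabytes on each. That is the sense in which a workload that fits the TLB on one machine would fit it on the other.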

A number of people have argued the same point. Just because we have developed a way of thinking to defend our traditional 4k values does not make them right.


If this is instead an issue of io path or reclaim efficiency, then it
would be really nice to see numbers... but I don't think making these
fundamental paths more complex and slower is a nice way to fix it
(larger PAGE_SIZE would be, though).

The code paths can stay the same. You can switch CONFIG_LARGE pages off
if you do not want it and it is as it was.



It may not even need that much effort. The most stressful use of the
high order allocation paths here requires the creation of a filesystem and
is a deliberate action by the user.

Saying "oh this stuff may not always work quite right for everyone, but
it is OK because it is a special purpose solution for now" IMO is a big
sign saying that it is a bad design, and including it means we're lumped
with it forever.

--
SUSE Labs, Novell Inc.