mel@xxxxxxxxx (Mel Gorman) writes:
On (26/04/07 18:55), Nick Piggin didst pronounce:
Mel Gorman wrote:
On (26/04/07 16:50), Nick Piggin didst pronounce:
Fragmentation is the problem. The anti-frag patches don't actually
guarantee anything about fragmentation, and even if they did, then
Grouping pages by mobility does not guarantee anything, but the memory
partition (kernelcore= boot parameter) does give hard guarantees about
the amount of memory that is "movable". Of course, the partition requires
configuration at boot time, so it's less than ideal, but the guarantee
it gives is a hard one.
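As a rough illustration (the numbers here are only an example, and assuming
I have the kernelcore= semantics right): booting a 4GB machine with

	kernelcore=1G

would keep roughly 1GB usable for unmovable kernel allocations and put the
remainder in the movable zone, which only serves allocations that can be
reclaimed or migrated, so that portion stays defragmentable.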
For the hugepages people, I can understand that's a solution.
I think that's the closest you have ever come to saying fragmentation
avoidance is not a terrible idea :)
But that's
the last thing you want to do on a system with a limited amount of memory,
or a regular Joe's desktop/server.
Regular Joe is not going to be creating a filesystem with large blocks or
be overly concerned with saturating all the disks hanging off his RAID array.
At most, he'll care about faster DVD writing, and considering the number and
duration of those allocations, the system will be able to handle it. So I
don't think Regular Joe will generally care.
The practical question is if this a special purpose hack, or is
this general infrastructure that we expect filesystems to seriously
use.
If this is a special purpose hack, then the larger pages and
fragmentation are minor issues, but its very existence is a problem.
If this is not a special purpose hack and people do use this a lot
then the fragmentation is a problem.
Your reply indicates that fragmentation is a concern, as do the
initial posting and the more positive threads. Therefore either
this is a special purpose hack, in which case its existence is
questionable, or it fails as a general solution, and its existence
is questionable.
Indeed, but then you have to deal with internal fragmentation
for pages larger than the TLB page size. I'm not saying it's wrong, but it
does come with its own set of issues.
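To put a rough (assumed) number on that internal fragmentation: with 64K
pagecache pages, caching a 4K file still ties up a full 64K page, so about
60K is wasted per small file, and a tree full of small files multiplies
that accordingly.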
None of them is perfect (the ways to increase the size of pagecache pages,
that is).
I think in the long term, TLB page sizes will probably increase a little
bit... but if a given page size is "good enough" for a CPU, it really
should be good enough for other hardware. I mean, come on, the CPU's TLB
has to have a good hit ratio and handle several lookups per cycle with a
3-cycle latency on 3GHz+ hardware... surely an IO controller's
scatter-gather engine or IOMMU that has to do a few lookups per disk IO
is nowhere near as critical as a CPU's datapath: just add a few more
entries to it; these controllers already have hundreds of megs of cache,
so that isn't an issue either.
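Back of the envelope, with assumed numbers: a 3GHz CPU doing even two
loads/stores per cycle needs on the order of 6e9 address translations per
second from its TLB, while a very fast array doing 1M IOs/sec with a handful
of scatter-gather/IOMMU lookups per IO needs only a few million translations
per second -- roughly three orders of magnitude less pressure on the
translation hardware.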
I cannot speak with authority on what IO controllers are really capable
of so maybe someone else will comment on this more.
Well, here is some reality. Using an FPGA (slow by definition)
connected to a HyperTransport interface, I have exceeded a gigabyte a
second without even setting up scatter-gather.
All of the I/O on all of the peripheral buses happens in
sub-page-sized chunks.
At the rates we are talking about for disks, the issue is really how to
bury the disk latency. For fast arrays, how do you submit enough
requests to bury the latency?
I can't possibly believe any of this is about the cost of processing
a request; rather, the problem is that some devices don't have a large
enough pool of requests to keep them busy if you submit the requests
in 4K-page-sized chunks.
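To put an assumed number on that: keeping 1GB/s in flight against, say,
1ms of effective latency means about 1MB outstanding at any time, which is
roughly 256 requests if they are all 4K -- more outstanding requests than
some devices or drivers are built to juggle, but nothing fundamental.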
This all sounds like a device design issue rather than anything more
significant, and it doesn't sound like a long term trend. Market
pressure should fix the hardware.
Eric