Re: The performance and behaviour of the anti-fragmentation related patches

From: Nick Piggin
Date: Fri Mar 02 2007 - 01:30:44 EST


On Thu, Mar 01, 2007 at 10:19:48PM -0800, Christoph Lameter wrote:
> On Fri, 2 Mar 2007, Nick Piggin wrote:
>
> > > From the I/O controller and from the application.
> >
> > Why doesn't the application need to deal with TLB entries?
>
> Because it may only operate on a small section of the file and hopefully
> splice the rest through? But yes support for mmapped I/O would be
> necessary.

So you're talking about copying a file from one location to another?


> > > This would only be a temporary fix pushing the limits to the double or so?
> >
> > And using slightly larger page sizes isn't?
>
> There was no talk about slightly. 1G page size would actually be quite
> convenient for some applications.

But it is far from convenient for the kernel. So we have hugepages, so
we can stay out of the hair of those applications and they can stay out
of ours.

> > > Amortized? The controller still would have to hunt down the 4kb page
> > > pieces that we have to feed him right now. Result: Huge scatter gather
> > > lists that may themselves create issues with higher page order.
> >
> > What sort of numbers do you have for these controllers that aren't
> > very good at doing sg?
>
> Writing a terabyte of memory to disk with handling 256 billion page
> structs? In case of a system with 1 petabyte of memory this may be rather
> typical and necessary for the application to be able to save its state
> on disk.

But you will have newer IO controllers, faster CPUs...

Is it a problem or isn't it? Waving around the 256 billion number isn't
impressive because it doesn't really say anything.
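
For scale, here is a back-of-the-envelope sketch of the page counts being
discussed. It assumes 4 KiB base pages and binary units (TiB/PiB), neither of
which is stated explicitly in the thread; pages_needed is a hypothetical
helper, not a kernel interface.

```python
KiB = 1 << 10
TiB = 1 << 40
PiB = 1 << 50


def pages_needed(nbytes, page_size):
    """Number of pages of page_size bytes needed to cover nbytes (rounded up)."""
    return -(-nbytes // page_size)


if __name__ == "__main__":
    for label, size in (("1 TiB", TiB), ("1 PiB", PiB)):
        for page in (4 * KiB, 16 * KiB, 64 * KiB):
            count = pages_needed(size, page)
            print(f"{label} at {page // KiB} KiB pages: {count:,} pages")
```

Under those assumptions the 256 billion figure matches the number of 4 KiB
pages in a petabyte (2^38), while a single terabyte comes to 2^28, roughly
268 million. Going from 4 KiB to 64 KiB pages divides either count by 16.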

> > Wasn't the issue something like your IO controllers have only a
> > limited number of sg entries, which is fine with 16K pages, but with
> > 4K pages that doesn't give enough data to cover your RAID stripe?
> >
> > We're never going to do a variable sized pagecache just because of that.
>
> No, we need support for larger page sizes than 16k. 16k has not been fine
> for a couple of years. We only agreed to 16k because that was the common
> consensus. Best performance was always at 64k 4 years ago (but then we
> have no numbers for higher page sizes yet). Now we would prefer much
> larger sizes.

But you are in a tiny minority, so it is not so much a question of what
you prefer, but what you can make do with without being too intrusive.

I understand you have controllers (or maybe it is a block layer limit)
that don't work well with 4K pages, but work OK with 16K pages.
This is not something that we would introduce variable sized pagecache
for, surely.
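
The sg-entry concern can be sketched as simple arithmetic. This assumes the
worst case where each sg entry maps exactly one page (no merging of physically
contiguous pages). The 128-entry limit and the 1 MiB stripe are hypothetical
values chosen for illustration, not figures from this thread.

```python
KiB = 1 << 10
MiB = 1 << 20


def max_io_bytes(sg_entries, page_size):
    """Largest single I/O if every sg entry covers exactly one page."""
    return sg_entries * page_size


def covers_stripe(sg_entries, page_size, stripe_bytes):
    """True if one request can span a full RAID stripe."""
    return max_io_bytes(sg_entries, page_size) >= stripe_bytes


if __name__ == "__main__":
    SG_ENTRIES = 128   # hypothetical controller limit
    STRIPE = 1 * MiB   # hypothetical full-stripe size
    for page in (4 * KiB, 16 * KiB):
        size = max_io_bytes(SG_ENTRIES, page)
        ok = covers_stripe(SG_ENTRIES, page, STRIPE)
        print(f"{page // KiB} KiB pages: max I/O {size // KiB} KiB, "
              f"covers stripe: {ok}")
```

With these made-up numbers, 4K pages cap a request at 512 KiB, short of the
stripe, while 16K pages reach 2 MiB. Real controllers and block layer limits
will differ.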
