Re: [patch 00/14] Page cache cleanup in anticipation of LargeBlocksize support

From: Andrew Morton
Date: Thu Jun 14 2007 - 21:41:07 EST


On Thu, 14 Jun 2007 17:45:43 -0700 (PDT) Christoph Lameter <clameter@xxxxxxx> wrote:

> On Thu, 14 Jun 2007, Andrew Morton wrote:
>
> > > I do not think that the 100% users will do kernel compiles all day like
> > > we do. We likely would prefer 4k page size for our small text files.
> >
> > There are many, many applications which use small files.
>
> There is no problem with them using 4k page size concurrently with a higher
> page size for other files.

There will be files which should use 64k but which instead end up using 4k.

There will be files which should use 4k but which instead end up using 64k.

Because determining which size to use requires either operator intervention
or kernel heuristics, both of which will be highly unreliable.

It's better to just make 4k pages go faster.

> > > I never understood the point of that exercise. If you have variable page
> > > size then the 64k page size can be used specifically for files that benefit
> > > from it. Typical usage scenarios are video and audio streaming I/O, large
> > > picture files, and large documents with embedded images. These are the major
> > > usage scenarios today and we suck there. Our DVD/CD subsystems are
> > > currently not capable of directly reading from these devices into the page
> > > cache since they do not do I/O in 4k chunks.
> >
> > So with sufficient magical kernel heuristics or operator intervention, some
> > people will gain some benefit from 64k pagesize. Most people with most
> > workloads will remain where they are: shoving zillions of physically
> > discontiguous pages into fixed-size sg lists.
>
> Magical? There is nothing magical about doing transfers in the size that
> is supported by a device. That is good sense.

By magical heuristics I'm referring to the (required) tricks and guesses
which the kernel will need to deploy to be able to guess which page-size it
should use for each file.

Because without such heuristics, none of this new stuff which you're
proposing would ever get used by 90% of apps on 90% of machines.
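To make the concern concrete, here is a toy per-file page-size heuristic of the kind that would be needed. It is purely hypothetical: the function `pick_page_size` and the 1 MiB threshold are invented for illustration and are not part of the patch set. It decides once, from the size known when the file is created, so it picks the wrong page size both for a file that starts empty and grows (a log) and for a file that is created large but stays mostly empty (a preallocated file).

```python
# Hypothetical per-file page-size heuristic, for illustration only.
# Neither the function nor the threshold comes from the patch set.
SMALL_PAGE = 4 * 1024
LARGE_PAGE = 64 * 1024
THRESHOLD = 1024 * 1024  # assumed cut-over point: 1 MiB


def pick_page_size(size_hint: int) -> int:
    """Choose a page size once, from whatever size is known at create time."""
    return LARGE_PAGE if size_hint >= THRESHOLD else SMALL_PAGE


# (size known at create time, size the file eventually reaches)
files = {
    "growing log": (0, 2 * 1024**3),
    "preallocated file": (1024**3, 10 * 1024),
}

for name, (size_at_create, final_size) in files.items():
    chosen = pick_page_size(size_at_create)
    wanted = pick_page_size(final_size)  # what hindsight would have picked
    print(f"{name}: chose {chosen // 1024}k, wanted {wanted // 1024}k")
```

Any rule that has to commit before the file's eventual use is known will misclassify some files in each direction; a smarter rule only moves the boundary.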

> > > Every 64k block contains more information and the number of pages managed
> > > is reduced by a factor of 16. Fewer seeks, less TLB pressure, fewer reads,
> > > more CPU cache and CPU cache prefetch friendly behavior.
> >
> > argh. Everything you say is just wrong. A fsck involves zillions of
> > discontiguous small reads. It is largely seek-bound, so there is no
> > benefit to be had here. Your proposed change will introduce regressions by
> > causing larger amounts of physical reading and large amounts of memory
> > consumption.
>
> Of course there is. The seeks are reduced since there are a factor
> of 16 fewer metadata blocks. fsck does not read files. It just reads
> metadata structures. And the larger the contiguous areas, the faster.

Some metadata is contiguous: inode tables, some directories (if they got
lucky), bitmap tables. But fsck surely reads them in a single swoop
anyway, so there's no gain there.
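A rough model of the contiguous-metadata argument: if each contiguous extent is read with one large request, the number of seeks equals the number of extents whatever the page size, and only the count of pages the kernel tracks changes. The layout below (an 8 MiB inode table and a 256 KiB bitmap) is made up, and the model ignores readahead windows and per-request size caps.

```python
# Toy model: seeks depend on how many discontiguous extents there are,
# not on the page size used to cache them.

def seeks_for(extent_lengths):
    # One seek per discontiguous extent, each read in a single request.
    return len(extent_lengths)


def pages_for(extent_lengths, page_size):
    # Pages needed to cache every extent, rounding each extent up to whole pages.
    return sum(-(-length // page_size) for length in extent_lengths)


# Hypothetical layout: an 8 MiB inode table and a 256 KiB block bitmap.
layout = [8 * 1024**2, 256 * 1024]

for page_size in (4096, 65536):
    print(f"{page_size // 1024}k pages: {seeks_for(layout)} seeks, "
          f"{pages_for(layout, page_size)} pages")
```

The page count drops by the factor of 16 Christoph mentions, but the seek count, which is what dominates a seek-bound fsck, stays the same.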

Other metadata (indirect blocks) are 100% discontiguous, and reading those
with a 64k IO into 64k of memory is completely dumb.
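The indirect-block case reduces to simple arithmetic. Assume each indirect block is a 4k filesystem block lying in its own 64k region, away from the others, and that caching it in a 64k page means reading and holding the whole 64k page. That second assumption describes how a large-page path would plausibly behave and is used here only for illustration. Under those assumptions, every useful 4k drags 60k of unwanted data along with it. The block count of 10,000 is a hypothetical figure.

```python
# Read/memory amplification for isolated metadata blocks cached in
# page-sized units. Assumes no two blocks share a page.

def io_cost(n_blocks, block_size, page_size):
    """Bytes read and cached for n_blocks isolated blocks.

    Each block is rounded up to whole pages, and because the blocks are
    discontiguous, no page is shared between two blocks.
    """
    pages_per_block = -(-block_size // page_size)  # ceiling division
    return n_blocks * pages_per_block * page_size


n = 10_000      # hypothetical number of indirect blocks fsck visits
block = 4096    # filesystem block size
useful = n * block

for page_size in (4096, 65536):
    cost = io_cost(n, block, page_size)
    print(f"{page_size // 1024}k pages: {cost / 1024**2:.1f} MiB read, "
          f"{cost // useful}x amplification")
```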

And yes, I'm referring to the 90% case again. The one we want to
optimise for.


