Re: Ok, explained.. (was Re: [PATCH] mm: fix page_mkclean_one)

From: Linus Torvalds
Date: Fri Dec 29 2006 - 19:12:48 EST

On Fri, 29 Dec 2006, Andrew Morton wrote:
> They're extra. As in "can be optimised away".

Sure. Don't use buffer heads.

> The buffer_head is not an IO container. It is the kernel's core
> representation of a disk block.

Please come back from the 90's.

The buffer heads are nothing but a mapping of where the hardware block is.
If you use it for anything else, you're basically screwed.

> JBD implements physical block-based journalling, so it is 100% appropriate
> that JBD deal with these disk blocks using their buffer_head
> representation.

And as long as it does that, you just have to face the fact that it's
going to perform like crap, including what you call "extra" writes, and
what I call "deal with it".

Btw, you can make pages be physically indexed too, but they obviously
 (a) won't be coherent with any virtual mapping laid on top of it
 (b) will be _physical_, so any readahead etc will be based on physical
     addresses too.

> I thought I fixed the performance problem?

No, you papered over it, for the reasonably common case where things were
physically contiguous - exactly by using a physical page cache, so now it
can do read-ahead based on that. Then, because the pages contain buffer
heads, the directory accesses can look up buffers, and if it was all
physically contiguous, it all works fine.

But if you actually want virtually indexed caching (and all _users_ want
it), it really doesn't work.
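
[ Editorial illustration, not part of the original message. The difference
between the two indexing schemes can be sketched with a toy model. Everything
here is hypothetical: block_map[] stands in for a file's logical-to-physical
block mapping, and the two functions are made-up names that only count how
often a "prefetch the next index" guess is useful. This is not kernel code
and does not reflect how ext2, ext3 or JBD are actually implemented. ]

```c
#include <stddef.h>

/*
 * Toy model (assumption): block_map[i] is the physical block number that
 * holds logical block i of a single file of n blocks.
 */

/*
 * Physically indexed readahead: after reading physical block p, prefetch
 * p + 1. The guess only helps when the file's next logical block really
 * is stored at p + 1, i.e. when the file is physically contiguous there.
 */
size_t physical_readahead_hits(const unsigned long *block_map, size_t n)
{
    size_t hits = 0;

    for (size_t i = 0; i + 1 < n; i++) {
        if (block_map[i] + 1 == block_map[i + 1])
            hits++;
    }
    return hits;
}

/*
 * Virtually indexed readahead: after reading logical block i, prefetch
 * logical block i + 1. That is by definition the next part of the file,
 * wherever it lives on disk, so every guess except the one past the last
 * block is useful and the physical layout does not matter.
 */
size_t virtual_readahead_hits(const unsigned long *block_map, size_t n)
{
    (void)block_map; /* layout is irrelevant to a logical index */
    return n > 1 ? n - 1 : 0;
}
```

In this model a contiguous layout such as {100, 101, 102, 103} gives three
useful prefetches under both schemes, while a fragmented layout such as
{100, 57, 310, 58} gives zero under the physical index and still three under
the virtual one. That is the "works fine if it was all physically contiguous"
case described above, in miniature.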

> Somewhat nastily, but as ext3 directories are metadata it is appropriate
> that modifications to them be done in terms of buffer_heads (ie: blocks).

No. There is nothing "appropriate" about using buffer_heads for metadata.

It's quite proper - and a hell of a lot more efficient - to use virtual
page-caching for metadata too.

Look at the ext2 readdir() implementation, and compare it to the crapola
horror that is ext3. Guess what? ext2 uses virtually indexed metadata, and
as a result it is both simpler, smaller and a LOT faster than ext3 in
accessing that metadata.

Face it, Andrew, you're wrong on this one. Really. Just take a look at

[ I'm not saying that ext2_readdir() is _beautiful_. If it had been
written with the page cache in mind, it would probably have been done
very differently. And it doesn't do any readahead, probably because
nobody cared enough, but it should be trivial to add, and it would
automatically "do the right thing" just because it's much easier at the
page cache level.

But I _am_ saying that compared to ext3, the ext2 readdir is a work of
art. ]

"metadata" has _zero_ to do with "physically indexed". There is no
correlation what-so-ever. If you think there is a correlation, it's all in
your mind.
