Re: Ok, explained.. (was Re: [PATCH] mm: fix page_mkclean_one)
From: Andrew Morton
Date: Fri Dec 29 2006 - 20:18:22 EST
On Fri, 29 Dec 2006 16:58:41 -0800 (PST)
Linus Torvalds <torvalds@xxxxxxxx> wrote:
>
>
> On Fri, 29 Dec 2006, Andrew Morton wrote:
> >
> > > > Somewhat nastily, but as ext3 directories are metadata it is appropriate
> > > > that modifications to them be done in terms of buffer_heads (ie: blocks).
> > >
> > > No. There is nothing "appropriate" about using buffer_heads for metadata.
> >
> > I said "modification".
>
> You said "metadata".
>
> Why do you think directories are any different from files? Yes, they are
> metadata. So what? What does that have to do with anything?
We journal the contents of directories. Fully. So we handle their dirty
data at the block (ie: buffer_head) level. When someone tries to dirty
part of a directory we need to cheat: we do not mark that part of the page
as dirty. Instead we write the block to the journal, and only then mark the
block as really dirty for checkpointing (while it is still attached to the
journal), and all that goop.

The regular page-based writeback doesn't apply until the block has been
written to the journal. At that stage the block is considered dirty
against its real position on disk. It will then be written back by pdflush
via the blockdev inode -> blkdev_writepage(), unless kjournald needs to do
an early flush to reclaim the journal space, in which case kjournald will
write the block itself.
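
The lifecycle described above can be sketched as a small state machine. This is a toy Python model, not kernel code: the class, state, and method names are hypothetical and chosen only for illustration, and it ignores locking, transaction batching, and a block being re-modified while it awaits checkpointing.

```python
from enum import Enum, auto


class BlockState(Enum):
    # All names here are hypothetical; they do not correspond to jbd flags.
    CLEAN = auto()
    IN_TRANSACTION = auto()    # modified, but hidden from page-based writeback
    CHECKPOINT_DIRTY = auto()  # in the journal, dirty against its real disk position


class JournalledBlock:
    """Toy model of one journalled directory block."""

    def __init__(self, blocknr):
        self.blocknr = blocknr
        self.state = BlockState.CLEAN
        self.written_by = None

    def modify(self):
        # "Cheat": the block is not marked dirty for regular writeback yet.
        self.state = BlockState.IN_TRANSACTION

    def visible_to_writeback(self):
        # Regular writeback only applies once the block is in the journal.
        return self.state is BlockState.CHECKPOINT_DIRTY

    def commit_to_journal(self):
        if self.state is not BlockState.IN_TRANSACTION:
            raise ValueError("block is not part of a transaction")
        self.state = BlockState.CHECKPOINT_DIRTY

    def write_home(self, writer):
        # writer is "pdflush" normally, or "kjournald" for an early flush.
        if not self.visible_to_writeback():
            raise ValueError("block has not been written to the journal yet")
        self.written_by = writer
        self.state = BlockState.CLEAN
```

In this model, calling write_home() before commit_to_journal() raises an error, which mirrors the ordering constraint: the block must reach the journal before anything writes it to its real position on disk.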
>
> So I really don't understand why you make excuses for ext3 and talk about
> "modifications" and "metadata". It was a fine design ten years ago. It's
> not really very good any longer.
>
As I said in another apparently-neglected email:
: We could possibly move ext3/4 directories out of the blockdev pagecache and
: into per-directory pagecache, but that wouldn't change anything - the
: journalling would still be block-based.
We already have all the code in place to journal blocks which are cached in
an address_space other than the blockdev inode's: ext3_journalled_aops.