Daniel Phillips wrote:
>
> > I'm very unkeen about using the inaccurate invalidate_inode_pages
> > for anything which matters, really. And the consistency of pagecache
> > data matters.
> >
> > NFS should be using something stronger. And that's basically
> > vmtruncate() without the i_size manipulation.
>
> Yes, that looks good. Semantics are basically "and don't come back
> until every damn page is gone" which is enforced by the requirement
> that we hold the mapping->page_lock through one entire scan of the
> truncated region. (Yes, I remember sweating this one out a year
> or two ago so it doesn't eat 100% CPU on regular occasions.)
>
> So, specifically, we want:
>
> void invalidate_inode_pages(struct inode *inode)
> {
> 	truncate_inode_pages(inode->i_mapping, 0);
> }
>
> Is it any harder than that?
Pretty much - we just need to leave i_size where it was. But there are
apparently reasons why NFS cannot take sleeping page locks in this
particular context.
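To spell that out (a sketch only - the function name is made up, and it
relies on the fact that truncate_inode_pages() never touches
inode->i_size, so the size is preserved for free):

	/* hypothetical: strong invalidate for NFS, i_size left alone */
	void nfs_zap_pagecache(struct inode *inode)
	{
		/* sleeps on each page lock, so every cached page for
		 * this inode really is gone by the time we return */
		truncate_inode_pages(inode->i_mapping, 0);
	}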
> By the way, now that we're all happy with the radix tree, we might
> as well just go traverse that instead of all the mapping->*_pages.
> (Not that I'm seriously suggesting rocking the boat that way just
> now, but it might yield some interesting de-crufting possibilities.)
Oh absolutely.
unsigned long radix_tree_gang_lookup(void **pointers,
		unsigned long starting_from_here, unsigned long this_many);
could be used nicely in readahead, drop_behind, truncate, invalidate
and invalidate2. But to use it in writeback (desirable) we would need
additional metadata in radix_tree_node: one bit per slot, meaning
"this page is dirty" at the bottom level, or "this subtree has dirty
pages" higher up.
I keep saying this in the hope that someone will take pity and write it.
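Roughly, each node would grow something like this (pure sketch - the
dirty[] bitmap and the constants are illustrative, not what is in the
tree today):

	#define RADIX_TREE_MAP_SHIFT	6
	#define RADIX_TREE_MAP_SIZE	(1UL << RADIX_TREE_MAP_SHIFT)

	struct radix_tree_node {
		unsigned int	count;		/* occupied slots */
		void		*slots[RADIX_TREE_MAP_SIZE];
		/* bit N set: slot N is a dirty page (bottom level), or a
		 * subtree containing at least one dirty page (upper
		 * levels).  Set at the leaf and propagated up to the
		 * root; cleared bottom-up as pages get written back. */
		unsigned long	dirty[RADIX_TREE_MAP_SIZE /
					(8 * sizeof(unsigned long))];
	};

Then the writeback walker can skip any subtree whose bit is clear
instead of visiting every slot under it.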
> ...
> Now, what is this invalidate_inode_pages2 seepage about? Called from
> one place. Sheesh.
heh. We still do have some O_DIRECT/pagecache coherency problems.
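The problem in a nutshell: a direct-IO write bypasses the pagecache, so
any pages already cached over that range are stale afterwards and have
to be shot down, which is what that lone invalidate_inode_pages2() call
is about. Shape-wise (a sketch, not the exact call site; the helper
name is made up):

	/* after a successful O_DIRECT write to 'mapping' */
	static void drop_stale_pagecache(struct address_space *mapping)
	{
		if (mapping->nrpages)
			invalidate_inode_pages2(mapping);
	}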