Re: invalidate_inode_pages in 2.5.32/3

From: Daniel Phillips (phillips@arcor.de)
Date: Mon Sep 09 2002 - 16:08:46 EST


On Saturday 07 September 2002 18:06, Andrew Morton wrote:
> Daniel Phillips wrote:
> > > If the VM wants to reclaim a page, and it has PG_private set then
> > > the vm will run mapping->releasepage() against the page. The mapping's
> > > releasepage must try to clear away whatever is held at ->private. If
> > > that was successful then releasepage() must clear PG_private, decrement
> > > page->count and return non-zero. If the info at ->private is not
> > > freeable, releasepage returns zero. ->releasepage() may not sleep in
> > > 2.5.
> > >
> > > So. NFS can put anything it likes at page->private. If you're not
> > > doing that then you don't need a releasepage. If you are doing that
> > > then you must have a releasepage().
> >
> > Right now, there are no filesystems actually doing anything filesystem
> > specific here, are there? I really wonder if making this field, formerly
> > known as buffers, opaque to the vfs is the right idea.
>
> That's right - it is only used for buffers at present. I was using
> page->private in the delayed-allocate code for directly holding the
> disk mapping information.

It's worth taking a deep breath at this point and considering whether that
part of delalloc can be written generically, supposing that page->buffers
were restored to its former non-opaque status.

> There was some talk of using it for <mumble> in XFS.

Christoph would be familiar with that issue, if there is one.

> Also it may be used in the NFS server for storing credential
> information.

The NFS server is still a deep, black hole in the kernel from my point of
view and I'd like that situation to end as soon as possible, so it might
as well start ending now. Can you provide me a pointer to go start
digging at that specific question?

(And strongly agreed about the invalidate_inode_pages(2) issue: at some
point it would behoove VM and NFS developers to reach a mutual
understanding of what that interface is supposed to do, because it is
growing new warts and tentacles at an alarming rate, and still seems to
be, at best, a heuristic. I currently have the impression that the
intent is to make files sort-of coherent between clients, for some
slippery definition of sort-of.)

> Also it could be used for MAP_SHARED pages for credential
> information - to fix the problem wherein kswapd (ie: root) is the
> one who instantiates the page's blocks, thus allowing non-root programs
> to consume the root-only reserved ext2/ext3 blocks.

OK, well I have four action items:

  - Listen to Christoph wax poetic about how XFS handles buffers and
    whether he needs them to be opaque to the vfs.

  - Think about delalloc with the goal of providing the mechanism at
    the vfs level, subject to somebody proving it actually has some
    benefit (maybe XFS already proves this? Come to think of it, if
    XFS already supports delayed allocation and doesn't rely on
    page->private...)

  - Find out what NFS really thinks invalidate_inode_pages(2) is
    supposed to do.

  - Start to sniff at the credential issue, with a view to
    ascertaining whether that subsystem ultimately relies on the
    buffer field being opaque. Intuitively I think 'no', the
    credential mechanism is supposed to be a generic vfs mechanism,
    and so for NFS or anyone else to bury credential state in the
    field formerly known as buffers is fundamentally wrong.

In general, I think we'd be better off if page->buffers was not opaque,
and that it should remain non-opaque until we are definitely ready to
get rid of them. Doing otherwise will just allow various applications
to start growing tendrils into the field, making it that much harder
to get rid of when the time comes.

So the question is, does anyone *really* need (void *) page->private
instead of page->buffers?

-- 
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sun Sep 15 2002 - 22:00:18 EST