Re: [RFC] Per-inode metadata cache.

Alexander Viro (viro@math.psu.edu)
Mon, 18 Oct 1999 13:26:45 -0400 (EDT)

On Mon, 18 Oct 1999, Andrea Arcangeli wrote:

> On Sat, 16 Oct 1999, Alexander Viro wrote:
>
> > c) Currently we keep the stuff for the first class around the page
> > cache and the rest in buffer cache. A large part of our problems comes
> > from the fact that we need to detect migration of a block from one class
> > to another, and scanning the whole buffer cache is way too slow.
>
> We don't scan the whole buffer cache. We only do a fast query on the
> buffer hash.
>
> And I still can't see how you can find the stale buffer in a per-object
> queue as the object can be destroyed as well after the lowlevel truncate.

? The same way you are doing it with pagecache.

> You can't even know which is the inode Y that is using a block X without
> reading all the inode metadata while the block X still belongs to the
> inode Y (before the truncate).

WTF would we _need_ to know? Think of it as memory caching. You can cache
by virtual address or you can cache by physical address. And since we
have no aliasing here... Andrea, consider the thing as VM. You have
virtual addresses (offsets in file for data, offsets in directory for
directory contents, beginning of covered range for indirect blocks, etc.)
and you have physical addresses (offsets on disk). You have a mapping of
VA to PA (->bmap(), ->get_block()), the context being the file. The old
scheme was equivalent to caching by PA and recalculating the mapping on
each use. The new scheme gives caching by context+VA and provides a
working TLB. We already have it for data. For metadata we are still
caching by PA, and getting a nasty pile of trouble with cache coherency.
Would you like to work with the MMU of such an architecture?
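
To make the analogy concrete, here is a rough userspace sketch of the two
schemes. It is a toy model, not kernel code: `toy_inode`, `pa_read()`,
`va_read()`, `va_write()`, `va_truncate()` and `scenario()` are all made-up
names, the `map[]` array merely stands in for ->bmap()/->get_block(), and
writes go straight through to the "disk" to keep things short. The scenario
lets block 7 migrate from one file to another and reports what each cache
hands back afterwards.

```c
#include <assert.h>
#include <string.h>

/* Toy model only: every name here is hypothetical, none is a kernel API. */
#define NBLOCKS 16
#define NSLOTS  8
#define NIDX    4
#define BLKSZ   16

static char disk[NBLOCKS][BLKSZ];          /* "physical" storage, indexed by PA */

/* Cache keyed by physical address (disk block), like the buffer hash. */
struct pa_entry { int valid; int pa; char data[BLKSZ]; };
static struct pa_entry pa_cache[NSLOTS];

/* Cache keyed by context + VA: each object caches by logical index. */
struct toy_inode {
    int  map[NIDX];                        /* VA -> PA mapping, -1 means a hole */
    int  cached[NIDX];
    char cache[NIDX][BLKSZ];
};

static void toy_init(struct toy_inode *in)
{
    int i;
    for (i = 0; i < NIDX; i++) { in->map[i] = -1; in->cached[i] = 0; }
}

static const char *pa_read(int pa)
{
    struct pa_entry *e = &pa_cache[pa % NSLOTS];
    if (!e->valid || e->pa != pa) {        /* miss: fill from disk */
        e->valid = 1;
        e->pa = pa;
        memcpy(e->data, disk[pa], BLKSZ);
    }
    return e->data;                        /* hit: whatever was cached earlier */
}

static const char *va_read(struct toy_inode *in, int idx)
{
    if (!in->cached[idx]) {
        if (in->map[idx] < 0)
            return NULL;                   /* hole: nothing to read */
        memcpy(in->cache[idx], disk[in->map[idx]], BLKSZ);
        in->cached[idx] = 1;
    }
    return in->cache[idx];
}

static void va_write(struct toy_inode *in, int idx, const char *s)
{
    assert(in->map[idx] >= 0);             /* block must be allocated first */
    strncpy(in->cache[idx], s, BLKSZ - 1);
    in->cache[idx][BLKSZ - 1] = '\0';
    in->cached[idx] = 1;
    memcpy(disk[in->map[idx]], in->cache[idx], BLKSZ);  /* write-through */
}

/* Truncate to zero: only this object's own entries are touched. */
static void va_truncate(struct toy_inode *in)
{
    toy_init(in);
}

/* Block 7 migrates from inode a to inode b; report what each cache returns. */
static void scenario(const char **via_pa, const char **via_va)
{
    static struct toy_inode a, b;

    memset(disk, 0, sizeof disk);
    memset(pa_cache, 0, sizeof pa_cache);
    toy_init(&a);
    toy_init(&b);

    a.map[0] = 7;
    va_write(&a, 0, "old-meta");
    pa_read(7);                  /* PA cache now holds a copy of block 7 */

    va_truncate(&a);             /* block 7 is free again */
    b.map[0] = 7;                /* ...and is reallocated to b */
    va_write(&b, 0, "new-data");

    *via_pa = pa_read(7);        /* stale copy unless somebody hunted it down */
    *via_va = va_read(&b, 0);    /* found by (object, index) alone */
}
```

In this model the PA-keyed cache keeps answering with the old contents of
block 7 until something explicitly goes looking for it, while the per-object
cache never has to ask where on disk the block lives.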

> Right now you know only that the stale buffer can't come from inode
> metadata as we have the race prone slowww trick to loop in polling mode
> inside ext2_truncate until we are sure bforget will run in hard mode.

And? Don't use bread() on metadata. It should never enter the buffer hash,
just as buffer_heads of _data_ never enter it. Leave bread() for
statically allocated stuff.

> And even if you know such information you would have to do a linear
> search in the list, which is slower than a hash lookup anyway (or you
> can start slowly unmapping all the other stale buffers even if you are
> not interested in them, so you may block more than necessary). Doing a
> per-object hash is not an option IMHO.

You are kinda late with it - it's already done for data. The question
being: do we ever need lookups by physical address (disk location)?
AFAICS it is _not_ needed.
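
As a rough illustration of lookups that never use the disk location as a key,
here is a small chained hash keyed by (object, logical offset). It is only a
sketch under assumptions: `cnode`, `cache_add()`, `cache_find()` and
`cache_truncate()` are hypothetical names, and the table size and hash
function are arbitrary. The disk block number is carried as payload and is
never searched on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: none of these names exist in the kernel. */
#define HASH_SIZE 64

struct cnode {
    int ino, idx;            /* key: object + logical offset */
    int pa;                  /* disk location: payload only, never a key */
    struct cnode *next;      /* hash chain */
};

static struct cnode *table[HASH_SIZE];

static unsigned hash_key(int ino, int idx)
{
    return ((unsigned)ino * 31u + (unsigned)idx) % HASH_SIZE;
}

static void cache_add(struct cnode *n)
{
    unsigned h;

    assert(n != NULL);
    h = hash_key(n->ino, n->idx);
    n->next = table[h];
    table[h] = n;
}

static struct cnode *cache_find(int ino, int idx)
{
    struct cnode *n;

    for (n = table[hash_key(ino, idx)]; n; n = n->next)
        if (n->ino == ino && n->idx == idx)
            return n;
    return NULL;
}

/*
 * Shrink object ino from old_len to new_len blocks: unhash every entry at
 * or past new_len. Returns the number of entries dropped. Only this
 * object's keys are probed; no other object's entries are visited except
 * as chain neighbours.
 */
static int cache_truncate(int ino, int old_len, int new_len)
{
    int idx, dropped = 0;

    for (idx = new_len; idx < old_len; idx++) {
        struct cnode **pp = &table[hash_key(ino, idx)];

        while (*pp) {
            if ((*pp)->ino == ino && (*pp)->idx == idx) {
                *pp = (*pp)->next;   /* unlink from the chain */
                dropped++;
                break;
            }
            pp = &(*pp)->next;
        }
    }
    return dropped;
}
```

Truncation here is driven entirely by the object and the range being cut off,
so nothing needs to know which object a given disk block used to belong to.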

> > d) Moving the second class into the page cache will cause problems
> > with bigmem stuff. Besides, I have the reasons of my own for keeping those

> I can't see these bigmem issues. The buffer and page-cache memory is not
> in bigmem anyway. And you can use bigmem _wherever_ you want as long as
> you remember to fix all the involved code to kmap before read/write to
> potential bigmem memory. The bigmem issue looks like a red herring to me.

Right. And you may want different policies for data and metadata - the
latter should always be in kvm. AFAICS that's what SCT referred to.

