Re: linux-2.4.10-pre5

From: Daniel Phillips (phillips@bonn-fries.net)
Date: Sun Sep 09 2001 - 18:56:13 EST


On September 9, 2001 06:28 am, Linus Torvalds wrote:
> [ start future rambling ]
>
> Now, I actually believe that we would eventually be better off by having a
> real filesystem interface for doing kernel-assisted fsck, backup and
> de-fragmentation (all just three different sides of the same thing:
> filesystem administration), but that's a separate issue, and right now we
> have absolutely no clue about what such an interface would look like. So
> that's a long-term thing (other OS's have such interfaces, but they tend
> to be specialized for only a subset of possible filesystems. Think
> Windows defrag).
>
> That kind of explicit filesystem maintenance interface would make all the
> remaining issues go away completely, and would at least in theory allow
> on-line fixing of already mounted filesystems. Assuming a good interface
> and algorithm for it exists, of course.

A more modest goal, which I'm simply restating from earlier in the thread,
would be to achieve a coherent view of a volume at the block level. This is
just a cache coherency problem. The trouble is, every solution that's been
looked at so far has a non-zero cost associated with it for the normal case
of no sharing.

*However*. If we could find a method of implementing the cache coherency so
it actually accelerates the common case then it would become an attractive
thing to do.

Physically-based readahead might be such a magic bullet. The structural cost
of the cache coherency would be a reverse-lookup hash chain on each page
cache page, which we already almost have with the unused buffer hash links.
The operational cost would be an extra hash probe for each block read, write,
create, or delete. Against this, physical readahead can sometimes do a
better job than logical readahead because it can optimize across files and
metadata structures, something that is difficult to do at the logical level.
Spending a few extra CPU cycles to buy improved filesystem latency in the
common case would be a good trade.

While I'm in here between the [future] brackets, I'll mention a few points
in favor:

  - Some of the reverse probes can be optimized away, for example, when we
    find a page in the pagecache we can skip the reverse probe because we
    know it's not in the physical (aka buffer) cache. This should be the
    common case.

  - Scsi disks, and now some IDE disks, already do physical readahead, but
    they typically have far less cache memory than the processor does. If
    physical readahead is good, then more must be better ;-) In any
    event, moving the physical readahead onto the processor side of the
    bus reduces latency.

  - Physical readahead and logical readahead could conceivably work
    together, for example, by using an interface where the vfs provides
    hints to a physical readahead process, rather than taking on the job
    of intiating IO itself.

  - There are opportunities to do zero-copy moves from the physical
    cache to the page cache, by relinking the underlying page instead
    of copying.

  - This isn't a huge change to the way we already do things. The
    interface to the physical cache is simply the existing buffer
    API. We repurpose the currently unused hash chain for page cache
    buffers, and that's transparent. The vm scanner has to know a
    few more rules about freeing buffers. The actual cache
    consistency is implemented transparently by the block IO library.

  - If we ever do get to the point of having variable sized page
    objects the implementation gets simpler. Not that it's
    particularly complex as it is.

> [ end future rambling ]

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sat Sep 15 2001 - 21:00:21 EST