[PATCH 3/4] Rework the CacheFS documentation to reflect FS-Cache split

From: David Howells
Date: Wed Oct 06 2004 - 11:47:01 EST



The attached patch reworks the CacheFS documentation to reflect the new split
between CacheFS and FS-Cache.

Signed-Off-By: David Howells <dhowells@xxxxxxxxxx>
---

warthog1>diffstat fscache-docs-269rc3mm2.diff
cachefs.txt | 881 ------------------------------------------------
caching/backend-api.txt | 317 +++++++++++++++++
caching/cachefs.txt | 274 ++++++++++++++
caching/fscache.txt | 94 +++++
caching/netfs-api.txt | 583 +++++++++++++++++++++++++++++++
5 files changed, 1268 insertions(+), 881 deletions(-)

diff -uNrp linux-2.6.9-rc3-mm2/Documentation/filesystems/cachefs.txt linux-2.6.9-rc3-mm2-fscache/Documentation/filesystems/cachefs.txt
--- linux-2.6.9-rc3-mm2/Documentation/filesystems/cachefs.txt 2004-10-05 10:38:12.000000000 +0100
+++ linux-2.6.9-rc3-mm2-fscache/Documentation/filesystems/cachefs.txt 1970-01-01 01:00:00.000000000 +0100
@@ -1,892 +0,0 @@
- ===========================
- CacheFS: Caching Filesystem
- ===========================
-
-========
-OVERVIEW
-========
-
-CacheFS is a general purpose cache for network filesystems, though it could be
-used for caching other things such as ISO9660 filesystems too.
-
-CacheFS uses a block device directly rather than a bunch of files under an
-already mounted filesystem. For why this is so, see further on. If necessary,
-however, a file can be loopback mounted as a cache.
-
-CacheFS does not follow the idea of completely loading every netfs file opened
-into the cache before it can be operated upon, and then serving the pages out
-of CacheFS rather than the netfs because:
-
- (1) It must be practical to operate without a cache.
-
- (2) The size of any accessible file must not be limited to the size of the
- cache.
-
- (3) The combined size of all opened files (this includes mapped libraries)
- must not be limited to the size of the cache.
-
- (4) The user should not be forced to download an entire file just to do a
- one-off access of a small portion of it.
-
-It rather serves the cache out in PAGE_SIZE chunks as and when requested by
-the netfs('s) using it.
-
-
-CacheFS provides the following facilities:
-
- (1) More than one block device can be mounted as a cache.
-
- (2) Caches can be mounted / unmounted at any time.
-
- (3) The netfs is provided with an interface that allows either party to
- withdraw caching facilities from a file (required for (2)).
-
- (4) The interface to the netfs returns as few errors as possible, preferring
- rather to let the netfs remain oblivious.
-
- (5) Cookies are used to represent files and indexes to the netfs. The simplest
- cookie is just a NULL pointer - indicating nothing cached there.
-
- (6) The netfs is allowed to propose - dynamically - any index hierarchy it
- desires, though it must be aware that the index search function is
- recursive and stack space is limited.
-
- (7) Data I/O is done direct to and from the netfs's pages. The netfs indicates
- that page A is at index B of the data-file represented by cookie C, and
- that it should be read or written. CacheFS may or may not start I/O on
- that page, but if it does, a netfs callback will be invoked to indicate
- completion.
-
- (8) Cookies can be "retired" upon release. At this point CacheFS will mark
- them as obsolete and the index hierarchy rooted at that point will get
- recycled.
-
- (9) The netfs provides a "match" function for index searches. In addition to
- saying whether a match was made or not, this can also specify that an
- entry should be updated or deleted.
-
-(10) All metadata modifications (this includes index contents) are performed
- as journalled transactions. These are replayed on mounting.
-
-
-=============================================
-WHY A BLOCK DEVICE? WHY NOT A BUNCH OF FILES?
-=============================================
-
-CacheFS is backed by a block device rather than being backed by a bunch of
-files on a filesystem. This confers several advantages:
-
- (1) Performance.
-
- Going directly to a block device means that we can DMA directly to/from
- the netfs's pages. If another filesystem was managing the backing
- store, everything would have to be copied between pages. Whilst DirectIO
- does exist, it doesn't appear easy to make use of in this situation.
-
- New address space or file operations could be added to make it possible to
- persuade a backing discfs to generate block I/O directly to/from disc
- blocks under its control, but that then means the discfs has to keep track
- of I/O requests to pages not under its control.
-
- Furthermore, we only have to do one lot of readahead calculations, not
- two; in the discfs backing case, the netfs would do one and the discfs
- would do one.
-
- (2) Memory.
-
- Using a block device means that we have a lower memory usage - all data
- pages belong to the netfs we're backing. If we used a filesystem, we would
- have twice as many pages at certain points - one from the netfs and one
- from the backing discfs. In the backing discfs model, under situations of
- memory pressure, we'd have to allocate or keep around a discfs page to be
- able to write out a netfs page; or else we'd need to be able to punch a
- hole in the backing file.
-
- Furthermore, whilst we have to keep a CacheFS inode around in memory for
- every netfs inode we're backing, a backing discfs would have to keep the
- dentry and possibly a file struct too.
-
- (3) Holes.
-
- The cache uses holes to indicate to the netfs that it hasn't yet
- downloaded the data for that page.
-
- Since CacheFS is its own filesystem, it can support holes in files
- trivially. Running on top of another discfs would limit us to using ones
- that can support holes.
-
- Furthermore, it would have to be made possible to detect holes in a discfs
- file, rather than just seeing zero filled blocks.
-
- (4) Data Consistency.
-
- Cachefs uses a pair of journals to keep track of the state of the cache
- and all the pages contained therein. This means that it doesn't get into
- an inconsistent state in the on-disc cache and it doesn't lose disc space.
-
- CacheFS takes especial care between the allocation of a block and its
- splicing into the on-disc pointer tree, and the data having been written
- to disc. If power is interrupted and then restored, the journals are
- replayed and if it is seen that a block was allocated but not written it
- is then punched out. Being backed by a discfs, I'm not certain what will
- happen. It may well be possible to mark a discfs's journal, if it has one,
- but how does the discfs deal with those marks? This also limits consistent
- caching to running on journalled discfs's where there's a function to
- write extraordinary marks into the journal.
-
- The alternative would be to keep flags in the superblock, and to
- re-initialise the cache if it wasn't cleanly unmounted.
-
- Knowing that your cache is in a good state is vitally important if you,
- say, put /usr on AFS. Some organisations put everything barring /etc,
- /sbin, /lib and /var on AFS and have an enormous cache on every
- computer. Imagine if the power goes out and renders every cache
- inconsistent, requiring all the computers to re-initialise their caches
- when the power comes back on...
-
- (5) Recycling.
-
- Recycling is simple on CacheFS. It can just scan the metadata index to
- look for inodes that require reclamation/recycling; and it can also build
- up a list of the least recently used inodes so that they can be reclaimed
- later to make space.
-
- Doing this on a discfs would require a search going down through a nest
- of directories, and would probably have to be done in userspace.
-
- (6) Disc Space.
-
- Whilst the block device does set a hard ceiling on the amount of space
- available, CacheFS can guarantee that all that space will be available to
- the cache. On a discfs-backed cache, the administrator would probably want
- to set a cache size limit, but the system wouldn't be able to guarantee that
- all that space would be available to the cache - not unless that cache was
- on a partition of its own.
-
- Furthermore, with a discfs-backed cache, if the recycler starts to reclaim
- cache files to make space, the freed blocks may just be eaten directly by
- userspace programs, potentially resulting in the entire cache being
- consumed. Alternatively, netfs operations may end up being held up because
- the cache can't get blocks on which to store the data.
-
- (7) Users.
-
- Users can't so easily go into CacheFS and run amok. The worst they can do
- is cause bits of the cache to be recycled early. With a discfs-backed
- cache, they can do all sorts of bad things to the files belonging to the
- cache, and they can do this quite by accident.
-
-
-On the other hand, there would be some advantages to using a file-based cache
-rather than a blockdev-based cache:
-
- (1) Having to copy to a discfs's page would mean that a netfs could just make
- the copy and then assume its own page is ready to go.
-
- (2) Backing onto a discfs wouldn't require a committed block device. You would
- just nominate a directory and go from there. With CacheFS you have to
- repartition or install an extra drive to make use of it in an existing
- system (though the loopback device offers a way out).
-
- (3) CacheFS requires the netfs to store a key in any pertinent index entry,
- and it also permits a limited amount of arbitrary data to be stored there.
-
- A discfs could be requested to store the netfs's data in xattrs, and the
- filename could be used to store the key, though the key would have to be
- rendered as text not binary. Likewise indexes could be rendered as
- directories with xattrs.
-
- (4) You could easily make your cache bigger if the discfs has plenty of space,
- you could even go across multiple mountpoints.
-
-
-======================
-GENERAL ON-DISC LAYOUT
-======================
-
-The filesystem is divided into a number of parts:
-
- 0 +---------------------------+
- | Superblock |
- 1 +---------------------------+
- | Update Journal |
- +---------------------------+
- | Validity Journal |
- +---------------------------+
- | Write-Back Journal |
- +---------------------------+
- | |
- | Data |
- | |
- END +---------------------------+
-
-The superblock contains the filesystem ID tags and pointers to all the other
-regions.
-
-The update journal consists of a set of entries of sector size that keep track
-of what changes have been made to the on-disc filesystem, but not yet
-committed.
-
-The validity journal contains records of data blocks that have been allocated
-but not yet written. Upon journal replay, all these blocks will be detached
-from their pointers and recycled.
-
-The writeback journal keeps track of changes that have been made locally to
-data blocks, but that have not yet been committed back to the server. This is
-not yet implemented.
-
-The journals are replayed upon mounting to make sure that the cache is in a
-reasonable state.
-
-The data region holds a number of things:
-
- (1) Index Files
-
- These are files of entries used by CacheFS internally and by filesystems
- that wish to cache data here (such as AFS) to keep track of what's in
- the cache at any given time.
-
- The first index file (inode 1) is special. It holds the CacheFS-specific
- metadata for every file in the cache (including direct, single-indirect
- and double-indirect block pointers).
-
- The second index file (inode 2) is also special. It has an entry for
- each filesystem that's currently holding data in this cache.
-
- Every allocated entry in an index has an inode bound to it. This inode is
- either another index file or it is a data file.
-
- (2) Cached Data Files
-
- These are caches of files from remote servers. Holes in these files
- represent blocks not yet obtained from the server.
-
- (3) Indirection Blocks
-
- Should a file have more blocks than can be pointed to by the few
- pointers in its storage management record, then indirection blocks will
- be used to point to further data or indirection blocks.
-
- Two levels of indirection are currently supported:
-
- - single indirection
- - double indirection
-
- (4) Allocation Nodes and Free Blocks
-
- The free blocks of the filesystem are kept in two single-branched
- "trees". One tree is the blocks that are ready to be allocated, and the
- other is the blocks that have just been recycled. When the former tree
- becomes empty, the latter tree is decanted across.
-
- Each tree is arranged as a chain of "nodes", each node points to the next
- node in the chain (unless it's at the end) and also up to 1022 free
- blocks.
-
-Note that all blocks are PAGE_SIZE in size. The blocks are numbered starting
-with the superblock at 0. Using 32-bit block pointers, a maximum number of
-0xffffffff blocks can be accessed, meaning that the maximum cache size is ~16TB
-for 4KB pages.
-
-
-========
-MOUNTING
-========
-
-Since CacheFS is actually a quasi-filesystem, it requires a block device behind
-it. The way to give it one is to mount it as cachefs type on a directory
-somewhere. The mounted filesystem will then present the user with a set of
-directories outlining the index structure resident in the cache. Indexes
-(directories) and files can be turfed out of the cache by the sysadmin through
-the use of rmdir and unlink.
-
-For instance, if a cache contains AFS data, the user might see the following:
-
- root>mount -t cachefs /dev/hdg9 /cache-hdg9
- root>ls -1 /cache-hdg9
- afs
- root>ls -1 /cache-hdg9/afs
- cambridge.redhat.com
- root>ls -1 /cache-hdg9/afs/cambridge.redhat.com
- root.afs
- root.cell
-
-However, a block device that's going to be used for a cache must be prepared
-before it can be mounted initially. This is done very simply by:
-
- echo "cachefs___" >/dev/hdg9
-
-During the initial mount, the basic structure will be scribed into the cache,
-and then a background thread will "recycle" the as-yet unused data blocks.
-
-
-======================
-NETWORK FILESYSTEM API
-======================
-
-There is, of course, an API by which a network filesystem can make use of the
-CacheFS facilities. This is based around a number of principles:
-
- (1) Every file and index is represented by a cookie. This cookie may or may
- not have anything associated with it, but the netfs doesn't need to care.
-
- (2) Barring the top-level index (one entry per cached netfs), the index
- hierarchy for each netfs is structured according to the whim of the netfs.
-
- (3) Any netfs page being backed by the cache must have a small token
- associated with it (possibly pointed to by page->private) so that CacheFS
- can keep track of it.
-
-This API is declared in <linux/cachefs.h>.
-
-
-NETWORK FILESYSTEM DEFINITION
------------------------------
-
-CacheFS needs a description of the network filesystem. This is specified using
-a record of the following structure:
-
- struct cachefs_netfs {
- const char *name;
- unsigned version;
- struct cachefs_netfs_operations *ops;
- struct cachefs_cookie *primary_index;
- ...
- };
-
-The first three fields should be filled in before registration, and the fourth
-will be filled in by the registration function; any other fields should just be
-ignored and are for internal use only.
-
-The fields are:
-
- (1) The name of the netfs (used as the key in the toplevel index).
-
- (2) The version of the netfs (if the name matches but the version doesn't, the
- entire on-disc hierarchy for this netfs will be scrapped and begun
- afresh).
-
- (3) The operations table is defined as follows:
-
- struct cachefs_netfs_operations {
- struct cachefs_page *(*get_page_cookie)(struct page *page);
- };
-
- The functions here must all be present. Currently the only one is:
-
- (a) get_page_cookie(): Get the token used to bind a page to a block in a
- cache. This function should allocate it if it doesn't exist.
-
- Return -ENOMEM if there's not enough memory and -ENODATA if the page
- just shouldn't be cached.
-
- Set *_page_cookie to point to the token and return 0 if there is now a
- cookie. Note that the netfs must keep track of the cookie itself (and
- free it later). page->private can be used for this (see below).
-
- (4) The cookie representing the primary index will be allocated according to
- another parameter passed into the registration function.
-
-For example, kAFS (linux/fs/afs/) uses the following definitions to describe
-itself:
-
- static struct cachefs_netfs_operations afs_cache_ops = {
- .get_page_cookie = afs_cache_get_page_cookie,
- };
-
- struct cachefs_netfs afs_cache_netfs = {
- .name = "afs",
- .version = 0,
- .ops = &afs_cache_ops,
- };
-
-
-INDEX DEFINITION
-----------------
-
-Indexes are used for two purposes:
-
- (1) To speed up the finding of a file based on a series of keys (such as AFS's
- "cell", "volume ID", "vnode ID").
-
- (2) To make it easier to discard a subset of all the files cached based around
- a particular key - for instance to mirror the removal of an AFS volume.
-
-However, since it's unlikely that any two netfs's are going to want to define
-their index hierarchies in quite the same way, CacheFS tries to impose as few
-restraints as possible on how an index is structured and where it is placed in
-the tree. The netfs can even mix indexes and data files at the same level, but
-it's not recommended.
-
-There are some limits on indexes:
-
- (1) All entries in any given index must be the same size. An array of such
- entries needn't fit exactly into a page, but they will not be laid across
- a page boundary.
-
- The netfs supplies a blob of data for each index entry, and CacheFS
- provides an inode number and a flag.
-
- (2) The entries in one index can be of a different size to the entries in
- another index.
-
- (3) The entry data must be journallable, and thus must be able to fit into an
- update journal entry - this limits the maximum size to a little over 400
- bytes at present.
-
- (4) The index data must start with the key. The layout of the key is described
- in the index definition, and this is used to display the key in some
- appropriate way.
-
- (5) The depth of the index tree should be judged with care as the search
- function is recursive. Too many layers will run the kernel out of stack.
-
-To define an index, a structure of the following type should be filled out:
-
- struct cachefs_index_def
- {
- uint8_t name[8];
- uint16_t data_size;
- struct {
- uint8_t type;
- uint16_t len;
- } keys[4];
-
- cachefs_match_val_t (*match)(void *target_netfs_data,
- const void *entry);
-
- void (*update)(void *source_netfs_data, void *entry);
- };
-
-This has the following fields:
-
- (1) The name of the index (NUL terminated unless all 8 chars are used).
-
- (2) The size of the data blob provided by the netfs.
-
- (3) A definition of the key(s) at the beginning of the blob. The netfs is
- permitted to specify up to four keys. The total length must not exceed the
- data size. It is assumed that the keys will be laid end to end in order,
- starting at the first byte of the data.
-
- The type field specifies the way the data should be displayed. It can be
- one of:
-
- (*) CACHEFS_INDEX_KEYS_NOTUSED - key field not used
- (*) CACHEFS_INDEX_KEYS_BIN - display byte-by-byte in hex
- (*) CACHEFS_INDEX_KEYS_ASCIIZ - NUL-terminated ASCII
- (*) CACHEFS_INDEX_KEYS_IPV4ADDR - display as IPv4 address
- (*) CACHEFS_INDEX_KEYS_IPV6ADDR - display as IPv6 address
-
- (4) A function to compare an in-page-cache index entry blob with the data
- passed to the cookie acquisition function. This function can also be used
- to extract data from the blob and copy it into the netfs's structures.
-
- The values this function can return are:
-
- (*) CACHEFS_MATCH_FAILED - failed to match
- (*) CACHEFS_MATCH_SUCCESS - successful match
- (*) CACHEFS_MATCH_SUCCESS_UPDATE - successful match, entry needs update
- (*) CACHEFS_MATCH_SUCCESS_DELETE - entry should be deleted
-
- For example, in linux/fs/afs/vnode.c:
-
- static cachefs_match_val_t
- afs_vnode_cache_match(void *target, const void *entry)
- {
- const struct afs_cache_vnode *cvnode = entry;
- struct afs_vnode *vnode = target;
-
- if (vnode->fid.vnode != cvnode->vnode_id)
- return CACHEFS_MATCH_FAILED;
-
- if (vnode->fid.unique != cvnode->vnode_unique ||
- vnode->status.version != cvnode->data_version)
- return CACHEFS_MATCH_SUCCESS_DELETE;
-
- return CACHEFS_MATCH_SUCCESS;
- }
-
- (5) A function to initialise or update an in-page-cache index entry blob from
- netfs data passed to CacheFS by the netfs. This function should not assume
- that there's any data yet in the in-page-cache.
-
- Continuing the above example:
-
- static void afs_vnode_cache_update(void *source, void *entry)
- {
- struct afs_cache_vnode *cvnode = entry;
- struct afs_vnode *vnode = source;
-
- cvnode->vnode_id = vnode->fid.vnode;
- cvnode->vnode_unique = vnode->fid.unique;
- cvnode->data_version = vnode->status.version;
- }
-
-To finish the above example, the index definition for the "vnode" level is as
-follows:
-
- struct cachefs_index_def afs_vnode_cache_index_def = {
- .name = "vnode",
- .data_size = sizeof(struct afs_cache_vnode),
- .keys[0] = { CACHEFS_INDEX_KEYS_BIN, 4 },
- .match = afs_vnode_cache_match,
- .update = afs_vnode_cache_update,
- };
-
-The first element of struct afs_cache_vnode is the vnode ID.
-
-And for contrast, the cell index definition is:
-
- struct cachefs_index_def afs_cache_cell_index_def = {
- .name = "cell_ix",
- .data_size = sizeof(afs_cell_t),
- .keys[0] = { CACHEFS_INDEX_KEYS_ASCIIZ, 64 },
- .match = afs_cell_cache_match,
- .update = afs_cell_cache_update,
- };
-
-The cell index is the primary index for kAFS.
-
-
-NETWORK FILESYSTEM (UN)REGISTRATION
------------------------------------
-
-The first step is to declare the network filesystem to the cache. This also
-involves specifying the layout of the primary index (for AFS, this would be the
-"cell" level).
-
-The registration function is:
-
- int cachefs_register_netfs(struct cachefs_netfs *netfs,
- struct cachefs_index_def *primary_idef);
-
-It just takes pointers to the netfs definition and the primary index
-definition. It returns 0 or an error as appropriate.
-
-For kAFS, registration is done as follows:
-
- ret = cachefs_register_netfs(&afs_cache_netfs,
- &afs_cache_cell_index_def);
-
-The last step is, of course, unregistration:
-
- void cachefs_unregister_netfs(struct cachefs_netfs *netfs);
-
-
-INDEX REGISTRATION
-------------------
-
-The second step is to inform cachefs about part of an index hierarchy that can
-be used to locate files. This is done by requesting a cookie for each index in
-the path to the file:
-
- struct cachefs_cookie *
- cachefs_acquire_cookie(struct cachefs_cookie *iparent,
- struct cachefs_index_def *idef,
- void *netfs_data);
-
-This function creates an index entry in the index represented by iparent,
-loading the associated blob by calling iparent's update method with the
-supplied netfs_data.
-
-It also creates a new index inode, formatted according to the definition
-supplied in idef. The new cookie is then returned to the caller.
-
-Note that this function never returns an error - all errors are handled
-internally. It may also return CACHEFS_NEGATIVE_COOKIE. It is quite acceptable
-to pass this token back to this function as iparent (or even to the relinquish
-cookie, read page and write page functions - see below).
-
-Note also that no indexes are actually created on disc until a data file needs
-to be created somewhere down the hierarchy. Furthermore, an index may be
-created in several different caches independently at different times. This is
-all handled transparently, and the netfs doesn't see any of it.
-
-For example, with AFS, a cell would be added to the primary index. This index
-entry would have a dependent inode containing a volume location index for the
-volume mappings within this cell:
-
- cell->cache =
- cachefs_acquire_cookie(afs_cache_netfs.primary_index,
- &afs_vlocation_cache_index_def,
- cell);
-
-Then when a volume location was accessed, it would be entered into the cell's
-index and an inode would be allocated that acts as a volume type and hash chain
-combination:
-
- vlocation->cache =
- cachefs_acquire_cookie(cell->cache,
- &afs_volume_cache_index_def,
- vlocation);
-
-And then a particular flavour of volume (R/O for example) could be added to
-that index, creating another index for vnodes (AFS inode equivalents):
-
- volume->cache =
- cachefs_acquire_cookie(vlocation->cache,
- &afs_vnode_cache_index_def,
- volume);
-
-
-DATA FILE REGISTRATION
-----------------------
-
-The third step is to request a data file be created in the cache. This is
-almost identical to index cookie acquisition. The only difference is that a
-NULL index definition is passed.
-
- vnode->cache =
- cachefs_acquire_cookie(volume->cache,
- NULL,
- vnode);
-
-
-
-PAGE ALLOC/READ/WRITE
----------------------
-
-And the fourth step is to propose a page be cached. There are two functions
-that are used to do this.
-
-Firstly, the netfs should ask CacheFS to examine the caches and read the
-contents cached for a particular page of a particular file if present, or else
-allocate space to store the contents if not:
-
- typedef
- void (*cachefs_rw_complete_t)(void *cookie_data,
- struct page *page,
- void *end_io_data,
- int error);
-
- int cachefs_read_or_alloc_page(struct cachefs_cookie *cookie,
- struct page *page,
- cachefs_rw_complete_t end_io_func,
- void *end_io_data,
- unsigned long gfp);
-
-The cookie argument must specify a data file cookie, the page specified will
-have the data loaded into it (and is also used to specify the page number), and
-the gfp argument is used to control how any memory allocations made are satisfied.
-
-If the cookie indicates the inode is not cached:
-
- (1) The function will return -ENOBUFS.
-
-Else if there's a copy of the page resident on disc:
-
- (1) The function will submit a request to read the data off the disc directly
- into the page specified.
-
- (2) The function will return 0.
-
- (3) When the read is complete, end_io_func() will be invoked with:
-
- (*) The netfs data supplied when the cookie was created.
-
- (*) The page descriptor.
-
- (*) The data passed to the above function.
-
- (*) An argument that's 0 on success or negative for an error.
-
- If an error occurs, it should be assumed that the page contains no usable
- data.
-
-Otherwise, if there's not a copy available on disc:
-
- (1) A block may be allocated in the cache and attached to the inode at the
- appropriate place.
-
- (2) The validity journal will be marked to indicate this page does not yet
- contain valid data.
-
- (3) The function will return -ENODATA.
-
-
-Secondly, if the netfs changes the contents of the page (either due to an
-initial download or if a user performs a write), then the page should be
-written back to the cache:
-
- int cachefs_write_page(struct cachefs_cookie *cookie,
- struct page *page,
- cachefs_rw_complete_t end_io_func,
- void *end_io_data,
- unsigned long gfp);
-
-The cookie argument must specify a data file cookie, the page specified should
-contain the data to be written (and is also used to specify the page number),
-and the gfp argument is used to control how any memory allocations made are
-satisfied.
-
-If the cookie indicates the inode is not cached then:
-
- (1) The function will return -ENOBUFS.
-
-Else if there's a block allocated on disc to hold this page:
-
- (1) The function will submit a request to write the data to the disc directly
- from the page specified.
-
- (2) The function will return 0.
-
- (3) When the write is complete:
-
- (a) Any associated validity journal entry will be cleared (the block now
- contains valid data as far as CacheFS is concerned).
-
- (b) end_io_func() will be invoked with:
-
- (*) The netfs data supplied when the cookie was created.
-
- (*) The page descriptor.
-
- (*) The data passed to the above function.
-
- (*) An argument that's 0 on success or negative for an error.
-
- If an error happens, it can be assumed that the page has been
- discarded from the cache.
-
-
-PAGE UNCACHING
---------------
-
-To uncache a page, this function should be called:
-
- void cachefs_uncache_page(struct cachefs_cookie *cookie,
- struct page *page);
-
-This detaches the page specified from the data file indicated by the cookie and
-unbinds it from the underlying block.
-
-Note that pages can't be explicitly detached from a data file. The whole
-data file must be retired (see the relinquish cookie function below).
-
-Furthermore, note that this does not cancel the asynchronous read or write
-operation started by the read/alloc and write functions.
-
-
-INDEX AND DATA FILE UPDATE
---------------------------
-
-To request an update of the index data for an index or data file, the following
-function should be called:
-
- void cachefs_update_cookie(struct cachefs_cookie *cookie);
-
-This function will refer back to the netfs_data pointer stored in the cookie by
-the acquisition function to obtain the data to write into each revised index
-entry. The update method in the parent index definition will be called to
-transfer the data.
-
-
-INDEX AND DATA FILE UNREGISTRATION
-----------------------------------
-
-To get rid of a cookie, this function should be called.
-
- void cachefs_relinquish_cookie(struct cachefs_cookie *cookie,
- int retire);
-
-If retire is non-zero, then the index or file will be marked for recycling, and
-all copies of it will be removed from all active caches in which it is present.
-
-If retire is zero, then the inode may be available again the next time the
-acquisition function is called.
-
-One very important note - relinquish must NOT be called unless all "child"
-indexes, files and pages have been relinquished first.
-
-
-PAGE TOKEN MANAGEMENT
----------------------
-
-As previously mentioned, the netfs must keep a token associated with each page
-currently actively backed by the cache. This is used by CacheFS to go from a
-page to the internal representation of the underlying block and back again. It
-is particularly important for managing the withdrawal of a cache whilst it is
-in active service (eg: it got unmounted).
-
-The token is this:
-
- struct cachefs_page {
- ...
- };
-
-Note that all fields are for internal CacheFS use only.
-
-The token only needs to be allocated when CacheFS asks for it. This it will do
-by calling the get_page_cookie() method in the netfs definition ops table. Once
-allocated, the same token should be presented every time the method is called
-again for a particular page.
-
-The token should be retained by the netfs, and should be deleted only after the
-page has been uncached.
-
-One way to achieve this is to attach the token to page->private (and set the
-PG_private bit on the page) once allocated. Shortcut routines are provided by
-CacheFS to do this. Firstly, to retrieve if present and allocate if not:
-
- struct cachefs_page *cachefs_page_get_private(struct page *page,
- unsigned gfp);
-
-Secondly to retrieve if present and BUG if not:
-
- static inline
- struct cachefs_page *cachefs_page_grab_private(struct page *page);
-
-To clean up the tokens, the netfs inode hosting the page should be provided
-with address space operations that circumvent the buffer-head operations for a
-page. For instance:
-
- struct address_space_operations afs_fs_aops = {
- ...
- .sync_page = block_sync_page,
- .set_page_dirty = __set_page_dirty_nobuffers,
- .releasepage = afs_file_releasepage,
- .invalidatepage = afs_file_invalidatepage,
- };
-
- static int afs_file_invalidatepage(struct page *page,
- unsigned long offset)
- {
- struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
- int ret = 1;
-
- BUG_ON(!PageLocked(page));
- if (!PagePrivate(page))
- return 1;
- cachefs_uncache_page(vnode->cache,page);
- if (offset == 0)
- return 1;
- BUG_ON(!PageLocked(page));
- if (PageWriteback(page))
- return 0;
- return page->mapping->a_ops->releasepage(page, 0);
- }
-
- static int afs_file_releasepage(struct page *page, int gfp_flags)
- {
- struct cachefs_page *token;
- struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
-
- if (PagePrivate(page)) {
- cachefs_uncache_page(vnode->cache, page);
- token = (struct cachefs_page *) page->private;
- page->private = 0;
- ClearPagePrivate(page);
- if (token)
- kfree(token);
- }
- return 0;
- }
-
-
-INDEX AND DATA FILE INVALIDATION
---------------------------------
-
-There is no direct way to invalidate an index subtree or a data file. To do
-this, the caller should relinquish and retire the cookie they have, and then
-acquire a new one.
diff -uNrp linux-2.6.9-rc3-mm2/Documentation/filesystems/caching/backend-api.txt linux-2.6.9-rc3-mm2-fscache/Documentation/filesystems/caching/backend-api.txt
--- linux-2.6.9-rc3-mm2/Documentation/filesystems/caching/backend-api.txt 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.9-rc3-mm2-fscache/Documentation/filesystems/caching/backend-api.txt 2004-10-05 13:21:09.000000000 +0100
@@ -0,0 +1,317 @@
+ ==========================
+ FS-CACHE CACHE BACKEND API
+ ==========================
+
+The FS-Cache system provides an API by which actual caches can be supplied to
+FS-Cache for it to then serve out to network filesystems and other interested
+parties.
+
+This API is declared in <linux/fscache-cache.h>.
+
+
+====================================
+INITIALISING AND REGISTERING A CACHE
+====================================
+
+To start off, a cache definition must be initialised and registered for each
+cache the backend wants to make available. For instance, CacheFS does this in
+the fill_super() operation on mounting.
+
+The cache definition (struct fscache_cache) should be initialised by calling:
+
+ void fscache_init_cache(struct fscache_cache *cache,
+ struct fscache_cache_ops *ops,
+ unsigned fsdef_ino,
+ const char *idfmt,
+ ...)
+
+Where:
+
+ (*) "cache" is a pointer to the cache definition;
+
+ (*) "ops" is a pointer to the table of operations that the backend supports on
+ this cache;
+
+ (*) "fsdef_ino" is the reference number of the FileSystem DEFinition index
+ (the top-level index), which in CacheFS is its inode number;
+
+ (*) and a format and printf-style arguments for constructing a label for the
+ cache.
+
+
+The cache should then be registered with FS-Cache by passing a pointer to the
+previously initialised cache definition to:
+
+ void fscache_add_cache(struct fscache_cache *cache)
+
+
+=====================
+UNREGISTERING A CACHE
+=====================
+
+A cache can be withdrawn from the system by calling this function with a
+pointer to the cache definition:
+
+ void fscache_withdraw_cache(struct fscache_cache *cache)
+
+In CacheFS's case, this is called by put_super().
+
+It is possible to check to see if a cache has been withdrawn by calling:
+
+ int fscache_is_cache_withdrawn(struct fscache_cache *cache)
+
+Which will return non-zero if it has been, zero if it is still active.
+
+
+==================
+FS-CACHE UTILITIES
+==================
+
+FS-Cache provides some utilities that a cache backend may make use of:
+
+ (*) Find parent of node.
+
+ struct fscache_node *fscache_find_parent_node(struct fscache_node *node)
+
+ This allows a backend to find the logical parent of an index or data file
+ in the cache hierarchy.
+
+ (*) Allocate a page token.
+
+ struct fscache_page *fscache_page_get_private(struct page *page,
+ unsigned gfp);
+
+ If the page has a page token attached, then this is returned by this
+ function. If it doesn't have one, then a page token is allocated with the
+ specified allocation flags and attached to the page's private value. The
+ error ENOMEM is returned if there's no memory available.
+
+ (*) Grab an existing page token.
+
+ struct fscache_page *fscache_page_grab_private(struct page *page)
+
+ This function returns a pointer to the page token attached to the page's
+ private value if it exists, and BUG's if it does not.
+
+
+========================
+RELEVANT DATA STRUCTURES
+========================
+
+ (*) Index/Data file FS-Cache representation cookie.
+
+ struct fscache_cookie {
+ struct fscache_index_def *idef;
+ struct fscache_netfs *netfs;
+ void *netfs_data;
+ ...
+ };
+
+ The fields that might be of use to the backend describe the index
+ definition (indexes only), the netfs definition and the netfs's data for
+ this cookie. The index definition contains a number of functions supplied
+ by the netfs for matching index entries; these are required to provide
+ some of the cache operations.
+
+ (*) Cached search result.
+
+ struct fscache_search_result {
+ unsigned ino;
+ ...
+ };
+
+ This is used by FS-Cache to keep track of what nodes it has found in what
+ caches. Some of the cache operations set the "cache node number" held
+ therein.
+
+ (*) In-cache node representation.
+
+ struct fscache_node {
+ struct fscache_cookie *cookie;
+ unsigned long flags;
+ #define FSCACHE_NODE_ISINDEX 0
+ ...
+ };
+
+ Structures of this type should be allocated by the cache backend and
+ passed to FS-Cache when requested by the appropriate cache operation. In
+ the case of CacheFS, they're embedded in CacheFS's inode structure.
+
+ Each node contains a pointer to the cookie that represents the index or
+ data file it is backing. It also contains a flag that indicates whether
+ this is an index or not. This should be initialised by calling
+ fscache_node_init(node).
+
+ (*) Filesystem definition (FSDEF) index entry representation.
+
+ struct fscache_fsdef_index_entry {
+ uint8_t name[24]; /* name of netfs */
+ uint32_t version; /* version of layout */
+ };
+
+ This structure defines the layout of the data in the FSDEF index
+ maintained by the FS-Cache facility for distinguishing between the caches
+ for separate netfs's.
+
+
+================
+CACHE OPERATIONS
+================
+
+The cache backend provides FS-Cache with a table of operations that can be
+performed on the denizens of the cache. These are held in a structure of type:
+
+ struct fscache_cache_ops
+
+ (*) Name of cache provider [mandatory].
+
+ const char *name
+
+ This isn't strictly an operation, but should be pointed at a string naming
+ the backend.
+
+ (*) Node lookup [mandatory].
+
+ struct fscache_node *(*lookup_node)(struct fscache_cache *cache,
+ unsigned ino)
+
+ This method is used to turn a logical cache node number into a handle on a
+ representation of that node.
+
+ (*) Increment node refcount [mandatory].
+
+ struct fscache_node *(*grab_node)(struct fscache_node *node)
+
+ This method is called to increment the reference count on a node. It may
+ fail (for instance if the cache is being withdrawn).
+
+ (*) Lock/Unlock node [mandatory].
+
+ void (*lock_node)(struct fscache_node *node)
+ void (*unlock_node)(struct fscache_node *node)
+
+ These methods are used to exclusively lock a node. It must be possible to
+ schedule with the lock held, so a spinlock isn't sufficient.
+
+ (*) Unreference node [mandatory].
+
+ void (*put_node)(struct fscache_node *node)
+
+ This method is used to discard a reference to a node. The node may be
+ destroyed when all the references held by FS-Cache are released.
+
+ (*) Search an index [mandatory].
+
+ int (*index_search)(struct fscache_node *index,
+ struct fscache_cookie *cookie,
+ struct fscache_search_result *result)
+
+ This method is called to search an index for a node that matches the
+ criteria attached to the cookie (cookie->netfs_data). This should be
+ matched by calling index->cookie->idef->match().
+
+ The cache backend is responsible for dealing with the match result,
+ including updating or discarding existing index entries. An index entry
+ can be updated by calling index->cookie->idef->update().
+
+ If the search is successful, the node number should be stored in
+ result->ino and zero returned. If not successful, error ENOENT should be
+ returned if no entry was found, or some other error otherwise.
+
+ (*) Create a new node [mandatory].
+
+ int (*index_add)(struct fscache_node *index,
+ struct fscache_cookie *cookie,
+ struct fscache_search_result *result)
+
+ This method is called to create a new node on disc and add an entry for it
+ to the specified index. The index entry for the new node should be
+ obtained by calling index->cookie->idef->update() and passing it the
+ argument cookie.
+
+ If successful, the node number should be stored in result->ino and zero
+ should be returned.
+
+ (*) Update a node [mandatory].
+
+ int (*index_update)(struct fscache_node *index,
+ struct fscache_node *node)
+
+ This is called to update the on-disc index entry for the specified
+ node. The new information should be in node->cookie->netfs_data. This can
+ be obtained by calling index->cookie->idef->update() and passing it
+ node->cookie.
+
+ (*) Synchronise a cache to disc [mandatory].
+
+ void (*sync)(struct fscache_cache *cache)
+
+ This is called to ask the backend to synchronise a cache with disc.
+
+ (*) Dissociate a cache [mandatory].
+
+ void (*dissociate_pages)(struct fscache_cache *cache)
+
+ This is called to ask the cache to dissociate all netfs pages from
+ mappings to disc. It is assumed that the backend cache will have some way
+ of finding all the page tokens that refer to its own blocks.
+
+ (*) Request page be read from cache [mandatory].
+
+ int (*read_or_alloc_page)(struct fscache_node *node,
+ struct page *page,
+ struct fscache_page *pageio,
+ fscache_rw_complete_t end_io_func,
+ void *end_io_data,
+ unsigned long gfp)
+
+ This is called to attempt to read a netfs page from disc, or to allocate a
+ backing block if not. FS-Cache will have done as much checking as it can
+ before calling, but most of the work belongs to the backend.
+
+ If there's no page on disc, then -ENODATA should be returned if the
+ backend managed to allocate a backing block; -ENOBUFS or -ENOMEM if it
+ didn't.
+
+ If there is a page on disc, then a read operation should be queued and 0
+ returned. When the read finishes, end_io_func() should be called with the
+ following arguments:
+
+ (*end_io_func)(node->cookie->netfs_data,
+ page,
+ end_io_data,
+ error);
+
+ (*) Request page be written to cache [mandatory].
+
+ int (*write_page)(struct fscache_node *node,
+ struct page *page,
+ struct fscache_page *pageio,
+ fscache_rw_complete_t end_io_func,
+ void *end_io_data,
+ unsigned long gfp)
+
+ This is called to write from a page on which there was a previously
+ successful read_or_alloc_page() call. FS-Cache filters out pages that
+ don't have mappings.
+
+ If there's no block on disc available, then -ENOBUFS should be returned
+ (or -ENOMEM if there wasn't any memory to be had).
+
+ If the write operation could be queued, then 0 should be returned. When
+ the write completes, end_io_func() should be called with the following
+ arguments:
+
+ (*end_io_func)(node->cookie->netfs_data,
+ page,
+ end_io_data,
+ error);
+
+ (*) Discard mapping [mandatory].
+
+ void (*uncache_page)(struct fscache_node *node,
+ struct fscache_page *page_token)
+
+ This is called when a page is being booted from the pagecache. The cache
+ backend needs to break the links between the page token and whatever
+ internal representations it maintains.
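
The return-value conventions described for read_or_alloc_page() and
write_page() in backend-api.txt above can be modelled in userspace. What
follows is a minimal sketch under stated assumptions, not the CacheFS
implementation: every toy_* name is hypothetical, I/O "completes" immediately
instead of being queued, and the page token and gfp arguments are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; the real types live in the kernel. */
struct toy_page {
	int uptodate;
};

typedef void (*toy_rw_complete_t)(void *netfs_data, struct toy_page *page,
				  void *data, int error);

#define TOY_NBLOCKS 4

struct toy_node {
	void *netfs_data;		/* models node->cookie->netfs_data */
	int allocated[TOY_NBLOCKS];	/* a backing block has been reserved */
	int written[TOY_NBLOCKS];	/* the backing block holds data */
	int free_blocks;		/* blocks still available to reserve */
};

/* Example completion callback: counts successful completions in *data. */
void toy_count_completion(void *netfs_data, struct toy_page *page,
			  void *data, int error)
{
	(void)netfs_data;
	(void)page;
	if (error == 0)
		(*(int *)data)++;
}

/* 0: read issued; -ENODATA: block reserved, no data; -ENOBUFS: no block. */
int toy_read_or_alloc_page(struct toy_node *node, unsigned index,
			   struct toy_page *page,
			   toy_rw_complete_t end_io_func, void *end_io_data)
{
	assert(end_io_func != NULL);
	if (index >= TOY_NBLOCKS)
		return -ENOBUFS;

	if (node->written[index]) {
		/* Data is on disc; a real backend would queue the read. */
		page->uptodate = 1;
		end_io_func(node->netfs_data, page, end_io_data, 0);
		return 0;
	}

	if (!node->allocated[index]) {
		if (node->free_blocks == 0)
			return -ENOBUFS;
		node->free_blocks--;
		node->allocated[index] = 1;
	}
	return -ENODATA;
}

/* 0: write issued; -ENOBUFS: no earlier read_or_alloc reserved a block. */
int toy_write_page(struct toy_node *node, unsigned index,
		   struct toy_page *page,
		   toy_rw_complete_t end_io_func, void *end_io_data)
{
	assert(end_io_func != NULL);
	if (index >= TOY_NBLOCKS || !node->allocated[index])
		return -ENOBUFS;

	node->written[index] = 1;
	end_io_func(node->netfs_data, page, end_io_data, 0);
	return 0;
}
```

In this model a netfs would treat -ENODATA as "fetch the page from the server,
then call the write operation", and -ENOBUFS as "carry on without the cache".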
diff -uNrp linux-2.6.9-rc3-mm2/Documentation/filesystems/caching/cachefs.txt linux-2.6.9-rc3-mm2-fscache/Documentation/filesystems/caching/cachefs.txt
--- linux-2.6.9-rc3-mm2/Documentation/filesystems/caching/cachefs.txt 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.9-rc3-mm2-fscache/Documentation/filesystems/caching/cachefs.txt 2004-10-05 11:22:27.000000000 +0100
@@ -0,0 +1,274 @@
+ ===========================
+ CacheFS: Caching Filesystem
+ ===========================
+
+========
+OVERVIEW
+========
+
+CacheFS is a backend for the general filesystem cache facility.
+
+CacheFS uses a block device directly rather than a bunch of files under an
+already mounted filesystem. For why this is so, see further on. If necessary,
+however, a file can be loopback mounted as a cache.
+
+
+CacheFS provides the following facilities:
+
+ (1) More than one block device can be mounted as a cache.
+
+ (2) Caches can be mounted / unmounted at any time.
+
+ (3) All metadata modifications (this includes index contents) are performed
+ as journalled transactions. These are replayed on mounting.
+
+
+=============================================
+WHY A BLOCK DEVICE? WHY NOT A BUNCH OF FILES?
+=============================================
+
+CacheFS is backed by a block device rather than being backed by a bunch of
+files on a filesystem. This confers several advantages:
+
+ (1) Performance.
+
+ Going directly to a block device means that we can DMA directly to/from
+ the netfs's pages. If another filesystem was managing the backing
+ store, everything would have to be copied between pages. Whilst DirectIO
+ does exist, it doesn't appear easy to make use of in this situation.
+
+ New address space or file operations could be added to make it possible to
+ persuade a backing discfs to generate block I/O directly to/from disc
+ blocks under its control, but that then means the discfs has to keep track
+ of I/O requests to pages not under its control.
+
+ Furthermore, we only have to do one lot of readahead calculations, not
+ two; in the discfs backing case, the netfs would do one and the discfs
+ would do one.
+
+ (2) Memory.
+
+ Using a block device means that we have a lower memory usage - all data
+ pages belong to the netfs we're backing. If we used a filesystem, we would
+ have twice as many pages at certain points - one from the netfs and one
+ from the backing discfs. In the backing discfs model, under situations of
+ memory pressure, we'd have to allocate or keep around a discfs page to be
+ able to write out a netfs page; or else we'd need to be able to punch a
+ hole in the backing file.
+
+ Furthermore, whilst we have to keep a CacheFS inode around in memory for
+ every netfs inode we're backing, a backing discfs would have to keep the
+ dentry and possibly a file struct too.
+
+ (3) Holes.
+
+ The cache uses holes to indicate to the netfs that it hasn't yet
+ downloaded the data for that page.
+
+ Since CacheFS is its own filesystem, it can support holes in files
+ trivially. Running on top of another discfs would limit us to using ones
+ that can support holes.
+
+ Furthermore, it would have to be made possible to detect holes in a discfs
+ file, rather than just seeing zero filled blocks.
+
+ (4) Data Consistency.
+
+ CacheFS uses a pair of journals to keep track of the state of the cache
+ and all the pages contained therein. This means that it doesn't get into
+ an inconsistent state in the on-disc cache and it doesn't lose disc space.
+
+ CacheFS takes especial care between the allocation of a block and its
+ splicing into the on-disc pointer tree, and the data having been written
+ to disc. If power is interrupted and then restored, the journals are
+ replayed and if it is seen that a block was allocated but not written it
+ is then punched out. Were the cache backed by a discfs, I'm not certain what
+ would happen. It may well be possible to mark a discfs's journal, if it has one,
+ but how does the discfs deal with those marks? This also limits consistent
+ caching to running on journalled discfs's where there's a function to
+ write extraordinary marks into the journal.
+
+ The alternative would be to keep flags in the superblock, and to
+ re-initialise the cache if it wasn't cleanly unmounted.
+
+ Knowing that your cache is in a good state is vitally important if you,
+ say, put /usr on AFS. Some organisations put everything barring /etc,
+ /sbin, /lib and /var on AFS and have an enormous cache on every
+ computer. Imagine if the power goes out and renders every cache
+ inconsistent, requiring all the computers to re-initialise their caches
+ when the power comes back on...
+
+ (5) Recycling.
+
+ Recycling is simple on CacheFS. It can just scan the metadata index to
+ look for inodes that require reclamation/recycling; and it can also build
+ up a list of the least recently used inodes so that they can be reclaimed
+ later to make space.
+
+ Doing this on a discfs would require a search going down through a nest
+ of directories, and would probably have to be done in userspace.
+
+ (6) Disc Space.
+
+ Whilst the block device does set a hard ceiling on the amount of space
+ available, CacheFS can guarantee that all that space will be available to
+ the cache. On a discfs-backed cache, the administrator would probably want
+ to set a cache size limit, but the system wouldn't be able to guarantee that
+ all that space would be available to the cache - not unless that cache was
+ on a partition of its own.
+
+ Furthermore, with a discfs-backed cache, if the recycler starts to reclaim
+ cache files to make space, the freed blocks may just be eaten directly by
+ userspace programs, potentially resulting in the entire cache being
+ consumed. Alternatively, netfs operations may end up being held up because
+ the cache can't get blocks on which to store the data.
+
+ (7) Users.
+
+ Users can't so easily go into CacheFS and run amok. The worst they can do
+ is cause bits of the cache to be recycled early. With a discfs-backed
+ cache, they can do all sorts of bad things to the files belonging to the
+ cache, and they can do this quite by accident.
+
+
+On the other hand, there would be some advantages to using a file-based cache
+rather than a blockdev-based cache:
+
+ (1) Having to copy to a discfs's page would mean that a netfs could just make
+ the copy and then assume its own page is ready to go.
+
+ (2) Backing onto a discfs wouldn't require a committed block device. You would
+ just nominate a directory and go from there. With CacheFS you have to
+ repartition or install an extra drive to make use of it in an existing
+ system (though the loopback device offers a way out).
+
+ (3) CacheFS requires the netfs to store a key in any pertinent index entry,
+ and it also permits a limited amount of arbitrary data to be stored there.
+
+ A discfs could be requested to store the netfs's data in xattrs, and the
+ filename could be used to store the key, though the key would have to be
+ rendered as text not binary. Likewise indexes could be rendered as
+ directories with xattrs.
+
+ (4) You could easily make your cache bigger if the discfs has plenty of space;
+ you could even go across multiple mountpoints.
+
+
+======================
+GENERAL ON-DISC LAYOUT
+======================
+
+The filesystem is divided into a number of parts:
+
+    0  +---------------------------+
+       |        Superblock         |
+    1  +---------------------------+
+       |      Update Journal       |
+       +---------------------------+
+       |     Validity Journal      |
+       +---------------------------+
+       |    Write-Back Journal     |
+       +---------------------------+
+       |                           |
+       |           Data            |
+       |                           |
+  END  +---------------------------+
+
+The superblock contains the filesystem ID tags and pointers to all the other
+regions.
+
+The update journal consists of a set of entries of sector size that keep track
+of what changes have been made to the on-disc filesystem, but not yet
+committed.
+
+The validity journal contains records of data blocks that have been allocated
+but not yet written. Upon journal replay, all these blocks will be detached
+from their pointers and recycled.
+
+The write-back journal keeps track of changes that have been made locally to
+data blocks, but that have not yet been committed back to the server. This is
+not yet implemented.
+
+The journals are replayed upon mounting to make sure that the cache is in a
+reasonable state.
+
+The data region holds a number of things:
+
+ (1) Index Files
+
+ These are files of entries used by CacheFS internally and by filesystems
+ that wish to cache data here (such as AFS) to keep track of what's in
+ the cache at any given time.
+
+ The first index file (inode 1) is special. It holds the CacheFS-specific
+ metadata for every file in the cache (including direct, single-indirect
+ and double-indirect block pointers).
+
+ The second index file (inode 2) is also special. It has an entry for
+ each filesystem that's currently holding data in this cache.
+
+ Every allocated entry in an index has an inode bound to it. This inode is
+ either another index file or it is a data file.
+
+ (2) Cached Data Files
+
+ These are caches of files from remote servers. Holes in these files
+ represent blocks not yet obtained from the server.
+
+ (3) Indirection Blocks
+
+ Should a file have more blocks than can be pointed to by the few
+ pointers in its storage management record, then indirection blocks will
+ be used to point to further data or indirection blocks.
+
+ Two levels of indirection are currently supported:
+
+ - single indirection
+ - double indirection
+
+ (4) Allocation Nodes and Free Blocks
+
+ The free blocks of the filesystem are kept in two single-branched
+ "trees". One tree is the blocks that are ready to be allocated, and the
+ other is the blocks that have just been recycled. When the former tree
+ becomes empty, the latter tree is decanted across.
+
+ Each tree is arranged as a chain of "nodes"; each node points to the next
+ node in the chain (unless it's at the end) and also up to 1022 free
+ blocks.
+
+Note that all blocks are PAGE_SIZE in size. The blocks are numbered starting
+with the superblock at 0. Using 32-bit block pointers, a maximum number of
+0xffffffff blocks can be accessed, meaning that the maximum cache size is ~16TB
+for 4KB pages.
+
+
+========
+MOUNTING
+========
+
+Since CacheFS is actually a quasi-filesystem, it requires a block device behind
+it. The way to give it one is to mount it as cachefs type on a directory
+somewhere. The mounted filesystem will then present the user with a set of
+directories outlining the index structure resident in the cache. Indexes
+(directories) and files can be turfed out of the cache by the sysadmin through
+the use of rmdir and unlink.
+
+For instance, if a cache contains AFS data, the user might see the following:
+
+ root>mount -t cachefs /dev/hdg9 /cache-hdg9
+ root>ls -1 /cache-hdg9
+ afs
+ root>ls -1 /cache-hdg9/afs
+ cambridge.redhat.com
+ root>ls -1 /cache-hdg9/afs/cambridge.redhat.com
+ root.afs
+ root.cell
+
+However, a block device that's going to be used for a cache must be prepared
+before it can be mounted initially. This is done very simply by:
+
+ echo "cachefs___" >/dev/hdg9
+
+During the initial mount, the basic structure will be scribed into the cache,
+and then a background thread will "recycle" the as-yet unused data blocks.
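
The figures quoted in the on-disc layout section of cachefs.txt above can be
checked with a little arithmetic, and the two free-block "trees" can be
modelled very roughly. This is only a sketch: the toy_* names are
hypothetical, and flat arrays stand in for the on-disc chains of allocation
nodes.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: block pointers are 32 bits wide and blocks are page-sized. */
#define TOY_MAX_BLOCKS 0xffffffffULL

/* Largest cache, in bytes, addressable through 32-bit block pointers. */
uint64_t toy_max_cache_bytes(uint64_t page_size)
{
	assert(page_size > 0);
	return TOY_MAX_BLOCKS * page_size;
}

/* Number of 32-bit block pointers that fit in one page-sized block. */
uint64_t toy_pointers_per_block(uint64_t page_size)
{
	return page_size / sizeof(uint32_t);
}

#define TOY_LIST_MAX 8

/* Flat stand-ins for the "ready to allocate" and "just recycled" trees. */
struct toy_free_lists {
	uint32_t ready[TOY_LIST_MAX];
	unsigned nr_ready;
	uint32_t recycled[TOY_LIST_MAX];
	unsigned nr_recycled;
};

/*
 * Hand out a free block number, decanting the recycled list into the ready
 * list when the latter runs dry.  Returns 0 when nothing is free; block 0
 * is the superblock, so it can never be a valid allocation.
 */
uint32_t toy_alloc_block(struct toy_free_lists *fl)
{
	if (fl->nr_ready == 0) {
		while (fl->nr_recycled > 0)
			fl->ready[fl->nr_ready++] =
				fl->recycled[--fl->nr_recycled];
	}
	if (fl->nr_ready == 0)
		return 0;
	return fl->ready[--fl->nr_ready];
}
```

For 4KB pages toy_max_cache_bytes() gives 17592186040320 bytes, one block short
of 16TB, and toy_pointers_per_block() gives 1024. The text says a node holds a
next pointer and up to 1022 free blocks; what the remaining slot holds is not
stated there.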
diff -uNrp linux-2.6.9-rc3-mm2/Documentation/filesystems/caching/fscache.txt linux-2.6.9-rc3-mm2-fscache/Documentation/filesystems/caching/fscache.txt
--- linux-2.6.9-rc3-mm2/Documentation/filesystems/caching/fscache.txt 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.9-rc3-mm2-fscache/Documentation/filesystems/caching/fscache.txt 2004-10-05 11:22:27.000000000 +0100
@@ -0,0 +1,94 @@
+ ==========================
+ General Filesystem Caching
+ ==========================
+
+========
+OVERVIEW
+========
+
+This facility is a general purpose cache for network filesystems, though it
+could be used for caching other things such as ISO9660 filesystems too.
+
+FS-Cache mediates between cache backends (such as CacheFS) and network
+filesystems:
+
+        +---------+
+        |         |                        +-----------+
+        |   NFS   |--+                     |           |
+        |         |  |                 +-->|  CacheFS  |
+        +---------+  |   +----------+  |   | /dev/hda5 |
+                     |   |          |  |   +-----------+
+        +---------+  +-->|          |  |
+        |         |      |          |--+   +-------------+
+        |   AFS   |----->| FS-Cache |      |             |
+        |         |      |          |----->| Cache Files |
+        +---------+  +-->|          |      | /var/cache  |
+                     |   |          |--+   +-------------+
+        +---------+  |   +----------+  |
+        |         |  |                 |   +-------------+
+        |  ISOFS  |--+                 |   |             |
+        |         |                    +-->| ReiserCache |
+        +---------+                        | /           |
+                                           +-------------+
+
+FS-Cache does not follow the idea of loading every opened netfs file in its
+entirety into a cache before permitting it to be accessed and then serving the
+pages out of that cache rather than the netfs inode because:
+
+ (1) It must be practical to operate without a cache.
+
+ (2) The size of any accessible file must not be limited to the size of the
+ cache.
+
+ (3) The combined size of all opened files (this includes mapped libraries)
+ must not be limited to the size of the cache.
+
+ (4) The user should not be forced to download an entire file just to do a
+ one-off access of a small portion of it (such as might be done with the
+ "file" program).
+
+It instead serves the cache out in PAGE_SIZE chunks as and when requested by
+the netfs('s) using it.
+
+
+FS-Cache provides the following facilities:
+
+ (1) More than one cache can be used at once.
+
+ (2) Caches can be added / removed at any time.
+
+ (3) The netfs is provided with an interface that allows either party to
+ withdraw caching facilities from a file (required for (2)).
+
+ (4) The interface to the netfs returns as few errors as possible, preferring
+ rather to let the netfs remain oblivious.
+
+ (5) Cookies are used to represent files and indexes to the netfs. The simplest
+ cookie is just a NULL pointer - indicating nothing cached there.
+
+ (6) The netfs is allowed to propose - dynamically - any index hierarchy it
+ desires, though it must be aware that the index search function is
+ recursive and stack space is limited.
+
+ (7) Data I/O is done direct to and from the netfs's pages. The netfs indicates
+ that page A is at index B of the data-file represented by cookie C, and
+ that it should be read or written. The cache backend may or may not start
+ I/O on that page, but if it does, a netfs callback will be invoked to
+ indicate completion. The I/O may be either synchronous or asynchronous.
+
+ (8) Cookies can be "retired" upon release. At this point FS-Cache will mark
+ them as obsolete and the index hierarchy rooted at that point will get
+ recycled.
+
+ (9) The netfs provides a "match" function for index searches. In addition to
+ saying whether a match was made or not, this can also specify that an
+ entry should be updated or deleted.
+
+
+The netfs API to FS-Cache can be found in:
+
+ Documentation/filesystems/caching/netfs-api.txt
+
+The cache backend API to FS-Cache can be found in:
+
+ Documentation/filesystems/caching/backend-api.txt
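
Facility (9) in fscache.txt above, the netfs-supplied "match" function, can be
illustrated with a small userspace model of an index search. This is a sketch
under stated assumptions rather than FS-Cache code: the toy_* names and the
entry layout are hypothetical, and stopping with -ENOENT after discarding a
stale entry is just one possible policy for a backend.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the four possible match results. */
enum toy_match {
	TOY_MATCH_FAILED,
	TOY_MATCH_SUCCESS,
	TOY_MATCH_SUCCESS_UPDATE,
	TOY_MATCH_SUCCESS_DELETE,
};

/* One fixed-size index entry; the key comes first. */
struct toy_entry {
	int in_use;
	unsigned key;
	unsigned version;
	unsigned size_hint;
};

/* What the netfs is looking for (models the cookie's netfs data). */
struct toy_target {
	unsigned key;
	unsigned version;
	unsigned size_hint;
};

/* Netfs-side comparison of the wanted object against one index entry. */
enum toy_match toy_match(const struct toy_target *t,
			 const struct toy_entry *e)
{
	if (t->key != e->key)
		return TOY_MATCH_FAILED;
	if (t->version != e->version)
		return TOY_MATCH_SUCCESS_DELETE;	/* stale object */
	if (t->size_hint != e->size_hint)
		return TOY_MATCH_SUCCESS_UPDATE;	/* auxiliary data is old */
	return TOY_MATCH_SUCCESS;
}

/* Netfs-side routine that (re)writes an index entry from its own data. */
void toy_update(const struct toy_target *t, struct toy_entry *e)
{
	e->key = t->key;
	e->version = t->version;
	e->size_hint = t->size_hint;
}

/* Returns the matching slot number, or -ENOENT if nothing usable exists. */
int toy_index_search(struct toy_entry *index, size_t nr,
		     const struct toy_target *target)
{
	size_t i;

	assert(index != NULL);
	for (i = 0; i < nr; i++) {
		if (!index[i].in_use)
			continue;
		switch (toy_match(target, &index[i])) {
		case TOY_MATCH_FAILED:
			break;			/* try the next entry */
		case TOY_MATCH_SUCCESS_UPDATE:
			toy_update(target, &index[i]);
			return (int)i;
		case TOY_MATCH_SUCCESS:
			return (int)i;
		case TOY_MATCH_SUCCESS_DELETE:
			index[i].in_use = 0;	/* discard the stale entry */
			return -ENOENT;
		}
	}
	return -ENOENT;
}
```

The point of the split is that the backend walks the index and acts on the
verdicts, while only the netfs knows how to interpret the entry contents.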
diff -uNrp linux-2.6.9-rc3-mm2/Documentation/filesystems/caching/netfs-api.txt linux-2.6.9-rc3-mm2-fscache/Documentation/filesystems/caching/netfs-api.txt
--- linux-2.6.9-rc3-mm2/Documentation/filesystems/caching/netfs-api.txt 1970-01-01 01:00:00.000000000 +0100
+++ linux-2.6.9-rc3-mm2-fscache/Documentation/filesystems/caching/netfs-api.txt 2004-10-06 13:31:13.000000000 +0100
@@ -0,0 +1,583 @@
+ ===============================
+ FS-CACHE NETWORK FILESYSTEM API
+ ===============================
+
+There's an API by which a network filesystem can make use of the FS-Cache
+facilities. This is based around a number of principles:
+
+ (1) Every file and index is represented by a cookie. This cookie may or may
+ not have anything associated with it, but the netfs doesn't need to care.
+
+ (2) Barring the top-level index (one entry per cached netfs), the index
+ hierarchy for each netfs is structured according to the whim of the netfs.
+
+ (3) Any netfs page being backed by the cache must have a small token
+ associated with it (possibly pointed to by page->private) so that FS-Cache
+ can keep track of it.
+
+This API is declared in <linux/fscache.h>.
+
+
+=============================
+NETWORK FILESYSTEM DEFINITION
+=============================
+
+FS-Cache needs a description of the network filesystem. This is specified using
+a record of the following structure:
+
+ struct fscache_netfs {
+ const char *name;
+ unsigned version;
+ struct fscache_netfs_operations *ops;
+ struct fscache_cookie *primary_index;
+ ...
+ };
+
+The first three fields should be filled in before registration, and the fourth
+will be filled in by the registration function; any other fields should just be
+ignored and are for internal use only.
+
+The fields are:
+
+ (1) The name of the netfs (used as the key in the toplevel index).
+
+ (2) The version of the netfs (if the name matches but the version doesn't, the
+ entire on-disc hierarchy for this netfs will be scrapped and begun
+ afresh).
+
+ (3) The operations table is defined as follows:
+
+ struct fscache_netfs_operations {
+ struct fscache_page *(*get_page_token)(struct page *page);
+ };
+
+ The functions here must all be present. Currently the only one is:
+
+ (a) get_page_token(): Get the token used to bind a page to a block in a
+ cache. This function should allocate it if it doesn't exist.
+
+ Return -ENOMEM if there's not enough memory and -ENODATA if the page
+ just shouldn't be cached.
+
+ Set *_page_token to point to the token and return 0 if there is now a
+ token. Note that the netfs must keep track of the token itself (and
+ free it later). page->private can be used for this (see below).
+
+ (4) The cookie representing the primary index will be allocated according to
+ another parameter passed into the registration function.
+
+For example, kAFS (linux/fs/afs/) uses the following definitions to describe
+itself:
+
+ static struct fscache_netfs_operations afs_cache_ops = {
+ .get_page_token = afs_cache_get_page_token,
+ };
+
+ struct fscache_netfs afs_cache_netfs = {
+ .name = "afs",
+ .version = 0,
+ .ops = &afs_cache_ops,
+ };
+
+
+================
+INDEX DEFINITION
+================
+
+Indexes are used for two purposes:
+
+ (1) To speed up the finding of a file based on a series of keys (such as AFS's
+ "cell", "volume ID", "vnode ID").
+
+ (2) To make it easier to discard a subset of all the files cached based around
+ a particular key - for instance to mirror the removal of an AFS volume.
+
+However, since it's unlikely that any two netfs's are going to want to define
+their index hierarchies in quite the same way, FS-Cache tries to impose as few
+constraints as possible on how an index is structured and where it is placed in
+the tree. The netfs can even mix indexes and data files at the same level, but
+it's not recommended.
+
+There are some limits on indexes:
+
+ (1) All entries in any given index must be the same size. The netfs supplies a
+ blob of data for each index entry.
+
+ (2) The entries in one index can be of a different size to the entries in
+ another index.
+
+ (3) The entry data must be atomically journallable, so it is limited to 400
+ bytes at present.
+
+ (4) The index data must start with the key. The layout of the key is described
+ in the index definition, and this is used to display the key in some
+ appropriate way.
+
+ (5) The depth of the index tree should be judged with care as the search
+ function is recursive. Too many layers will run the kernel out of stack.
+
+To define an index, a structure of the following type should be filled out:
+
+ struct fscache_index_def
+ {
+ uint8_t name[8];
+ uint16_t data_size;
+ struct {
+ uint8_t type;
+ uint16_t len;
+ } keys[4];
+
+ fscache_match_val_t (*match)(void *target_netfs_data,
+ const void *entry);
+
+ void (*update)(void *source_netfs_data, void *entry);
+ };
+
+This has the following fields:
+
+ (1) The name of the index (NUL terminated unless all 8 chars are used).
+
+ (2) The size of the data blob provided by the netfs.
+
+ (3) A definition of the key(s) at the beginning of the blob. The netfs is
+ permitted to specify up to four keys. The total length must not exceed the
+ data size. It is assumed that the keys will be laid end to end in order,
+ starting at the first byte of the data.
+
+ The type field specifies the way the data should be displayed. It can be
+ one of:
+
+ (*) FSCACHE_INDEX_KEYS_NOTUSED - key field not used
+ (*) FSCACHE_INDEX_KEYS_BIN - display byte-by-byte in hex
+ (*) FSCACHE_INDEX_KEYS_BIN_SZ1 - as above, BE size in byte 0
+ (*) FSCACHE_INDEX_KEYS_BIN_SZ2 - as above, BE size in bytes 0-1
+ (*) FSCACHE_INDEX_KEYS_BIN_SZ4 - as above, BE size in bytes 0-3
+ (*) FSCACHE_INDEX_KEYS_ASCIIZ - NUL-terminated ASCII
+ (*) FSCACHE_INDEX_KEYS_IPV4ADDR - display as IPv4 address
+ (*) FSCACHE_INDEX_KEYS_IPV6ADDR - display as IPv6 address
+
+ (4) A function to compare an in-page-cache index entry blob with the data
+ passed to the cookie acquisition function. This function can also be used
+ to extract data from the blob and copy it into the netfs's structures.
+
+ The values this function can return are:
+
+ (*) FSCACHE_MATCH_FAILED - failed to match
+ (*) FSCACHE_MATCH_SUCCESS - successful match
+ (*) FSCACHE_MATCH_SUCCESS_UPDATE - successful match, entry needs update
+ (*) FSCACHE_MATCH_SUCCESS_DELETE - entry should be deleted
+
+ For example, in linux/fs/afs/vnode.c:
+
+	static fscache_match_val_t
+	afs_vnode_cache_match(void *target, const void *entry)
+	{
+		const struct afs_cache_vnode *cvnode = entry;
+		struct afs_vnode *vnode = target;
+
+		if (vnode->fid.vnode != cvnode->vnode_id)
+			return FSCACHE_MATCH_FAILED;
+
+		if (vnode->fid.unique != cvnode->vnode_unique ||
+		    vnode->status.version != cvnode->data_version)
+			return FSCACHE_MATCH_SUCCESS_DELETE;
+
+		return FSCACHE_MATCH_SUCCESS;
+	}
+
+ (5) A function to initialise or update an in-page-cache index entry blob from
+ netfs data passed to FS-Cache by the netfs. This function should not assume
+ that there's any data yet in the in-page-cache.
+
+ Continuing the above example:
+
+	static void afs_vnode_cache_update(void *source, void *entry)
+	{
+		struct afs_cache_vnode *cvnode = entry;
+		struct afs_vnode *vnode = source;
+
+		cvnode->vnode_id = vnode->fid.vnode;
+		cvnode->vnode_unique = vnode->fid.unique;
+		cvnode->data_version = vnode->status.version;
+	}
+
+ Any dead space in the index entry should be filled with a pattern defined
+ by FS-Cache:
+
+ FSCACHE_INDEX_DEADFILL_PATTERN
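+
+As a rough userspace illustration of the match/update contract described
+above, the following sketch uses stand-in types rather than the kernel's. The
+demo_* names, the enum values and the fill byte are assumptions made for the
+example; only the return-code names are taken from this document.
+
```c
#include <stdint.h>
#include <string.h>

/* Userspace stand-ins; the numeric values are assumptions. */
typedef enum {
	FSCACHE_MATCH_FAILED,
	FSCACHE_MATCH_SUCCESS,
	FSCACHE_MATCH_SUCCESS_UPDATE,
	FSCACHE_MATCH_SUCCESS_DELETE,
} fscache_match_val_t;

#define DEMO_DEADFILL_PATTERN 0x79	/* hypothetical fill byte */

struct demo_vnode {			/* hypothetical netfs object */
	uint32_t id;
	uint32_t unique;
	uint32_t version;
};

struct demo_cache_vnode {		/* hypothetical index entry */
	uint32_t vnode_id;		/* the key, so it comes first */
	uint32_t vnode_unique;
	uint32_t data_version;
	uint8_t  spare[4];		/* dead space */
};

/* Build the entry from scratch; nothing is assumed about its old contents. */
static void demo_update(void *source, void *entry)
{
	const struct demo_vnode *vnode = source;
	struct demo_cache_vnode *cv = entry;

	memset(cv, DEMO_DEADFILL_PATTERN, sizeof(*cv));
	cv->vnode_id = vnode->id;
	cv->vnode_unique = vnode->unique;
	cv->data_version = vnode->version;
}

/* Compare a cached entry against the object being looked up. */
static fscache_match_val_t demo_match(void *target, const void *entry)
{
	const struct demo_vnode *vnode = target;
	const struct demo_cache_vnode *cv = entry;

	if (vnode->id != cv->vnode_id)
		return FSCACHE_MATCH_FAILED;
	if (vnode->unique != cv->vnode_unique ||
	    vnode->version != cv->data_version)
		return FSCACHE_MATCH_SUCCESS_DELETE;	/* stale entry */
	return FSCACHE_MATCH_SUCCESS;
}
```
+
+Calling demo_update() and then demo_match() with the same object reports a
+successful match in this sketch; changing the version afterwards makes the
+match function ask for the stale entry to be deleted.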
+
+To finish the above example, the index definition for the "vnode" level is as
+follows:
+
+	struct fscache_index_def afs_vnode_cache_index_def = {
+		.name		= "vnode",
+		.data_size	= sizeof(struct afs_cache_vnode),
+		.keys[0]	= { FSCACHE_INDEX_KEYS_BIN, 4 },
+		.match		= afs_vnode_cache_match,
+		.update		= afs_vnode_cache_update,
+	};
+
+The first element of struct afs_cache_vnode is the vnode ID.
+
+And for contrast, the cell index definition is:
+
+	struct fscache_index_def afs_cache_cell_index_def = {
+		.name		= "cell_ix",
+		.data_size	= sizeof(struct afs_cell),
+		.keys[0]	= { FSCACHE_INDEX_KEYS_ASCIIZ, 64 },
+		.match		= afs_cell_cache_match,
+		.update		= afs_cell_cache_update,
+	};
+
+The cell index is the primary index for kAFS.
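+
+The size limits listed earlier (entries of at most 400 bytes, keys that fit
+within the entry) could be checked before a definition is used. The sketch
+below is a userspace approximation only: struct demo_index_def mirrors just
+the data fields, demo_check_index_def() is a hypothetical helper, and the
+value 0 for the "not used" key type is an assumption.
+
```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ENTRY_SIZE	400	/* journalling limit quoted in the text */
#define DEMO_KEYS_NOTUSED	0	/* assumed value of the "not used" type */

struct demo_index_def {
	uint8_t  name[8];
	uint16_t data_size;
	struct {
		uint8_t  type;
		uint16_t len;
	} keys[4];
};

/* Returns 0 if the definition respects the documented limits, else -1. */
static int demo_check_index_def(const struct demo_index_def *def)
{
	size_t total = 0;
	size_t i;

	if (def->data_size > DEMO_MAX_ENTRY_SIZE)
		return -1;

	/* Keys are laid end to end from byte 0, so their lengths just add up. */
	for (i = 0; i < 4; i++) {
		if (def->keys[i].type != DEMO_KEYS_NOTUSED)
			total += def->keys[i].len;
	}

	if (total > def->data_size)
		return -1;
	return 0;
}
```
+
+A definition with a single 4-byte key and a 12-byte entry passes this check,
+whereas one whose keys add up to more than data_size is rejected.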
+
+
+===================================
+NETWORK FILESYSTEM (UN)REGISTRATION
+===================================
+
+The first step is to declare the network filesystem to the cache. This also
+involves specifying the layout of the primary index (for AFS, this would be the
+"cell" level).
+
+The registration function is:
+
+ int fscache_register_netfs(struct fscache_netfs *netfs,
+ struct fscache_index_def *primary_idef);
+
+It just takes pointers to the netfs definition and the primary index
+definition. It returns 0 or an error as appropriate.
+
+For kAFS, registration is done as follows:
+
+ ret = fscache_register_netfs(&afs_cache_netfs,
+ &afs_cache_cell_index_def);
+
+The last step is, of course, unregistration:
+
+ void fscache_unregister_netfs(struct fscache_netfs *netfs);
+
+
+==================
+INDEX REGISTRATION
+==================
+
+The second step is to inform FS-Cache about part of an index hierarchy that can
+be used to locate files. This is done by requesting a cookie for each index in
+the path to the file:
+
+ struct fscache_cookie *
+ fscache_acquire_cookie(struct fscache_cookie *iparent,
+ struct fscache_index_def *idef,
+ void *netfs_data);
+
+This function creates an index entry in the index represented by iparent,
+loading the associated blob by calling iparent's update method with the
+supplied netfs_data.
+
+It also creates a new index inode, formatted according to the definition
+supplied in idef. The new cookie is the function's return value.
+
+Note that this function never returns an error - all errors are handled
+internally. It may also return FSCACHE_NEGATIVE_COOKIE. It is quite acceptable
+to pass this token back to this function as iparent (or even to the relinquish
+cookie, read page and write page functions - see below).
+
+Note also that no indexes are actually created on disc until a data file needs
+to be created somewhere down the hierarchy. Furthermore, an index may be
+created in several different caches independently at different times. This is
+all handled transparently, and the netfs doesn't see any of it.
+
+For example, with AFS, a cell would be added to the primary index. This index
+entry would have a dependent inode containing a volume location index for the
+volume mappings within this cell:
+
+ cell->cache =
+ fscache_acquire_cookie(afs_cache_netfs.primary_index,
+ &afs_vlocation_cache_index_def,
+ cell);
+
+Then when a volume location was accessed, it would be entered into the cell's
+index and an inode would be allocated that acts as a volume type and hash chain
+combination:
+
+ vlocation->cache =
+ fscache_acquire_cookie(cell->cache,
+ &afs_volume_cache_index_def,
+ vlocation);
+
+And then a particular flavour of volume (R/O for example) could be added to
+that index, creating another index for vnodes (AFS inode equivalents):
+
+ volume->cache =
+ fscache_acquire_cookie(vlocation->cache,
+ &afs_vnode_cache_index_def,
+ volume);
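+
+Because a negative cookie may be passed back in as iparent, the netfs does not
+need to check each level before descending to the next. The userspace sketch
+below models that with stand-in types. The demo_* names are hypothetical, and
+both the use of NULL as the negative cookie and the rule that a negative
+parent yields a negative child are assumptions of the model.
+
```c
#include <stdlib.h>

struct demo_cookie {
	struct demo_cookie *parent;
	void *netfs_data;
};

#define DEMO_NEGATIVE_COOKIE NULL	/* assumed representation */

static int demo_cache_available = 1;	/* flipped to 0 to model "no cache" */

/* Never reports an error: any failure collapses into the negative cookie. */
static struct demo_cookie *demo_acquire_cookie(struct demo_cookie *iparent,
					       void *netfs_data)
{
	struct demo_cookie *cookie;

	if (!demo_cache_available || iparent == DEMO_NEGATIVE_COOKIE)
		return DEMO_NEGATIVE_COOKIE;

	cookie = malloc(sizeof(*cookie));
	if (!cookie)
		return DEMO_NEGATIVE_COOKIE;

	cookie->parent = iparent;
	cookie->netfs_data = netfs_data;
	return cookie;
}

struct demo_chain {
	struct demo_cookie *cell, *vlocation, *volume, *vnode;
};

/* Walk down the hierarchy without testing any intermediate result. */
static void demo_build_chain(struct demo_cookie *primary, struct demo_chain *c)
{
	c->cell      = demo_acquire_cookie(primary, c);
	c->vlocation = demo_acquire_cookie(c->cell, c);
	c->volume    = demo_acquire_cookie(c->vlocation, c);
	c->vnode     = demo_acquire_cookie(c->volume, c);
}

/* Release leaf first; free(NULL) is a no-op, so negative cookies are fine. */
static void demo_free_chain(struct demo_chain *c)
{
	free(c->vnode);
	free(c->volume);
	free(c->vlocation);
	free(c->cell);
}
```
+
+With the cache available, each cookie's parent is the cookie acquired one
+level up. With the cache unavailable, every level simply ends up holding the
+negative cookie and the calling code is unchanged.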
+
+
+======================
+DATA FILE REGISTRATION
+======================
+
+The third step is to request a data file be created in the cache. This is
+almost identical to index cookie acquisition. The only difference is that a
+NULL index definition is passed.
+
+ vnode->cache =
+ fscache_acquire_cookie(volume->cache,
+ NULL,
+ vnode);
+
+
+=====================
+PAGE ALLOC/READ/WRITE
+=====================
+
+And the fourth step is to propose a page be cached. There are two functions
+that are used to do this.
+
+Firstly, the netfs should ask FS-Cache to examine the caches and read the
+contents cached for a particular page of a particular file if present, or else
+allocate space to store the contents if not:
+
+ typedef
+ void (*fscache_rw_complete_t)(void *cookie_data,
+ struct page *page,
+ void *end_io_data,
+ int error);
+
+ int fscache_read_or_alloc_page(struct fscache_cookie *cookie,
+ struct page *page,
+ fscache_rw_complete_t end_io_func,
+ void *end_io_data,
+ unsigned long gfp);
+
+The cookie argument must specify a data file cookie, the page specified will
+have the data loaded into it (and is also used to specify the page number), and
+the gfp argument is used to control how any memory allocations made are
+satisfied.
+
+If the cookie indicates the inode is not cached:
+
+ (1) The function will return -ENOBUFS.
+
+Else if there's a copy of the page resident on disc:
+
+ (1) The function will submit a request to read the data off the disc directly
+ into the page specified.
+
+ (2) The function will return 0.
+
+ (3) When the read is complete, end_io_func() will be invoked with:
+
+ (*) The netfs data supplied when the cookie was created.
+
+ (*) The page descriptor.
+
+ (*) The data passed to the above function.
+
+ (*) An argument that's 0 on success or negative for an error.
+
+ If an error occurs, it should be assumed that the page contains no usable
+ data.
+
+Otherwise, if there's not a copy available on disc:
+
+ (1) A block may be allocated in the cache and attached to the inode at the
+ appropriate place.
+
+ (2) The validity journal will be marked to indicate this page does not yet
+ contain valid data.
+
+ (3) The function will return -ENODATA.
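+
+The three outcomes above suggest a simple dispatch in the netfs's readpage
+path. The sketch below only classifies the return value; the enum and the
+function name are hypothetical, and how any other error code should be
+handled is not specified here, so it is reported separately.
+
```c
#include <errno.h>

enum demo_read_action {
	DEMO_WAIT_FOR_CACHE,	/* 0: read submitted, end_io_func() will run */
	DEMO_FETCH_AND_STORE,	/* -ENODATA: fetch from server, then cache it */
	DEMO_FETCH_ONLY,	/* -ENOBUFS: object not cached, server only */
	DEMO_OTHER_ERROR,	/* anything else: left to the netfs to decide */
};

/* Map the result of the read-or-alloc call onto what the netfs does next. */
static enum demo_read_action demo_classify_read(int ret)
{
	switch (ret) {
	case 0:
		return DEMO_WAIT_FOR_CACHE;
	case -ENODATA:
		return DEMO_FETCH_AND_STORE;
	case -ENOBUFS:
		return DEMO_FETCH_ONLY;
	default:
		return DEMO_OTHER_ERROR;
	}
}
```
+
+In the -ENODATA case the netfs would normally follow the server fetch with a
+call to the write function so that the newly allocated block gets filled.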
+
+
+Secondly, if the netfs changes the contents of the page (either due to an
+initial download or if a user performs a write), then the page should be
+written back to the cache:
+
+ int fscache_write_page(struct fscache_cookie *cookie,
+ struct page *page,
+ fscache_rw_complete_t end_io_func,
+ void *end_io_data,
+ unsigned long gfp);
+
+The cookie argument must specify a data file cookie, the page specified should
+contain the data to be written (and is also used to specify the page number),
+and the gfp argument is used to control how any memory allocations made are
+satisfied.
+
+If the cookie indicates the inode is not cached then:
+
+ (1) The function will return -ENOBUFS.
+
+Else if there's a block allocated on disc to hold this page:
+
+ (1) The function will submit a request to write the data to the disc directly
+ from the page specified.
+
+ (2) The function will return 0.
+
+ (3) When the write is complete:
+
+ (a) Any associated validity journal entry will be cleared (the block now
+ contains valid data as far as FS-Cache is concerned).
+
+ (b) end_io_func() will be invoked with:
+
+ (*) The netfs data supplied when the cookie was created.
+
+ (*) The page descriptor.
+
+ (*) The data passed to the above function.
+
+ (*) An argument that's 0 on success or negative for an error.
+
+ If an error happens, it can be assumed that the page has been
+ discarded from the cache.
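+
+A completion handler for the write case might look roughly like the following
+userspace sketch. It keeps the argument order of fscache_rw_complete_t but
+substitutes a stand-in for struct page; struct demo_page, struct demo_stats
+and the bookkeeping they hold are assumptions made for illustration.
+
```c
struct demo_page {			/* stand-in for struct page */
	int backed_by_cache;
};

struct demo_stats {			/* passed through as end_io_data */
	unsigned int writes_ok;
	unsigned int writes_failed;
};

/* Same argument order as fscache_rw_complete_t, with a stand-in page type. */
static void demo_write_complete(void *cookie_data, struct demo_page *page,
				void *end_io_data, int error)
{
	struct demo_stats *stats = end_io_data;

	(void)cookie_data;	/* netfs data given at cookie acquisition */

	if (error < 0) {
		/* On error the page is taken to be gone from the cache. */
		page->backed_by_cache = 0;
		stats->writes_failed++;
	} else {
		page->backed_by_cache = 1;
		stats->writes_ok++;
	}
}
```
+
+The handler never retries the write itself; it only records the outcome so
+that later reads of the page know whether to consult the cache.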
+
+
+==============
+PAGE UNCACHING
+==============
+
+To uncache a page, this function should be called:
+
+ void fscache_uncache_page(struct fscache_cookie *cookie,
+ struct page *page);
+
+This detaches the page specified from the data file indicated by the cookie and
+unbinds it from the underlying block.
+
+Note that pages can't be explicitly deleted from a data file. The whole data
+file must be retired (see the relinquish cookie function below).
+
+Furthermore, note that this does not cancel the asynchronous read or write
+operation started by the read/alloc and write functions.
+
+
+==========================
+INDEX AND DATA FILE UPDATE
+==========================
+
+To request an update of the index data for an index or data file, the following
+function should be called:
+
+ void fscache_update_cookie(struct fscache_cookie *cookie);
+
+This function will refer back to the netfs_data pointer stored in the cookie by
+the acquisition function to obtain the data to write into each revised index
+entry. The update method in the parent index definition will be called to
+transfer the data.
+
+
+==================================
+INDEX AND DATA FILE UNREGISTRATION
+==================================
+
+To get rid of a cookie, this function should be called:
+
+ void fscache_relinquish_cookie(struct fscache_cookie *cookie,
+ int retire);
+
+If retire is non-zero, then the index or file will be marked for recycling, and
+all copies of it will be removed from all active caches in which it is present.
+
+If retire is zero, then the inode may be available again the next time the
+acquisition function is called.
+
+One very important note - relinquish must NOT be called unless all "child"
+indexes, files and pages have been relinquished first.
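+
+The ordering rule can be pictured with a small userspace model in which each
+cookie counts its live children. The demo_* names are hypothetical, and unlike
+the real relinquish function (which returns void) the stand-in returns -1 so
+that an ordering mistake is visible to the caller.
+
```c
#include <stddef.h>

struct demo_cookie {
	struct demo_cookie *parent;
	unsigned int nr_children;
	int live;
};

static void demo_acquire(struct demo_cookie *cookie, struct demo_cookie *parent)
{
	cookie->parent = parent;
	cookie->nr_children = 0;
	cookie->live = 1;
	if (parent)
		parent->nr_children++;
}

/* Refuses to release a cookie that still has children hanging off it. */
static int demo_relinquish(struct demo_cookie *cookie)
{
	if (cookie->nr_children != 0)
		return -1;		/* children must go first */
	if (cookie->parent)
		cookie->parent->nr_children--;
	cookie->live = 0;
	return 0;
}

/* Tear down a volume -> vnode pair, leaf first. */
static int demo_teardown(struct demo_cookie *volume, struct demo_cookie *vnode)
{
	if (demo_relinquish(vnode) < 0)
		return -1;
	return demo_relinquish(volume);
}
```
+
+Relinquishing the volume while the vnode is still live fails in this model;
+releasing the vnode first and then the volume succeeds.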
+
+
+=====================
+PAGE TOKEN MANAGEMENT
+=====================
+
+As previously mentioned, the netfs must keep a token associated with each page
+currently actively backed by the cache. This is used by FS-Cache to go from a
+page to the internal representation of the underlying block and back again. It
+is particularly important for managing the withdrawal of a cache whilst it is
+in active service (e.g. because it was unmounted).
+
+The token is this:
+
+ struct fscache_page {
+ ...
+ };
+
+Note that all fields are for internal FS-Cache use only.
+
+The token only needs to be allocated when FS-Cache asks for it. This it will do
+by calling the get_page_cookie() method in the netfs definition ops table. Once
+allocated, the same token should be presented every time the method is called
+again for a particular page.
+
+The token should be retained by the netfs, and should be deleted only after the
+page has been uncached.
+
+One way to achieve this is to attach the token to page->private (and set the
+PG_private bit on the page) once allocated. Shortcut routines are provided by
+FS-Cache to do this. Firstly, to retrieve if present and allocate if not:
+
+ struct fscache_page *fscache_page_get_private(struct page *page,
+ unsigned gfp);
+
+Secondly, to retrieve if present and BUG if not:
+
+ static inline
+ struct fscache_page *fscache_page_grab_private(struct page *page);
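+
+The "retrieve if present, allocate if not" behaviour can be approximated in
+userspace as below. This is not the FS-Cache implementation: struct demo_page
+stands in for struct page, has_private stands in for the PG_private bit, and
+all demo_* names are hypothetical.
+
```c
#include <stdint.h>
#include <stdlib.h>

struct demo_token {
	int internal;		/* contents are private to the cache layer */
};

struct demo_page {
	uintptr_t private;	/* mirrors page->private */
	int has_private;	/* mirrors the PG_private bit */
};

/* Return the page's token, allocating one on first use; NULL on failure. */
static struct demo_token *demo_page_get_private(struct demo_page *page)
{
	if (!page->has_private) {
		struct demo_token *token = calloc(1, sizeof(*token));

		if (!token)
			return NULL;
		page->private = (uintptr_t)token;
		page->has_private = 1;
	}
	return (struct demo_token *)page->private;
}

/* Drop the token once the page has been uncached. */
static void demo_page_release_private(struct demo_page *page)
{
	if (page->has_private) {
		free((struct demo_token *)page->private);
		page->private = 0;
		page->has_private = 0;
	}
}
```
+
+Repeated calls for the same page hand back the same token, which matches the
+requirement that one token be presented consistently for a given page.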
+
+To clean up the tokens, the netfs inode hosting the page should be provided
+with address space operations that circumvent the buffer-head operations for a
+page. For instance:
+
+	struct address_space_operations afs_fs_aops = {
+		...
+		.sync_page	= block_sync_page,
+		.set_page_dirty	= __set_page_dirty_nobuffers,
+		.releasepage	= afs_file_releasepage,
+		.invalidatepage	= afs_file_invalidatepage,
+	};
+
+	static int afs_file_invalidatepage(struct page *page,
+					   unsigned long offset)
+	{
+		struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
+
+		BUG_ON(!PageLocked(page));
+		if (!PagePrivate(page))
+			return 1;
+		fscache_uncache_page(vnode->cache, page);
+		/* only release the token if the whole page is invalidated */
+		if (offset != 0)
+			return 1;
+		if (PageWriteback(page))
+			return 0;
+		return page->mapping->a_ops->releasepage(page, 0);
+	}
+
+	static int afs_file_releasepage(struct page *page, int gfp_flags)
+	{
+		struct fscache_page *token;
+		struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
+
+		if (PagePrivate(page)) {
+			fscache_uncache_page(vnode->cache, page);
+			token = (struct fscache_page *) page->private;
+			page->private = 0;
+			ClearPagePrivate(page);
+			if (token)
+				kfree(token);
+		}
+		return 0;
+	}
+
+
+================================
+INDEX AND DATA FILE INVALIDATION
+================================
+
+There is no direct way to invalidate an index subtree or a data file. To do
+this, the caller should relinquish and retire the cookie they have, and then
+acquire a new one.
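+
+The relinquish-then-reacquire sequence can be modelled in userspace as
+follows. The model is deliberately crude: a single generation counter stands
+in for "the cached copy was discarded", and every demo_* name is hypothetical
+rather than part of the FS-Cache API.
+
```c
struct demo_cache {
	unsigned int generation;	/* bumped whenever an object is retired */
};

struct demo_cookie {
	struct demo_cache *cache;
	unsigned int generation;	/* generation seen at acquisition */
	int live;
};

static void demo_acquire(struct demo_cache *cache, struct demo_cookie *cookie)
{
	cookie->cache = cache;
	cookie->generation = cache->generation;
	cookie->live = 1;
}

/* With retire set, the cached copy is discarded rather than kept for reuse. */
static void demo_relinquish(struct demo_cookie *cookie, int retire)
{
	if (retire)
		cookie->cache->generation++;
	cookie->live = 0;
}

/* Invalidation expressed as the two documented steps. */
static void demo_invalidate(struct demo_cookie *cookie)
{
	struct demo_cache *cache = cookie->cache;

	demo_relinquish(cookie, 1);
	demo_acquire(cache, cookie);
}
```
+
+After demo_invalidate() the cookie is live again but refers to a fresh
+generation, so nothing cached under the old one can be matched.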