Here's a roadmap for the future development of netfslib and local caching
(e.g. cachefiles).
Local Caching
=============
There are a number of things I want to look at with local caching:
[>] Although cachefiles has switched from using bmap to using SEEK_HOLE and
SEEK_DATA, this isn't sufficient: we cannot rely on the backing filesystem
to report holes faithfully, as it may optimise its storage (e.g. by bridging
small holes with zeroed blocks or by punching out blocks of zeros), thereby
introducing both false positives and false negatives. Cachefiles needs to
track the presence/absence of data for itself.
I had a partially-implemented solution that stores a block bitmap in an xattr,
but that only worked up to files of 1G in size (with bits representing 256K
blocks in a 512-byte bitmap).
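To make the 1G limit concrete, here is a minimal sketch of such a content
bitmap. The names (content_map, content_map_mark, etc.) are made up for
illustration and are not the actual cachefiles code; only the 256K block size
and the 512-byte bitmap come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Figures taken from the text: one bit per 256K block, 512-byte bitmap. */
#define CACHE_BLOCK_SHIFT	18
#define CACHE_BLOCK_SIZE	(1ULL << CACHE_BLOCK_SHIFT)
#define CACHE_BITMAP_BYTES	512
#define CACHE_BITMAP_BITS	(CACHE_BITMAP_BYTES * 8)
#define CACHE_MAX_COVERED	(CACHE_BITMAP_BITS * CACHE_BLOCK_SIZE)

/* 4096 bits * 256K per bit = 1G, hence the file size limit. */
_Static_assert(CACHE_MAX_COVERED == (1ULL << 30), "bitmap covers exactly 1G");

/* Hypothetical in-memory copy of the bitmap that would live in the xattr. */
struct content_map {
	uint8_t bits[CACHE_BITMAP_BYTES];
};

static void content_map_init(struct content_map *map)
{
	memset(map, 0, sizeof(*map));
}

/* Record that the block containing file position pos is present.
 * Returns false if pos lies beyond what the bitmap can describe.
 */
static bool content_map_mark(struct content_map *map, uint64_t pos)
{
	uint64_t block = pos >> CACHE_BLOCK_SHIFT;

	if (block >= CACHE_BITMAP_BITS)
		return false;
	map->bits[block / 8] |= (uint8_t)(1u << (block % 8));
	return true;
}

/* Query whether the block containing file position pos is present. */
static bool content_map_present(const struct content_map *map, uint64_t pos)
{
	uint64_t block = pos >> CACHE_BLOCK_SHIFT;

	if (block >= CACHE_BITMAP_BITS)
		return false;
	return (map->bits[block / 8] >> (block % 8)) & 1;
}
```

Any position at or beyond 1G falls outside the bitmap, which is exactly the
limitation noted above; covering larger files would need either bigger blocks
per bit or a larger map.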
[>] An alternative cache format might prove more fruitful. Various AFS
implementations use a 'tagged cache' format with an index file and a bunch of
small files each of which contains a single block (typically 256K in OpenAFS).
This would offer some advantages over the current approach:
- it can handle entry reuse within the index
- doesn't require an external culling process
- doesn't need to truncate/reallocate when invalidating
There are some downsides, including:
- each block is in a separate file
- metadata coherency is more tricky - a powercut may require a cache wipe
- the index key is highly variable in size if used for multiple filesystems
But OpenAFS has been using this for something like 30 years, so it's probably
worth a try.
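As a rough illustration of "entry reuse within the index", the following
sketch keeps a fixed-size index in which each slot would correspond to one
small backing file holding a single block. When the index is full, the least
recently used slot is recycled in place, so no external culling process is
needed. This is not the OpenAFS on-disk format; the structure names, the
object_id key and the tiny slot count are all assumptions for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define TAG_SLOTS 4	/* tiny, for illustration only */

/* Hypothetical index entry: one per single-block backing file. */
struct tag_entry {
	uint64_t object_id;	/* assumed key identifying the cached file */
	uint64_t block;		/* block number within that file */
	uint64_t last_used;	/* logical clock value for LRU */
	int in_use;
};

struct tag_index {
	struct tag_entry slots[TAG_SLOTS];
	uint64_t clock;
};

/* Find the slot tagged (object_id, block).  If there isn't one, take a free
 * slot or, failing that, recycle the least recently used slot.  *evicted is
 * set to 1 if an existing entry had to be displaced, in which case the
 * caller would overwrite that slot's backing file with the new block.
 */
static size_t tag_index_get(struct tag_index *idx, uint64_t object_id,
			    uint64_t block, int *evicted)
{
	size_t victim = 0;

	*evicted = 0;
	idx->clock++;
	for (size_t i = 0; i < TAG_SLOTS; i++) {
		struct tag_entry *e = &idx->slots[i];

		if (e->in_use && e->object_id == object_id &&
		    e->block == block) {
			e->last_used = idx->clock;
			return i;
		}
		if (!e->in_use) {
			/* Prefer the first free slot over any used one. */
			if (idx->slots[victim].in_use)
				victim = i;
		} else if (idx->slots[victim].in_use &&
			   e->last_used < idx->slots[victim].last_used) {
			victim = i;
		}
	}

	*evicted = idx->slots[victim].in_use;
	idx->slots[victim].object_id = object_id;
	idx->slots[victim].block = block;
	idx->slots[victim].last_used = idx->clock;
	idx->slots[victim].in_use = 1;
	return victim;
}
```

Invalidation in this scheme would just clear in_use on the matching slots,
which is why no truncation or reallocation of the backing files is needed.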
[>] Need to work out some way to store xattrs, directory entries and inode
metadata efficiently.
[>] Using NVRAM as the cache rather than spinning rust.
[>] Support for disconnected operation to pin desirable data and keep
track of changes.
[>] A user API by which the cache for specific files or volumes can be
flushed.
Disconnected Operation
======================
I'm working towards providing support for disconnected operation, so that,
provided you've got your working set pinned in the cache, you can continue to
work on your network-provided files when the network goes away and resync the
changes later.
This is going to require a number of things:
(1) A user API by which files can be preloaded into the cache and pinned.
(2) The ability to track changes in the cache.
(3) A way to synchronise changes on reconnection.
(4) A way to communicate to the user when there's a conflict with a third
party change on reconnect. This might involve communicating via systemd
to the desktop environment to ask the user to indicate how they'd like
conflicts resolved.
(5) A way to prompt the user to re-enter their authentication/crypto keys.
(6) A way to ask the user how to handle a process that wants to access data
we don't have (error/wait) - and how to handle the DE getting stuck in
this fashion.
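To illustrate how items (2) to (4) might fit together, here is a minimal
sketch of the decision made per pinned file on reconnection. It assumes the
cache records the server's data version at the point of disconnection and a
flag saying whether the file was changed locally; both fields, and all the
names here, are hypothetical and not an existing netfslib or cachefiles
interface.

```c
#include <stdint.h>

enum resync_action {
	RESYNC_NOTHING,		/* neither side changed */
	RESYNC_REFETCH,		/* only the server copy changed */
	RESYNC_UPLOAD,		/* only the cached copy changed */
	RESYNC_CONFLICT,	/* both changed: need to ask the user */
};

/* Hypothetical per-file state tracked in the cache while disconnected. */
struct pinned_state {
	uint64_t base_version;	/* server data version when we went offline */
	int dirty;		/* non-zero if modified whilst disconnected */
};

/* Decide what to do with one file given the version the server reports
 * after reconnection.
 */
static enum resync_action resync_decide(const struct pinned_state *st,
					uint64_t server_version)
{
	int server_changed = (server_version != st->base_version);

	if (st->dirty && server_changed)
		return RESYNC_CONFLICT;
	if (st->dirty)
		return RESYNC_UPLOAD;
	if (server_changed)
		return RESYNC_REFETCH;
	return RESYNC_NOTHING;
}
```

Only the RESYNC_CONFLICT case would need to involve the user; the other three
could be handled automatically.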
David