Re: How to manage shared persistent local caching (FS-Cache) with NFS?

From: Chuck Lever
Date: Wed Dec 05 2007 - 14:56:27 EST


On Dec 5, 2007, at 12:11 PM, David Howells wrote:
Okay... I'm getting to the point where I want to release my local caching
patches again and have NFS work with them. This means making NFS mounts share
or not share appropriately - something that's engendered a fair bit of
argument.

So I'd like to solicit advice on how best to deal with this problem.

Let me explain the problem in more detail.


================
CURRENT PRACTICE
================

As the kernel currently stands, coherency is ignored for mounts that have
slightly different combinations of parameters, even if these parameters just
affect the properties of the network "connection" used or just mark a
superblock as being read-only.

Consider the case of a file remotely available by NFS. Imagine the client sees
three different views of this file (they could be by three overlapping mounts,
or by three hardlinks or some combination thereof).

This is how NFS currently operates without any superblock sharing:

                               +---------+
         Object on server ---> |         |
                               |  inode  |
                               |         |
                               +---------+
                                   /|\
                                  / | \
                                 /  |  \
                                /   |   \
                               /    |    \
                              /     |     \
                             /      |      \
                            /       |       \
                           /        |        \
                          /         |         \
                         /          |          \
                        |           |           |
                        |           |           |
:::::::::::::NFS::::::::|:::::::::::|:::::::::::|:::::::::::::::::::::::::::::
                        |           |           |
                        |           |           |
                        |           |           |
  +---------+      +---------+      |           |
  |         |      |         |      |           |
  | mount 1 |----->| super 1 |      |           |
  |         |      |         |      |           |
  +---------+      +---------+      |           |
                                    |           |
                                    |           |
  +---------+                  +---------+      |
  |         |                  |         |      |
  | mount 2 |----------------->| super 2 |      |
  |         |                  |         |      |
  +---------+                  +---------+      |
                                                |
                                                |
  +---------+                              +---------+
  |         |                              |         |
  | mount 3 |----------------------------->| super 3 |
  |         |                              |         |
  +---------+                              +---------+

Each view of the file on the client winds up with a separate inode in a
separate superblock and with a separate pagecache. As far as the client kernel
is concerned, they *are* three different files. Any incoherency effects are
ignored by the kernel and if they cause a userspace application a problem,
that's just too bad.

Generally, however, this is not a problem because:

(a) an application is unlikely to be attempting to manipulate multiple views
of a file simultaneously and

(b) cross-view hard links haven't been and aren't used that much.


=============================
POSSIBLE FS-CACHE SCENARIO #1
=============================

However, now we're introducing persistent local caching into the mix. That means we can no longer ignore such remote possibilities - they are possible, therefore we have to deal with them, whether we like it or not.


I don't see how persistent local caching means we can no longer ignore (a) and (b) above. Can you amplify this a bit? Nothing you say in the rest of your proposal convinces me that having multiple caches for the same export is really more than a theoretical issue.

Frankly, the reason why admins mount exports multiple times is precisely because they want different applications to access the files in different ways. Admins *want* one mount point to be available ro, and another rw. They *want* one mount point to use 'noac' and another not to. They *want* multiple sockets, more RPC slots, and unique caches for different applications. No one would go to the trouble of mounting an export again, using different options, unless that's precisely the behavior that they wanted.

This is actually a feature of NFS. It's used as a standard part of production environments, for example, when running Oracle databases on NFS. One mount point is rw and is used by the database engine. Another mount point is ro and is used for back-up utilities, like RMAN.

Another example is local software distribution. One mount point is ro, and is accessed by normal users. Another mount point accesses the same export rw, and is used by administrators who provide updates for the software.

As useful as the feature is, one can also argue that mounting the same export multiple times is infrequent in most normal use cases. Practically speaking, why do we really need to worry about it?

The real problem here is that the NFS protocol itself does not support strong cache coherence. I don't see why the Linux kernel must fix that problem.

The only real problem with the first scenario is that you may have more than one copy of a file in the persistent cache. How often will that be the case? Since the local persistent cache is probably disk-based and thus large relative to memory, what's the problem with using a little extra space?
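
As a rough illustration of that space cost, here is a toy Python model (not kernel or FS-Cache code) that compares a persistent cache keyed per mount point with one keyed per file on the export. The class name, the key layout, the mount paths, the file handles, and the file sizes are all invented for the example; it only counts the bytes stored when the same file is read through more than one mount.

```python
# Toy model only: names, keys, and sizes are hypothetical, not FS-Cache internals.

class ToyPersistentCache:
    """Counts bytes stored on disk for a given cache-keying policy."""

    def __init__(self, per_mount):
        self.per_mount = per_mount   # True: one cached copy per mount point
        self.store = {}              # cache key -> size in bytes

    def read(self, mount, filehandle, size):
        # Per-mount keying includes the mount in the key, so each view of
        # the same server file gets its own cached copy.
        key = (mount, filehandle) if self.per_mount else (filehandle,)
        self.store.setdefault(key, size)

    def bytes_used(self):
        return sum(self.store.values())


def simulate(per_mount):
    cache = ToyPersistentCache(per_mount)
    # One 4 MiB file read through an rw mount and an ro mount of the same
    # export, plus a 1 MiB file read through the rw mount only.
    cache.read("/mnt/rw", "fh-A", 4 * 2**20)
    cache.read("/mnt/ro", "fh-A", 4 * 2**20)
    cache.read("/mnt/rw", "fh-B", 1 * 2**20)
    return cache.bytes_used()


separate = simulate(per_mount=True)
shared = simulate(per_mount=False)
print(separate, shared, separate - shared)
```

In this model the extra space is exactly the size of the files that are read through more than one mount, which is the quantity the question above is asking about.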

The problems you ascribe to your second and third caching scenarios (deadlocking and reconnection) are, however, real and substantial. You don't have these issues when caching each mount point separately, right?

It seems to me that implementing the first scenario (a) is straightforward, (b) has fewer runtime risks (e.g., deadlocks), (c) doesn't take away features that some people still use, and (d) solves more than 80% of the issues here (the 80/20 rule of thumb).

Lastly, there's already a mount option that allows admins to control whether the page and attribute caches are shared -- "sharecache". Is this mount option not adequate for persistent caching?

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com