Re: NFS still has caching problems

From: kuznet@ms2.inr.ac.ru (inr-linux-kernel@ms2.inr.ac.ru)
Date: 19 Jul 1996 19:01:33 +0400


Olaf Kirch (ok@daveg.COM) wrote:

: First, let's assume you invalidate the cache unconditionally as soon as
: the server's mtime changes. So who's going to be bitten by this?
: Firstly, applications that access files in a read-write-read-write
: pattern. Another problem occurs with the attributes returned by a read
: NFS call; if we invalidate the cache in this case, we will not only
: throw away the old cached pages, but also those just read, which *are*
: uptodate. Now multiply that by 4 because of the way we currently do
: readahead, and the result is not pretty.

If it is you who does the read-write-read-write, my patch solves this
problem. If you are continuously reading while someone else is
continuously writing, the behaviour you describe is the only correct
one, even if it seems inefficient! Otherwise you will read complete
garbage.

: Now, assume we try to be clever and cheat when receiving the server's
: attributes from an operation that we know will change the server's
: mtime. Linus already mentioned the race window that exists here: you
: cannot assume that your cache is still valid just because _your_
: operation changed only the file's meta data; an intervening operation
: from another client could have modified the file contents. This is not
: splitting hair; remember the whole problem is about someone else writing
: to the file while we're accessing it. Besides, we're not talking about
: revalidating data for the duration of just acregmax; once we've updated
: NFS_OLDMTIME, future calls to revalidate_inode will not throw away _any_
: cached page until the file is changed again. NFSv3 tries to eliminate
: this problem by providing the pre- and post-op attributes in the NFS
: reply.

NFSv3 DOES NOT solve this race condition, and this is pointed out
in the Sun specs. wcc data is absolutely equivalent to doing a
getattr and then the required operation, only twice as fast.
That is why the attributes are called "weak".
So we are doomed to co-exist with this problem.

In any case, the solution that I proposed is the most graceful in this
respect: we will be synchronized again within at most acregmax
(I do not touch NFS_OLDMTIME!), which at least is no worse than the
current behaviour.

: The best solution I can see is to change the attribute caching (which
.......
: These are just suggestions; comments welcome. I will also look into the
: BSD code to see how they do it.

It would be better to look at the Sun code :-) It works ideally in this respect.

Well, you can choose any solution; the only requirement is that
file changes are discovered within ~acregmin.

At first glance, my solution is cleaner and more reliable. I'll think about it.

: PS: Side note to Alex: Linux does not always track the server's mtime
: in inode->i_mtime; utimes() will set it to the client's time. This is
: actually a flaw in NFSv2 (in NFSv3 you can set the inode's time fields
: to server time).

Both correct and incorrect. Yes, it is an NFSv2 drawback, but
it has nothing to do with the caching problem.
Yes, you send client times to the server, but once the server has set
them, those times are shared by all clients.
So "make" will not work, but the cache will work superbly.
Note that I never compare mtimes for greater/less, only
for exact coincidence.

Alexey Kuznetsov.