Re: Memory upgrade: not faster / nfs

Olaf Kirch (okir@monad.swb.de)
Sat, 26 Oct 1996 15:01:41 +0200 (MET DST)


To: linux-kernel@vger.rutgers.edu
Subject: Re: Memory upgrade: not faster / nfs
Newsgroups: swb.lists.linux.kernel
X-Newsreader: TIN [UNIX 1.3 950515BETA PL0]

In article <"sim0s4.fzi.178:25.10.96.16.26.14"@fzi.de> you wrote:
: ! NFS is a filesystem. Filesystems are not cached.
: ! The only thing that's cached in UN*X are block
: ! devices. This is the wrong place to cache.

This is plain wrong, and not only for Linux. Almost all UNIX implementations
cache NFS file data locally.

On Linux, caching of remote file systems happens in the page cache, which
is indexed by inode and offset. When a user process tries to read from
a file on an NFS volume, the VFS calls nfs_file_read, which in turn
calls generic_file_read (mm/filemap.c). If this function is able to locate
the page in the cache, it returns it to the caller; otherwise, it invokes
inode->i_op->readpage, and optionally performs readahead.
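
To make the lookup concrete, here is a toy model of that read path in
Python. It is only a sketch, not the kernel code: PageCache and the
readpage callback are hypothetical names that stand in for the page cache
and for inode->i_op->readpage, and readahead is left out entirely.

```python
# Toy model of a page cache indexed by (inode, page-aligned offset).
# PageCache and the readpage callback are hypothetical names used for
# illustration; they are not kernel symbols.
PAGE_SIZE = 4096  # page size on ix86


class PageCache:
    def __init__(self, readpage):
        self.pages = {}           # (inode, page offset) -> bytes
        self.readpage = readpage  # stands in for inode->i_op->readpage
        self.misses = 0           # how often we had to go to the server

    def read(self, inode, pos, count):
        """Return up to `count` bytes of `inode` starting at `pos`."""
        out = bytearray()
        end = pos + count
        while pos < end:
            page_off = pos - pos % PAGE_SIZE
            key = (inode, page_off)
            page = self.pages.get(key)
            if page is None:
                # Cache miss: fetch the whole page and remember it.
                self.misses += 1
                page = self.readpage(inode, page_off)
                self.pages[key] = page
            start = pos - page_off
            chunk = page[start:start + (end - pos)]
            if not chunk:
                break             # read past end of file
            out += chunk
            pos += len(chunk)
        return bytes(out)


# A fake "server" holding one 10240-byte file under inode number 1.
FILE_DATA = bytes(range(256)) * 40


def fake_readpage(inode, offset):
    return FILE_DATA[offset:offset + PAGE_SIZE]


cache = PageCache(fake_readpage)
cache.read(1, 4000, 200)   # spans two pages, so two misses
cache.read(1, 4000, 200)   # served from the cache, no new misses
print(cache.misses)
```

The second read touches the same two pages and never reaches the fake
server, which is the whole point of caching on (inode, offset).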

Caching writes to a remote file system is an entirely different story.
The current Linux NFS client doesn't implement this, but I'm working
on a new NFS implementation which will be able to do this.

The major factor in remote filesystem caching is of course async I/O.
On one hand, it is needed for readahead: when a user process issues a
read request for a portion of a file, generic_file_read will issue some
additional readpage calls which should read more pages of the file
asynchronously, so that the next time around, the requested data
will already be present in the cache. On the other hand, async I/O is
absolutely critical for writing back data that the user process has
written and which has been committed to the cache.

Currently, async page reads are implemented only for rsize >= PAGE_SIZE
(which is 4096 on the ix86 platform), and use four nfsiod processes.
If rsize is smaller than 4096, each readpage operation is broken up
into synchronous read operations of rsize each, which means that for
rsize == 1024 you do four READ RPC calls of 1K to the NFS server.
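
The arithmetic can be written down in a few lines of Python. This is just
a sketch of the rule described above; read_rpcs_per_page and
reads_are_async are hypothetical helper names, and rounding up for an
rsize that does not divide the page size evenly is my assumption.

```python
PAGE_SIZE = 4096  # page size on ix86


def read_rpcs_per_page(rsize, page_size=PAGE_SIZE):
    """Number of READ RPC calls needed to fill one page."""
    if rsize <= 0:
        raise ValueError("rsize must be positive")
    chunk = min(rsize, page_size)   # one READ never needs more than a page
    return -(-page_size // chunk)   # ceiling division (an assumption)


def reads_are_async(rsize, page_size=PAGE_SIZE):
    """Async page reads are only used when rsize >= PAGE_SIZE."""
    return rsize >= page_size


for rsize in (1024, 2048, 4096, 8192):
    print(rsize, read_rpcs_per_page(rsize), reads_are_async(rsize))
```

For rsize == 1024 this gives four synchronous READ calls per page; for
rsize == 4096 or larger it gives a single READ per page that can be
handed to nfsiod.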

: My fastest CPU is a Pentium 100 on a diskless client.
: I thought that I get only 5% cpu usage because I am
: swapping over NFS. I bought another 32MB but the CPU
: usage did not increase :-( The Ethernet LED is still
: busy all the time.

Swapping over NFS is slow because of the way it is implemented. If
you're using Claus Heine's NFS swap patch, the cause is that his patch
turns off all async I/O to the NFS server.

: Compiling is a very I/O intensive task (all the
: include files, the compiler, the .I .S .o files).

Indeed. Read caching helps a bit here, but not much. It seems that
compiling is particularly slow because gcc writes out data in very small
chunks, and since writes are not cached, each write operation causes
an RPC call to the server. Try to watch NFS with tcpdump when doing
a compile...
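
To get a feeling for the numbers, here is a back-of-the-envelope sketch
in Python. The first function models what happens now (one RPC per write
operation). The second models a hypothetical write cache that collects
sequential writes and flushes them in wsize-sized chunks; the function
names, the 4096 byte wsize and the 64 byte write size are assumptions for
illustration, not measurements and not a description of any actual
implementation.

```python
def write_rpcs_uncached(write_sizes):
    """One WRITE RPC per write operation, since writes are not cached."""
    return len(write_sizes)


def write_rpcs_coalesced(write_sizes, wsize=4096):
    """Hypothetical: sequential writes buffered and sent wsize at a time."""
    total = sum(write_sizes)
    return -(-total // wsize)   # ceiling division


# Assumed workload: 1000 sequential writes of 64 bytes each.
writes = [64] * 1000
print(write_rpcs_uncached(writes), write_rpcs_coalesced(writes))
```

With these assumed numbers, 64000 bytes go out as 1000 RPC calls without
caching, against 16 if they could be collected into 4K chunks.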

Olaf

-- 
Olaf Kirch        | Sometimes I lie in bed at night, and I ask myself: ``Why?''
okir@monad.swb.de | Then a voice comes to me and says: ``Why, what?''
                                                            -- Charlie Brown