Re: [RFC] symlink caching and network filesystems

David S. Miller (davem@redhat.com)
Mon, 7 Jun 1999 08:54:43 -0700


Date: Mon, 7 Jun 1999 02:33:35 -0400 (EDT)
From: Alexander Viro <viro@math.psu.edu>

On Mon, 7 Jun 1999, Erez Zadok wrote:

> I'm not sure this is relevant, but please ensure that there's a
> way for mount() to turn off such symlink caching. It's useful
> for my user-level hlfsd to turn off symlink caching, esp. over
> nfs. (Hlfsd creates a symlink that points to different places
> depending on the euid accessing the symlink; it's used to
> redirect /var/mail to users' home dirs.)

Wait a minute. Are you talking about the userland server + normal
NFS client combination? <scratching head> Hrrrmm... But READLINK in
NFSv2 (and IIRC in v3 too) doesn't pass credentials, right? Could
you elaborate on that?

This is a non-issue. Alexander, if I understand your proposal
correctly, the hlfsd issue is solved by using an attribute timeout of
zero. NFS must revalidate the inode on every access, so the symlink
cache validity is a by-product of inode revalidation. See?
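
To make that concrete, here is a minimal userspace sketch of the
timeout logic (the struct and function names below are illustrative
stand-ins, not the actual NFS inode fields): with the attribute
timeout forced to zero the cached attributes are never trusted, so
every access, readlink() included, triggers a GETATTR before any
cached link body is handed back.

#include <stdbool.h>
#include <stdio.h>

/* Illustrative stand-in for the per-inode attribute cache state. */
struct attr_cache {
        unsigned long read_cache_jiffies;  /* when attributes were last fetched */
        unsigned long attrtimeo;           /* attribute timeout; 0 for the hlfsd case */
};

/* Cached attributes are trusted only while the timeout has not expired. */
static bool attrs_valid(const struct attr_cache *c, unsigned long now)
{
        return now - c->read_cache_jiffies < c->attrtimeo;
}

/* Every access, readlink() included, revalidates before using cached data. */
static void access_symlink(struct attr_cache *c, unsigned long now)
{
        if (!attrs_valid(c, now)) {
                /* stand-in for the actual GETATTR RPC */
                printf("GETATTR over the wire, stale link body dropped\n");
                c->read_cache_jiffies = now;
        }
        printf("serve symlink body\n");
}

int main(void)
{
        struct attr_cache c = { .read_cache_jiffies = 0, .attrtimeo = 0 };

        access_symlink(&c, 100);   /* attrtimeo == 0: always revalidates */
        access_symlink(&c, 200);   /* ...on every single access */
        return 0;
}

With a nonzero attrtimeo the second access would have been served from
the cache; with zero it never is, which is exactly the behaviour hlfsd
needs.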

I'm doing some performance work on client-side NFS right now; things
look extremely promising so far, and I'll post my results (in the form
of a patch against 2.3.x) once I have my first pass working. In
essence:

1) All read operations through the client-side NFS code perform _one_
   copy, and for the UDP case at least the networking checksum is
   computed in parallel with the copy into the page cache (sketched
   after this list).

   This is implemented using two things. First, the UDP checksum
   validation is deferred to happen outside of udp_rcv() processing,
   and in NFS's case it thus happens in the data_ready callback.
   Secondly, we defer linearization of fragmented packets on receive,
   and the NFS data_ready code knows how to handle the fragment
   chains. This is where we were getting killed on bulk read
   performance.

2) NFS dirent information lives in the page cache: no funny fixed-size
   cache with funny invalidation semantics, truncate_inode_pages does
   it all, and thus NFS inode revalidation handles the grot of keeping
   dirent information from going stale, just as it already does for
   all the other attributes (sketched further below).

   As a nice bonus, this means the single-copy behaviour obtained in
   #1 applies to dirent information too.
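
For #1, here is a tiny userspace model of the single-pass receive path
(the frag struct and names are stand-ins for the queued packet
fragments, and the real code would use csum_partial_copy()-style
ones' complement arithmetic rather than this naive byte sum): the
consumer walks the fragment chain exactly once, copying each byte into
its destination page while folding it into the running checksum, so
the checksum verdict is known by the time the single copy completes.

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct frag {                   /* stand-in for one queued receive fragment */
        const uint8_t *data;
        size_t len;
        struct frag *next;
};

/* Copy the fragment chain into dst, accumulating a checksum in the same pass. */
static uint32_t copy_and_checksum(uint8_t *dst, const struct frag *f)
{
        uint32_t sum = 0;

        for (; f != NULL; f = f->next) {
                for (size_t i = 0; i < f->len; i++) {
                        sum += f->data[i];      /* checksum accumulated... */
                        dst[i] = f->data[i];    /* ...as the byte is copied */
                }
                dst += f->len;
        }
        return sum;     /* verified against the UDP checksum here, not in udp_rcv() */
}

int main(void)
{
        uint8_t page[16] = { 0 };               /* stand-in for a page cache page */
        struct frag f2 = { (const uint8_t *)"world", 5, NULL };
        struct frag f1 = { (const uint8_t *)"hello ", 6, &f2 };

        uint32_t sum = copy_and_checksum(page, &f1);
        printf("copied \"%s\", checksum accumulator 0x%x\n",
               (const char *)page, (unsigned)sum);
        return 0;
}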

What you see on the wire for #2 is just an inode getattr for a dirent,
and that only in the worst case (the attribute timeout was hit). Right
now, if your working set grows beyond 64 directories, the network gets
flooded with RPC requests because of our fixed NFS dirent cache size
(actually the limit is lower than 64, because each cache entry can
hold less than the full contents of a directory if the dirent
information exceeds rsize).
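
And for #2, a hedged sketch of the same shape in userspace C (made-up
types; in the kernel the pages would hang off the directory inode and
truncate_inode_pages would do the dropping): readdir data is cached as
ordinary pages, and when revalidation notices that the directory
changed, every cached page is dropped in one stroke, the same way file
data is already invalidated.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DIR_PAGES 8                     /* cached dirent pages per directory */

struct dir_cache {
        long  cached_mtime;             /* directory mtime the pages belong to */
        char *pages[DIR_PAGES];         /* one chunk of dirent data per page index */
};

/* Revalidation: if the server-side mtime moved, drop every cached page at once. */
static void revalidate_dir(struct dir_cache *dir, long server_mtime)
{
        if (dir->cached_mtime == server_mtime)
                return;
        for (int i = 0; i < DIR_PAGES; i++) {   /* truncate_inode_pages analogue */
                free(dir->pages[i]);
                dir->pages[i] = NULL;
        }
        dir->cached_mtime = server_mtime;
}

/* Readdir: serve page 'idx' from the cache, going to the server only on a miss. */
static const char *readdir_page(struct dir_cache *dir, int idx, long server_mtime)
{
        revalidate_dir(dir, server_mtime);
        if (dir->pages[idx] == NULL) {
                char buf[32];

                snprintf(buf, sizeof(buf), "dirents for page %d", idx);
                dir->pages[idx] = strdup(buf);  /* cache miss: one READDIR RPC fills this */
        }
        return dir->pages[idx];
}

int main(void)
{
        struct dir_cache dir = { 0 };

        puts(readdir_page(&dir, 0, 1));         /* miss: READDIR goes out */
        puts(readdir_page(&dir, 0, 1));         /* hit: at worst a GETATTR on the wire */
        puts(readdir_page(&dir, 0, 2));         /* mtime changed: pages dropped, refilled */
        return 0;
}

Because the invalidation is the generic drop-the-pages operation, the
dirent data simply rides along with the attribute revalidation the
client performs anyway.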

Later,
David S. Miller
davem@redhat.com
