Re: NFS client problems in 2.4.18 to 2.4.20

From: Jamie Lokier
Date: Sun Sep 07 2003 - 09:28:18 EST


Trond Myklebust wrote:
> > This might be the bug where it adjusts the retransmit timeout
> > to a ridiculously small sub-millisecond value, because of a
> > sequence of fast cached responses from the server
>
> BTW: this should be fixed now in 2.6.x. I've set a minimum value on
> the estimated error on the round-trip time to 1/10sec.

I don't think the round-trip estimate was ever a real problem,
although setting a low bound does make sense.

A real problem is the rule of having a fixed number of retransmits
before an operation fails with a "soft" moount. This is wrong for
NFS, now that rtt is estimated dynamically.

It is wrong because NFS response times are not due to network
congestion - they are due mainly to I/O on the server, and I/O times
don't have the same properties as networks at all.

The "soft operation fail" imeout should have a minimum absolute time,
like 30 seconds or so. It should also have a maximum (for systems
where the estimated rtt is 10 seconds). This should be independent of
the rtt estimate.

Think of a worst case:

- server responds to cached requests within 10 microseconds

- uncached requests take 10 seconds to respond (spinning up CD,
seeking on tape HFS, or just ordinary disk/swap contention).

This should never timeout, as long as the server is responding within
a fixed absolute time, although it's fine to issue lots of retransmits
until that time.

The fundamental error is assuming that all NFS requests take about the
same time to server, and delays are caused by the network. This isn't
true especially on a LAN. Delays for NFS are typically caused by I/O,
and vary by 6 orders of magnitude from request to request.

-- Jamie
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/