Re: 2.4.8 NFS Problems

From: Steffen Persvold (sp@scali.no)
Date: Thu Dec 20 2001 - 04:44:45 EST


Hi guys,

I was searching on Google for reports of the problem I'm seeing with our NFS server/clients and
found this thread. It looks somewhat similar (at least the result, the EIO, is the same).

Parts of the old message :

>From: Mike Black (mblack@csihq.com)
>Date: Sep 05 2001

>I've been getting random NFS EIO errors for a few months but
>now it's repeatable.
>Trying to copy a large file from one 2.4.8 SMP box to another
>is consistently failing (at different offsets each time).

Our setup is like this :

Server:
        RedHat 7.2 - kernel 2.4.9-13smp
        nfs-utils-0.3.1-13.7.2.1
        ext3 filesystem (73GB)

Clients:
        ia32 client - RedHat 6.2 - kernel 2.2.19-6.2.7enterprise
        mount-2.10r-0.6.x

        alpha client - RedHat 6.2 - kernel 2.2.19 (vanilla)
        mount-2.10r-5

        ia64 client - RedHat 7.1 - kernel 2.4.3-12smp
        mount-2.10r-5

I've seen the "Input/Output error" problem only on the Alpha and the IA64 clients, and it occurs
when building a static library (with 'ar'). The message is like this :

ar: xxxxxx/libmpi.a: Input/output error
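
One way to probe for this outside of 'ar' is sketched below. It is only a rough approximation: it
assumes that the relevant access pattern is "write a file, then seek back and rewrite blocks at
scattered offsets", which is a guess about what matters here and not something that has been
verified. The function name, file size and block size are hypothetical. Whether it triggers the
same failure on the affected clients is not known.

```python
import errno
import os
import random


def probe_random_writes(path, size=1 << 20, block=8192, passes=64):
    """Write `size` bytes to `path`, then rewrite blocks at scattered offsets.

    Returns None on success, or the errno of the first OSError raised
    (errno.EIO is the "Input/output error" reported by 'ar').
    """
    rng = random.Random(0)  # fixed seed so a failing run can be repeated
    try:
        with open(path, "wb") as f:
            f.write(b"\0" * size)
            for _ in range(passes):
                # Stay inside the file so its size never changes.
                f.seek(rng.randrange(0, size - block))
                f.write(os.urandom(block))
            f.flush()
            os.fsync(f.fileno())
        # Leaving the "with" block closes the file; on NFS a deferred write
        # error can surface at close, so the close is inside the try as well.
    except OSError as e:
        return e.errno
    return None


if __name__ == "__main__":
    # Hypothetical path on the NFS mount; adjust as needed.
    result = probe_random_writes("/home/mpitest/eio-probe.tmp")
    if result is None:
        print("no error")
    else:
        print("failed:", errno.errorcode.get(result, result))
```

Running it on each client against the same export would at least show whether the error depends on
'ar' itself or on the client.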

The mount points are mounted like this :

ia32 client:
huey:/export/home/mpitest /home/mpitest nfs rw,v3,rsize=8192,wsize=8192,addr=huey 0 0

alpha client:
huey:/export/home/mpitest /home/mpitest nfs rw,v3,rsize=8192,wsize=8192,addr=huey 0 0

ia64 client:
huey:/export/home/mpitest /home/mpitest nfs rw,v3,rsize=8192,wsize=8192,hard,udp,lock,addr=huey 0 0
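
The three entries above can be compared option by option. The following is a minimal sketch that
splits each line the way /proc/mounts lays it out (device, mount point, type, options) and lists
the options that are not shared by all clients. The function names are made up for illustration.

```python
def parse_mount_line(line):
    """Split one /proc/mounts-style line into (mount point, set of options)."""
    device, mount_point, fs_type, options = line.split()[:4]
    return mount_point, set(options.split(","))


# The three entries exactly as reported by the clients.
MOUNTS = {
    "ia32": "huey:/export/home/mpitest /home/mpitest nfs "
            "rw,v3,rsize=8192,wsize=8192,addr=huey 0 0",
    "alpha": "huey:/export/home/mpitest /home/mpitest nfs "
             "rw,v3,rsize=8192,wsize=8192,addr=huey 0 0",
    "ia64": "huey:/export/home/mpitest /home/mpitest nfs "
            "rw,v3,rsize=8192,wsize=8192,hard,udp,lock,addr=huey 0 0",
}


def option_differences(mounts):
    """Return, per client, the options that not every client reports."""
    parsed = {name: parse_mount_line(line)[1] for name, line in mounts.items()}
    common = set.intersection(*parsed.values())
    return {name: sorted(opts - common) for name, opts in parsed.items()}


if __name__ == "__main__":
    for name, extra in option_differences(MOUNTS).items():
        print(name, extra)
```

For these entries only the ia64 client reports anything extra, namely hard, lock and udp.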

I don't know why the "hard" and "lock" options don't appear on ia32 and alpha, but this might be
related to the /proc/mounts interface on the running kernel (these clients are running 2.2.19 while
the ia64 client is running 2.4). The automount entry looks like this :

/home auto_home rsize=8192,wsize=8192

So according to the nfs man page, the "hard" option should be the default :

       hard   If an NFS file operation has a major timeout then report "server not
              responding" on the console and continue retrying indefinitely. This
              is the default.

So what could be the problem here ? Is it an NFS server bug, an NFS client bug or an NFS/ext3 bug ? We
used to run RedHat 7.0 on this server with the 2.2.19-enterprise kernel, nfs-utils-0.3.1-7 and an
ext2 filesystem. This problem did not occur back then.

Thanks,

-- 
  Steffen Persvold   | Scalable Linux Systems |   Try out the world's best   
 mailto:sp@scali.no  |  http://www.scali.com  | performing MPI implementation:
Tel: (+47) 2262 8950 |   Olaf Helsets vei 6   |      - ScaMPI 1.12.2 -         
Fax: (+47) 2262 8951 |   N0621 Oslo, NORWAY   | >300MBytes/s and <4uS latency


