NFS problem on Sparc64 with 2.2.15 and later (NOT 2.3.x and 2.4.x)

From: Francis Galiegue (fg@mandrakesoft.com)
Date: Fri Jun 16 2000 - 09:44:45 EST


Machine is an Enterprise 250 with its original SCSI boards (driven by
sym53c8xx), but with only ONE CPU out of the 2 it can take; the CPU is an
UltraSPARC running at 300 MHz. Disks are SCSI.

The problem occurs with stock 2.2.15 and 2.2.16, with the latest 2.2.16 RPM
from RH, and with our own (MandrakeSoft) 2.2.15 and 2.2.16 RPMs. NFS just hangs
after a certain number of file handles (fhs) have been processed. As SysRq was
still functional, I could get the register dump: ret_pc was in add_to_fhcache().

As 2.2.14 and below didn't have this problem, I undid, one by one, all the
changes made to NFS between 2.2.14 and 2.2.15. Undoing ONE change cures the
problem, and it's this particular hunk in fs/nfsd/nfsfh.c:

@@ -676,6 +690,15 @@
                 return 1;
         }
 
+        /* if nfsd_server is zero, NFSD_MAXFH will be zero too, so
+         * find_fhe() will NEVER find the file handle NOR an empty space,
+         * and expire_slot will not be able to expire any file handle,
+         * because NFSD_MAXFH is zero ... */
+
+        if (nfsd_nservers <= 0) {
+                return 0;
+        }
+
         expire_slot(cache);
         goto repeat;
 }

To say the least, this is very surprising to me: I don't see WHY reverting
this hunk makes NFS work again on this particular configuration.

Note that the hang occurs whether the kernel is SMP or UP. OTOH, an Ultra5 with
a 400 MHz UltraSPARC and an IDE disk does not have this problem. So this sounds
like a deadlock, but determining whether it really is one, and if so why it
occurs, is way beyond my knowledge.

On all the x86 machines I could test, NFS had no problems. The only
"difference" I see between the Enterprise and the others is that it has very
fast disks and a rather slow CPU. I haven't yet tried a sparc32 or x86 setup
with a slow CPU and very fast disks, because I cannot build one.

Any clue?

-- 
fg

"You can tune a filesystem but you can't tuna fish" (HP/UX' tunefs manpage)
