2.2.14 kernel -- problems with NFS (and tcp?)

From: Chris McClellen (cmcclellen@weather.com)
Date: Wed Feb 09 2000 - 13:59:51 EST


Couple of different problems to put forth here. First is NFS.

Details of machine:
RH6.1, 2.2.14 kernel, SMP -- 2xp3 500's, 256mb, nfsd as module,
autofs as module. Megaraid and adaptec scsi. eepro100.

1st problem:

As NFS server, if an IRIX 6.3 client tries to mount a directory hosted by the linux machine,
it sends abunch of NFS3 requests at first, and the linux box Oopses a couple of times
and panics. The panic is caused by something trying to kill the init task. The workaround
was to mark directories served up by the linux machine as nfsv2. Now, the IRIX box doesnt
send any v3 requests -- no panics or oopses now.

I have not attached an oops report because I didnt copy it down -- and it didnt manage to
make it to disk either. So I would have to recreate the panic to do it. I can recreate it simply
by unmounting, changing the IRIX so it does nfsv3 again. If it comes to that, I'll do it and
try to faithfully copy everything down exactly (knock on wood).

The linux box is part of an autofs set of computers. Some homes are on this linux box. Obviously,
we can move the homes. We'd prefer not to.

Problem #2)

Complete and utter hang with no warnings or text. When its hung -- its hung. If the vga console
is "blanked", there is no comming back -- so in most circumstances I can't see if anything
was written to the console -- there is definitly nothing in syslog. But even when it hangs I see
no output indicating there was a problem.

I had read about TCP hangs with smp earlier, but didnt pay much attention, so this
may be a real (non-hardware) problem. Alot of tcp traffic is happening on this box.
We get a complete hang once a week or so to where only a hard reboot will fix it.

This problem worries me as linux is going to be used on-air in a broadcast environment
(guess where). It will definitly be smp with much tcp traffic. I don't want the thing
hanging every week :-). So, can someone enlighten me as to whether the "tcp hang"
that i've seen in some of the 2.2.15pre patch notes (thank you AC!) can bring a
SMP box to a complete halt (someone forget to release a spinlock?).

Final notes:

I am free to go back to an older kernel (ie, 2.2.13). 2.2.14 was used to get beyond some autofs
problems that we had with an earlier version of the kernel (long story). So, if someone knows
that these 2 problems are indeed real problems and that older kernels don't have them,
I'll move back. But megaraid would still need to work :-).

Or, I can wait for 2.2.15 (or beyond) if thats the right answer. But for now, the current
setup can't be used on air because of #2 (unless it turns out to be H/W). #1 isn't an
issue -- no stinkin' nfs serving from the production machine.

Thanks in advance.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Tue Feb 15 2000 - 21:00:15 EST