Re: processes freezing between v2.1.130 and v2.2.1

Truxton Fulton (trux@truxton.com)
Sun, 14 Feb 1999 14:07:52 -0800 (PST)


I have resolved my problem, and it is not a kernel bug. I recall at least
one other post to linux-kernel with the same problem I had : processes
block for a long time in select(), for hundreds or thousands of seconds.
Running gdb on xclock revealed that XtWaitForSomething() was doing the
malformed select(). I upgraded to egcs 1.1.1 and I fetched the XFree86
3.3.3.1 patch and recompiled /usr/X11R6. The new libraries fix the problem.
The reason the bug appeared after 2.2.0 may have been a version numbering
glitch in the X libs. The compiler upgrade probably has nothing to do
with the fix, but I didnt take the time to rule that possibility out.
Why this happened only on my low memory 486 w/ nfs mounted file systems,
and not my 96MB pentium with local FS, I dont know. The problem seemed
to be somehow correlated with using swap... In any case, xload and
xclock work and Netscape is back to its (mostly) reliable self now.

Happily running 2.2.2,

-Truxton

I had previously written :

> There is a kernel peculiarity introduced after v2.1.130. I have been seeing
> it happen with all the 2.2.0-pre releases, but it is intermittent enough
> to make it hard to reproduce. What happens is that a process will block
> for a long time. This happens with xload, xclock, netscape, and other
> apps too. I have two machines with identical filesystems. This problem
> does not happen on my 96MB pentium with local disks, rather it happens
> on my 24MB i486 with all filesystems nfs mounted (although it has a local
> swap disk). High load average and swapping seem to trigger this bug.
> A process will just get stuck for a long while. Using strace on xload
> and xclock reveals a select() waiting on fd 3 for hundreds or sometimes
> thousands of seconds. I remember someone else on the mailing list
> puzzling at netscape doing a similar select(). I can believe netscape
> has enough bugs of its own to cause this bad behaviour, but what I find
> strange is that other relatively simple programs like xload are doing the
> same thing, and that reverting to linux v2.1.130 seems to cure the problem.
> v2.2.1 contained some promising sounding comments in the mm code about
> fixing a scheduling problem, but v2.2.1 alas still has the same problem.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/