Oops in sock_select (again)

Rob Riggs (rob@pangalactic.org)
Mon, 28 Jun 1999 10:57:37 -0600


This is a multi-part message in MIME format.
--------------0D0F4C8599E9860325955080
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

2.0.3[67] kernels can oops in sock_select() when running a
threaded application.

This bug has been known and reproducible for almost half a year
now. See the following post from January 13:

http://www.tux.org/hypermail/linux-kernel/1999week03/1133.html

This may be the longest time a known, reproducible oops has
remained in the Linux kernel!

There is what appears to be a related problem in 2.2 kernels,
which results in the threads being stuck in skb_recv_datagram.

I'll also include a bit of C code, since no one was interested
in my Python code used to trigger these problems. I've dug into
the kernel code and have gotten nowhere. This has been reproduced
on many different machines, SMP and UP. This oops still occurs in
2.0.37. It can be triggered with threaded calls to gethostbyname_r().
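
For readers without the attachment, the kind of test described above
can be sketched roughly as follows. This is only an illustration and
is not the attached thrtest.c: the names run_lookups, lookup_thread,
struct lookup_job and MAX_THREADS are made up here, and it assumes
glibc's gethostbyname_r() and POSIX threads (build with -pthread).

```c
/* Hypothetical sketch of a threaded resolver test; NOT the attached thrtest.c. */
#define _GNU_SOURCE            /* gethostbyname_r() is a glibc extension */
#include <assert.h>
#include <netdb.h>
#include <pthread.h>

#define MAX_THREADS 64         /* arbitrary cap chosen for this sketch */

struct lookup_job {
    const char *host;          /* name to resolve */
    int finished;              /* 1 once the resolver call has returned */
    int resolved;              /* 1 if the lookup produced an entry */
};

/* Thread body: one reentrant lookup using thread-private buffers. */
static void *lookup_thread(void *arg)
{
    struct lookup_job *job = arg;
    struct hostent he, *result = NULL;
    char buf[8192];
    int herr = 0;

    int rc = gethostbyname_r(job->host, &he, buf, sizeof buf, &result, &herr);
    job->resolved = (rc == 0 && result != NULL);
    job->finished = 1;
    return NULL;
}

/*
 * Start up to nthreads concurrent lookups of host, wait for all of them,
 * and return how many threads came back from the resolver call.
 */
int run_lookups(const char *host, int nthreads)
{
    pthread_t tid[MAX_THREADS];
    struct lookup_job job[MAX_THREADS];
    int started = 0, finished = 0;
    int i;

    assert(host != NULL);
    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;

    for (i = 0; i < nthreads; i++) {
        job[started].host = host;
        job[started].finished = 0;
        job[started].resolved = 0;
        /* Only count threads that were actually created. */
        if (pthread_create(&tid[started], NULL, lookup_thread, &job[started]) == 0)
            started++;
    }

    for (i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        finished += job[i].finished;
    }
    return finished;
}
```

Calling run_lookups() from a small main() with a name that is not in
/etc/hosts sends every thread through the DNS path at once. A return
value equal to the thread count only means that every call returned,
not that every lookup succeeded.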

I do find it interesting that the same code that produces oopses
in the 2.0 series results in hung processes under 2.2.

For reference, my original post can be found here:

http://www.tux.org/hypermail/linux-kernel/1999week26/0796.html

One hint is that after some of the 2.0 oopses, the message "VFS:
Close: file count is 0" is found in the syslog.

I am beginning to believe that there is some interaction with
the loading of the NSS shared libraries that is triggering
these problems (at least under 2.2). Here's why:

$ uname -a
Linux diamond.pangalactic.org 2.2.9 #11 SMP
Wed Jun 9 10:27:19 MDT 1999 i686 unknown
$ ./thrtest # Exited successfully -- oddly enough. Very rare.
$ ./thrtest # Had to ^C to exit. Hung in skb_recv_datagram
$ ./thrtest
./thrtest: error in loading shared libraries
/lib/libresolv.so.2: undefined symbol: __res_close
$ ./thrtest
./thrtest: error in loading shared libraries
/lib/libresolv.so.2: undefined symbol: res_querydomain

Of course there is absolutely nothing wrong with libresolv.so,
and similar errors will occur for all of the libnss_xxx.so
libraries. And, in fact, the program continues to run.

Re-arranging the resolver order in nsswitch.conf doesn't
appear to have any impact on the problem, unless the host
to be resolved is in /etc/hosts and 'files' appears before
'dns' in the /etc/nsswitch.conf 'hosts:' line.

-- 
Rob Riggs
Technical Staff
Tummy.com, Ltd.
http://www.tummy.com/
--------------0D0F4C8599E9860325955080
Content-Type: application/x-gzip;
 name="thrtest.c.gz"
Content-Transfer-Encoding: base64
Content-Disposition: inline;
 filename="thrtest.c.gz"

H4sICJefdzcAA3RocnRlc3QuYwCtV21v2zYQ/iz9ipuLGZIju3a7DgvcBPDaNB3gNkOaoisy w5Al2uKit4mUG3fwf98dRcqS7aQdsKJNyePxXp579Eh52rOhBzIqJBNyEOCG9lOelvdwx4qU xSCy4G4uWMwC6biwKJh/1+cSgixk5PsqyzcFX0USnMCF0enpKdyUSbIZBFniwVSGA5jEMSgX AQUTrFizcEBXPxVcSpbCYgPX2QKu+WoloA/XmMsXLIQyDVmBxTG4TEu4/H2qy/sU4aWSPPIi y1kRbzz04oK2q8JPIPE3EPjoAWIjJEsEyAwWLPLXqmRWFL7kgR/HmwF8zkqgAzzHqF987Lmq jv4l/h2D6es308nlh7N+nCNQzA8NXujx1Laf8DSIy5DByzLlAvuNzhs2NPCsbUqZDBd7XhsR Z6u2TScjIyYJ2ZKnDN5efbh5P3l3Mb14D89e/FzbLy9u6OjXj2/o5JfR6bP6aHp1Oacr0NFl d6jmJZ7Cu8906SA4VJn751EmZOonzH7CYsEO/TrSTLqDLmnIl7a9zngI2IwTRH4BvUSsXPsf 28I5pWQ11Xgw9FRp06tXk+lzd2xbFQiVx9XNb6/Qh27jSRBnSEA8o03BZFmkY3tr23KTM+pD yKIMJGAaDdo8KSW7n0vLipG9450dLVa1RCNPaWtZfszXjNJgxbQ3bd820J5hQo3LPPSlP5dj 28YAxsZTLp3WOfR61V71r3MJ6ctSYC7b0qdwBvv3XORdjIU7gn9l2bJ97BIGfAlOff8M3n+c Tl3b0tBAf4QuCdKeSeOl4H4wXBs11UrXMVfd/jmB6HpVGnRvnCjosIUhzU/1hpsVkwbCZhRj 85osNt3oyy9h2GgEH9Q0QwfFvtpqMNy2BhCiakj2wAg8IE+OenJfLRP/Xo2F2Epz6dWDMVj8 lfFUR7tVF2f9cxOsW3lT7cuCMaenHU+Uo4IU4Q/yjbbDicnd3uP/Iw8cLAZlz9Gmkeui8hyb FTLDVW2rZ8zMzK/rrN0avNOPBmHPkIKov2UsPehF+Y7wvUW5HO9R1KvBL1PBVyiKoLx57odh cfvTzKOHHC/eogzNxnUiyRMmchbQtP72UO8TRXZ0JKJXmlDzu6VahgnKdZ/TR+i12BCZ5oVj 2IRzMe1hDK+tiXgY5fhDdWVSRfnRTHp2VaseRDlyd95cxyxdyUhpVl4gbEunwsKDDgd8//wY DszfDvKtgmw4q5ej3fLZbvl8RhFJ5qpoe2Knhq5HOy/K1KkMNHM16RYHjLoo8B8RGlqNjcfu cR4diEKZkgg4XeOpJKG+2WThoaA8drMtId+XU+sAzW1s7ckAsqQ4EOJHdfh7aq25Z5wD/Ilq 03WMp05RiaR5zmlMZt0ofE/AuJgQBN8qeu3H+Mo5Wq7TqpcyoTNW24L4QXgPb+s6Vcbt7uPj 3eSPm7fXF5PXH+DFsKpdzZ0WfrEKPKieb1yvb2dHSGlYKW53oWaNF3EQcxQpwnnoaYWsll6N 2Td0ZpkVKKSKTcDxbTIaDml1cuLi1wE+40/xg7XEz1Hsh4uIGRUV9CVnWXVKjGV9iXjMjCq/ BF0bxcFKST3acxP6LeGSlOhQJycUSL2+qms1Hs2XlfDM22GHiqsuGkD6fbXdmsy6KNSuuqpu F9VVw3d+1oykK243p2AbyDV+2AcHtrQyEnbqT3WY+mkmYsZyp6sgx5+JW9e1JXSptub3UNdA oyubufADJtMVaensoCTj1OgCR6J+5enKTOXPtFNlUL91jO0qlR7MSIepn80jCtDIPDbgGf+z 6lOjEfwYJo9B8iAiVKVOrCiwtVvo6wbqcRlEFKmU3/nZPt9aw/s/6/wWkxtE3vH4v9G4xeKt rbiiJWZIAvMvhpTVgYUOAAA= --------------0D0F4C8599E9860325955080--
