read(2) hangs waiting for data from a closed socket

From: Alex Riesen
Date: Sun Jun 13 2010 - 11:07:52 EST


Hi,

I noticed that conky (http://conky.sourceforge.net/) freezes while reading
from the hddtemp socket (an IPv4 TCP socket over loopback).
I don't know exactly when this started happening (I have been running a
full desktop on 2.6.35 only since -rc3) and I haven't bisected yet (I'll try,
time and our daughter permitting).

What little I have so far (2.6.35 still crashes on me):

- hddtemp has closed this socket already (lsof -p `pidof hddtemp`
shows only its listening socket, no established connections). I can even
kill hddtemp altogether; it makes no difference to conky's read.

- conky waits in read(2) with this trace:
conky S ffff880033550348 0 19324 1 0x00000001
ffff880074a85b08 0000000000000082 0000000000004000 ffff880074a85fd8
0000000000013640 ffff880074a85fd8 ffff880033550000 0000000000013640
0000000000013640 0000000000013640 0000000000013640 ffff880033550000
Call Trace:
[<ffffffff81465d81>] schedule_timeout+0x28/0x243
[<ffffffff81028a12>] ? scale_rt_power+0x23/0x69
[<ffffffff81034683>] ? load_balance+0x40a/0xf99
[<ffffffff814676cb>] ? _raw_spin_unlock_bh+0xf/0x11
[<ffffffff8136463e>] sk_wait_data+0x87/0xd1
[<ffffffff81050e32>] ? autoremove_wake_function+0x0/0x38
[<ffffffff813a793a>] tcp_recvmsg+0x431/0x88f
[<ffffffff813c2b0b>] inet_recvmsg+0x6b/0x80
[<ffffffff81360fe9>] sock_aio_read+0x118/0x12c
[<ffffffff810cc722>] do_sync_read+0xc7/0x10d
[<ffffffff8104930e>] ? ptrace_notify+0x95/0xb1
[<ffffffff81161aea>] ? security_file_permission+0x11/0x13
[<ffffffff810cd1a7>] vfs_read+0xbe/0x147
[<ffffffff810cd2f4>] sys_read+0x47/0x6f
[<ffffffff81002ba9>] tracesys+0xd9/0xde

(I have modified conky to use read; the original code used recv.
It doesn't matter in this case, though.)

- It doesn't happen every time conky connects to hddtemp,
but it reliably happens if I leave it running for five minutes. I could
reproduce the problem by running a simple program which
just connects to hddtemp, calls poll(2), reads and exits, in a tight
loop, so it is not something specific to conky. And while conky's
code is not pretty, it is certainly nothing unusual: basically
connect, select for incoming data, do a blocking read if select returns
a ready socket, and close (see below).

- the system is a Dell M1330 laptop (Core 2 Duo, 64bit kernel
and userspace). Config attached.
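
A minimal sketch of the kind of reproducer described above, not the exact
program I ran. The function names are made up for illustration, and the
port is an assumption (7634 is hddtemp's usual daemon port; adjust as
needed). The part that matters is the blocking read loop after poll(2):
it relies on the kernel delivering EOF once the peer has closed.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: connect to 127.0.0.1:port over TCP.
 * Returns the connected fd, or -1 on failure. */
int connect_loopback(unsigned short port)
{
	struct sockaddr_in addr;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd == -1)
		return -1;
	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(port);
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Hypothetical helper: wait up to timeout_ms for fd to become readable,
 * then do blocking reads until EOF or until buf is full.
 * Returns the number of bytes read, or -1 on timeout or error. */
ssize_t poll_and_drain(int fd, int timeout_ms, char *buf, size_t cap)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	size_t len = 0;
	ssize_t n = 0;

	assert(buf != NULL && cap > 0);
	if (poll(&pfd, 1, timeout_ms) <= 0)
		return -1;	/* timed out, or poll itself failed */

	/* Only the first read is guarded by poll. Every later read
	 * blocks until more data or EOF arrives, so this is where a
	 * missing EOF would show up as a hang. */
	while (len < cap && (n = read(fd, buf + len, cap - len)) > 0)
		len += (size_t)n;
	return n < 0 ? -1 : (ssize_t)len;
}

/* One iteration of the tight loop: connect, poll, read, close. */
int probe_once(unsigned short port)
{
	char buf[1024];
	int fd = connect_loopback(port);

	if (fd == -1)
		return -1;
	(void)poll_and_drain(fd, 250, buf, sizeof(buf));
	close(fd);
	return 0;
}
```

Calling probe_once(7634) in an endless loop from main() gives the tight
loop; on an unaffected kernel every iteration should return promptly.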

Conky's code (conky_1.6.1-0ubuntu4):

char *get_hddtemp_info(char *dev, char *hostaddr, int port, char *unit)
{
	int sockfd = 0;
	struct hostent he, *he_res = 0;
	int he_errno;
	char hostbuff[2048];
	struct sockaddr_in addr;
	struct timeval tv;
	fd_set rfds;
	int len, i, devlen = strlen(dev);
	char sep;
	char *p, *out, *r = NULL;

	if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) == -1) {
		perror("socket");
		return NULL;
	}

	do {
#ifdef HAVE_GETHOSTBYNAME_R
		if (gethostbyname_r(hostaddr, &he, hostbuff, sizeof(hostbuff),
				&he_res, &he_errno)) { // get the host info
			ERR("hddtemp gethostbyname_r: %s", hstrerror(h_errno));
			break;
		}
#else /* HAVE_GETHOSTBYNAME_R */
		he_res = gethostbyname(hostaddr);
		if (!he_res) {
			perror("gethostbyname");
			break;
		}
#endif /* HAVE_GETHOSTBYNAME_R */

		addr.sin_family = AF_INET;
		addr.sin_port = htons(port);
		addr.sin_addr = *((struct in_addr *) he_res->h_addr);
		memset(&(addr.sin_zero), 0, 8);

		if (connect(sockfd, (struct sockaddr *) &addr,
				sizeof(struct sockaddr)) == -1) {
			perror("connect");
			break;
		}

		FD_ZERO(&rfds);
		FD_SET(sockfd, &rfds);

		/* We're going to wait up to a quarter a second to see whether there's
		 * any data available. Polling with timeout set to 0 doesn't seem to work
		 * with hddtemp. */
		tv.tv_sec = 0;
		tv.tv_usec = 250000;

		i = select(sockfd + 1, &rfds, NULL, NULL, &tv);
		if (i == -1) {
			if (errno == EINTR) { /* silently ignore interrupted system call */
				break;
			} else {
				perror("select");
			}
		}

		/* No data available */
		if (i <= 0) {
			break;
		}

		p = buf;
		len = 0;
		do {
			i = read(sockfd, p, BUFLEN - (p - buf));
			if (i < 0) {
				perror("read");
				break;
			}
			len += i;
			p += i;
		} while (i > 0 && p < buf + BUFLEN - 1);

		if (len < 2) {
			break;
		}

		buf[len] = 0;

		/* The first character read is the separator. */
		sep = buf[0];
		p = buf + 1;

		while (*p) {
			if (!strncmp(p, dev, devlen)) {
				p += devlen + 1;
				p = strchr(p, sep);
				if (!p) {
					break;
				}
				p++;
				out = p;
				p = strchr(p, sep);
				if (!p) {
					break;
				}
				*p = '\0';
				p++;
				*unit = *p;
				if (!strncmp(out, "NA", 2)) {
					strncpy(buf, "N/A", BUFLEN);
					r = buf;
				} else {
					r = out;
				}
				break;
			} else {
				for (i = 0; i < 5; i++) {
					p = strchr(p, sep);
					if (!p) {
						break;
					}
					p++;
				}
				if (!p && i < 5) {
					break;
				}
			}
		}
	} while (0);
	close(sockfd);
	return r;
}
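
Note that in the function above select only guards the first read; every
later pass through the read loop blocks until more data or EOF arrives.
As a sketch only (this would paper over the hang in userspace, it does
not explain why EOF never arrives), the loop could wait with a timeout
before each read. read_with_timeout and its parameters are hypothetical
names, not part of conky.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read from fd until EOF, until buf is full, or until no data shows up
 * within timeout_ms during a single wait. Returns the number of bytes
 * stored, or -1 on error. *timed_out is set to 1 if a wait expired. */
ssize_t read_with_timeout(int fd, char *buf, size_t cap, int timeout_ms,
		int *timed_out)
{
	size_t len = 0;

	assert(buf != NULL && timed_out != NULL);
	*timed_out = 0;
	while (len < cap) {
		struct pollfd pfd = { .fd = fd, .events = POLLIN };
		int ready = poll(&pfd, 1, timeout_ms);
		ssize_t n;

		if (ready == -1) {
			if (errno == EINTR)
				continue;	/* interrupted, wait again */
			return -1;
		}
		if (ready == 0) {		/* nothing arrived in time */
			*timed_out = 1;
			break;
		}
		n = read(fd, buf + len, cap - len);
		if (n == -1) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)			/* EOF: peer closed */
			break;
		len += (size_t)n;
	}
	return (ssize_t)len;
}
```

If poll reports the socket readable and read still blocks, this does not
help either; setting O_NONBLOCK on the descriptor would be the stricter
variant.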

Attachment: 2.6.35-rc3.config
Description: Binary data