pre7 still has broken timeouts in select() and poll()

Jamie Lokier (lkd@tantalophile.demon.co.uk)
Fri, 15 Jan 1999 23:48:25 +0000


Something is definitely up with the recent timeout changes.

Netscape keeps hanging. strace reveals one of two behaviours when it's
hung:

1. oldselect() called, with huge timeout (like a day long), and
no file descriptors.

Break and restart (using strace); the timeout reduces by a small
interval as expected.

2. Repeatedly SIGALRM is sent to the process, between repeated
oldselect() calls that are interrupted. The signals come at high
speed, and there is no syscall starting a new alarm. So the interval
timer probably got screwed somewhere.

While we're here, what's this think it's doing? Looks to me like
`(timeout*HZ+999)/1000+1' will overflow if `timeout ==
MAX_SCHEDULE_TIMEOUT / HZ - 1' and HZ < 999.

It won't overflow if you convert to unsigned before the divide though.

+ if (timeout) {
+ /* Carefula about overflow in the intermediate values */
+ if ((unsigned long) timeout < MAX_SCHEDULE_TIMEOUT / HZ)
+ timeout = (timeout*HZ+999)/1000+1;
+ else /* Negative or overflow */
+ timeout = MAX_SCHEDULE_TIMEOUT;
+ }

I notice in the following formula in sys_select():

timeout = ROUND_UP(usec, 1000000/HZ).
where ROUND_UP(a,b) = (a+b-1)/b

Of course, usec < 0 is tested earlier. But usec > 2^31-1 is not tested.
(2^31-1 + 1000000/HZff - 1) / (1000000/HZ) is negative. -> negative
timeout.

This has been tested, it generates a kernel message about negative
timeout in schedule_timeout.

Still not right :-)

-- Jamie

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/