Re: 2.1.128 Lock

Simon Kirby (sim@netnation.com)
Sun, 15 Nov 1998 11:58:04 -0800 (PST)


On Sun, 15 Nov 1998, Simon Kirby wrote:

> It took a while, but I was able to duplicate it at least 3 times with
> arca-19 applied.
>
> I have not tried compiling with SMP yet.

I just wrote another program to try to reproduce the problem more quickly,
since the mail server I tried 2.1.127 on reproduced it on its own after
about 10 minutes of uptime.

So, I wrote the following code:

#include <sys/types.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    int newrand;
    struct timeval tv;

    for (;;) {
        /* Pick a random timeout below 100 ms. */
        newrand = rand() % 100000;
        if (fork()) {
            /* Parent: print a progress dot, then reap the child. */
            fprintf(stderr, ".");
            wait(NULL);
        } else {
            /* Child: sleep in select() with no descriptors, then exit. */
            tv.tv_sec = 0;
            tv.tv_usec = newrand;
            select(0, NULL, NULL, NULL, &tv);
            exit(0);
        }
    }
}
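For anyone trying to reproduce this unattended, a bounded variant of the same fork/select/wait loop may be easier to work with: it checks the return values of fork(), select() and waitpid(), and stops after a fixed number of iterations instead of running forever. This is a minimal sketch; run_one_child() and run_iterations() are hypothetical names, not part of the original program.

```c
/* Bounded, error-checking variant of the fork/select/wait loop.
 * run_one_child() and run_iterations() are hypothetical helpers. */
#include <sys/types.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

/* Fork one child that sleeps in select() for `usec` microseconds.
 * Returns the child's exit status (0 if select() succeeded, 1 if it
 * failed), or -1 if fork() or waitpid() failed. */
static int run_one_child(long usec)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct timeval tv;
        tv.tv_sec = 0;
        tv.tv_usec = usec;
        /* No descriptors, so select() only waits for the timeout.
         * _exit() avoids flushing the parent's stdio buffers twice. */
        _exit(select(0, NULL, NULL, NULL, &tv) < 0 ? 1 : 0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Run `n` iterations with random timeouts in [0, 100000) us, as the
 * original loop does, and return how many children reported failure. */
static int run_iterations(int n)
{
    assert(n >= 0);
    int failures = 0;
    for (int i = 0; i < n; i++) {
        long usec = rand() % 100000;
        if (run_one_child(usec) != 0)
            failures++;
        fputc('.', stderr);
    }
    return failures;
}
```

Calling run_iterations() with a few thousand iterations from several background processes approximates the "spawn 24 in the background" setup while still terminating on its own.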

Good news and bad news. Good news: I spawned 24 copies in the background,
and after about 5 seconds the problem showed up. I had to wait a long time
for my "killall" to go through, but I eventually got it and killed them
all. Bad news: I haven't been able to reproduce it again, no matter how
many I spawn. :(

Just to see what would happen, I tried randomly making the timeout value
negative, and something rather interesting started happening. Try the
following program:

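#include <sys/types.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    int newrand;
    struct timeval tv;

    for (;;) {
        /* Pick a random timeout below 100 ms... */
        newrand = rand() % 100000;
        /* ...and about half the time make it negative (deliberately
         * invalid input for select()). */
        if (rand() % 20 < 10)
            newrand = -newrand;
        if (fork()) {
            /* Parent: print a progress dot, then reap the child. */
            fprintf(stderr, ".");
            wait(NULL);
        } else {
            /* Child: call select() with the possibly negative timeout. */
            tv.tv_sec = 0;
            tv.tv_usec = newrand;
            select(0, NULL, NULL, NULL, &tv);
            exit(0);
        }
    }
}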

Run just one copy of it. I got a whole wad of messages similar to:

schedule_timeout: wrong timeout value fffffffb from c012b4d0
schedule_timeout: wrong timeout value fffffff9 from c012b4d0
schedule_timeout: wrong timeout value ffffffff from c012b4d0
schedule_timeout: wrong timeout value fffffffc from c012b4d0
schedule_timeout: wrong timeout value fffffffd from c012b4d0
schedule_timeout: wrong timeout value fffffffb from c012b4d0
schedule_timeout: wrong timeout value fffffff8 from c012b4d0
schedule_timeout: wrong timeout value fffffff9 from c012b4d0
schedule_timeout: wrong timeout value fffffff9 from c012b4d0
schedule_timeout: wrong timeout value fffffff8 from c012b4d0
schedule_timeout: wrong timeout value fffffffd from c012b4d0
schedule_timeout: wrong timeout value fffffffd from c012b4d0
schedule_timeout: wrong timeout value fffffffd from c012b4d0

...those timeout values are suspicious: read as signed 32-bit numbers,
they are small negative jiffy counts (fffffffb is -5).
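One plausible way a negative tv_usec could end up as values like these, sketched in user space under loud assumptions: HZ = 100 (10000 us per jiffy), a round-up conversion of the form (usec + tick - 1) / tick, and a message that prints a 32-bit long in hex. None of this is taken from the 2.1.128 source, and every function name below is hypothetical. The sketch also shows the obvious guard: rejecting a negative timeout with EINVAL, which is the error the Linux select(2) man page documents for an invalid timeout.

```c
/*
 * Sketch under stated assumptions (NOT the 2.1.128 kernel code):
 *  - HZ = 100, so one jiffy is 10000 us;
 *  - microseconds are rounded up to jiffies as (usec + tick - 1) / tick;
 *  - the kernel message prints a 32-bit long with "%lx".
 * All function names here are hypothetical.
 */
#include <sys/select.h>
#include <sys/time.h>
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define ASSUMED_HZ 100L

/* Round a microsecond count up to jiffies. For usec < 0, C division
 * truncates toward zero, so the result is a small negative number. */
static long usec_to_jiffies_roundup(long usec)
{
    const long tick = 1000000L / ASSUMED_HZ; /* microseconds per jiffy */
    return (usec + tick - 1) / tick;
}

/* Print a jiffies count as an unsigned 32-bit hex value, the way an
 * assumed 32-bit "%lx" would: -5 becomes "fffffffb". */
static void format_timeout32(long jiffies, char *buf, size_t len)
{
    assert(buf != NULL && len >= 9);
    snprintf(buf, len, "%08" PRIx32, (uint32_t)jiffies);
}

/* Reject negative fields before they reach any conversion like the one
 * above. Overflow of very large values is not checked here. */
static bool timeout_is_valid(const struct timeval *tv)
{
    assert(tv != NULL);
    return tv->tv_sec >= 0 && tv->tv_usec >= 0;
}

/* Sleep via select() with no descriptors, failing with EINVAL in user
 * space instead of passing an invalid timeout to the kernel. */
static int checked_select_sleep(long sec, long usec)
{
    struct timeval tv;
    tv.tv_sec = sec;
    tv.tv_usec = usec;
    if (!timeout_is_valid(&tv)) {
        errno = EINVAL;
        return -1;
    }
    return select(0, NULL, NULL, NULL, &tv);
}
```

Under these assumptions, a tv_usec of -50000 converts to (-50000 + 9999) / 10000 = -4 jiffies, printed as fffffffc, and any tv_usec in (-100000, 0] lands between -9 and 0, which covers every value in the log above. checked_select_sleep(0, -5) returns -1 with errno set to EINVAL without making the system call.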

Simon-

| Simon Kirby | Systems Administration |
| mailto:sim@netnation.com | NetNation Communications |
| http://www.netnation.com/ | Tech: (604) 684-6892 |

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/