> It took a while, but I was able to duplicate it at least 3 times with
> arca-19 applied.
>
> I have not tried compiling with SMP yet.
I just wrote another program to try to duplicate the problem more
quickly, since the mail server I ran 2.1.127 on hit it on its own after
about 10 minutes of uptime.
So, I wrote the following code:
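#include <sys/types.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int newrand;
    struct timeval tv;

    for (;;) {
        /* Pick a random sleep of 0..99999 microseconds for the child. */
        newrand = rand() % 100000;
        if (fork()) {
            /* Parent: print a progress dot, then reap the child. */
            fprintf(stderr, ".");
            wait(NULL);
        } else {
            /* Child: sleep via select() with no descriptors, then exit. */
            tv.tv_sec = 0;
            tv.tv_usec = newrand;
            select(0, NULL, NULL, NULL, &tv);
            exit(0);
        }
    }
}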
Good news and bad news. Good news: I started 24 copies in the background
and after about 5 seconds the problem showed up. I had to wait a long
time for my "killall" to go through, but it eventually did and killed
them all. Bad news: I haven't been able to duplicate it again since, no
matter how many copies I start. :(
I then tried randomly making the timeout value negative, just to see what
would happen, and something rather interesting started happening. Try the
following program:
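#include <sys/types.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int newrand;
    struct timeval tv;

    for (;;) {
        /* Pick a random sleep of 0..99999 microseconds for the child. */
        newrand = rand() % 100000;
        /* About half the time, make the timeout negative. */
        if (rand() % 20 < 10)
            newrand = -newrand;
        if (fork()) {
            /* Parent: print a progress dot, then reap the child. */
            fprintf(stderr, ".");
            wait(NULL);
        } else {
            /* Child: call select() with the (possibly negative) timeout. */
            tv.tv_sec = 0;
            tv.tv_usec = newrand;
            select(0, NULL, NULL, NULL, &tv);
            exit(0);
        }
    }
}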
Run just one copy of it. I got a whole wad of messages similar to:
schedule_timeout: wrong timeout value fffffffb from c012b4d0
schedule_timeout: wrong timeout value fffffff9 from c012b4d0
schedule_timeout: wrong timeout value ffffffff from c012b4d0
schedule_timeout: wrong timeout value fffffffc from c012b4d0
schedule_timeout: wrong timeout value fffffffd from c012b4d0
schedule_timeout: wrong timeout value fffffffb from c012b4d0
schedule_timeout: wrong timeout value fffffff8 from c012b4d0
schedule_timeout: wrong timeout value fffffff9 from c012b4d0
schedule_timeout: wrong timeout value fffffff9 from c012b4d0
schedule_timeout: wrong timeout value fffffff8 from c012b4d0
schedule_timeout: wrong timeout value fffffffd from c012b4d0
schedule_timeout: wrong timeout value fffffffd from c012b4d0
schedule_timeout: wrong timeout value fffffffd from c012b4d0
...those timeout values are suspicious.
Simon-
| Simon Kirby | Systems Administration |
| mailto:sim@netnation.com | NetNation Communications |
| http://www.netnation.com/ | Tech: (604) 684-6892 |