Re: kernel BUG at kernel/workqueue.c:291
From: Carsten Aulbert
Date: Tue Mar 03 2009 - 02:36:50 EST
Hi Andrew,
Andrew Morton wrote:
>> in the meantime 43 of our nodes were struck with this error. It seems
>> that the jobs of a certain user can trigger this bug, however I have no
>> clue how to really trigger it manually.
>
> That's a lot of nodes.
Quite, it is a noticeable percentage of the whole system.
>
> Let's cc the NFS developers, see if this rpciod crash is familiar to them?
Good idea; I should have done that myself. Sorry.
I think we have been able to pin this down to the jobs of at least one
user, but I still need to talk to him to find out which NFS access
patterns those jobs use.
The systems are running Debian Etch:
dpkg -l | awk '/(nfs|portmap)/ {print $2 "\t\t" $3}'
libnfsidmap2 0.18-0
mountnfs 1.1.3-2
nfs-common 1.0.10-6+etch.1
nfs-kernel-server 1.0.10-6+etch.1
portmap 5-26
If you need more information, please let me know! So far the machines are
'on hold', i.e. we have not rebooted them yet so that we can find out a
little more. If you (or anyone else) think we can reboot them and put them
back into our scheduling queue, please let me know; the users are waiting
for more cycles.
Thanks a lot
Carsten